Display entries of compact dataframe column by column¶
Author: Lucie Luecke, 2024
This notebook goes through the columns of a compact dataframe (an original database or the output database of databases) and displays the (meta)data.
Use this to familiarise yourself with the contents of a compact dataframe.
A compact dataframe has standardised columns and data formats for:
- archiveType
- dataSetName
- datasetId
- geo_meanElev
- geo_meanLat
- geo_meanLon
- geo_siteName
- interpretation_direction (new in v2.0)
- interpretation_variable
- interpretation_variableDetail
- interpretation_seasonality (new in v2.0)
- originalDataURL
- originalDatabase
- paleoData_notes
- paleoData_proxy
- paleoData_sensorSpecies
- paleoData_units
- paleoData_values
- paleoData_variableName
- year
- yearUnits
- (optional: duplicateDetails)
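As a quick sanity check, a dataframe's columns can be compared against this expected set. The following is a minimal sketch (not part of the dod2k utilities); the column list is transcribed from above, and `check_compact_columns` is a hypothetical helper name:

```python
import pandas as pd

# Columns required by the compact format (transcribed from the list above);
# duplicateDetails is optional and therefore not required here.
EXPECTED_COLUMNS = {
    'archiveType', 'dataSetName', 'datasetId', 'geo_meanElev', 'geo_meanLat',
    'geo_meanLon', 'geo_siteName', 'interpretation_direction',
    'interpretation_variable', 'interpretation_variableDetail',
    'interpretation_seasonality', 'originalDataURL', 'originalDatabase',
    'paleoData_notes', 'paleoData_proxy', 'paleoData_sensorSpecies',
    'paleoData_units', 'paleoData_values', 'paleoData_variableName',
    'year', 'yearUnits',
}

def check_compact_columns(df):
    """Return (missing, extra) column names relative to the compact format."""
    cols = set(df.columns)
    missing = EXPECTED_COLUMNS - cols
    extra = cols - EXPECTED_COLUMNS - {'duplicateDetails'}
    return missing, extra
```

For a well-formed compact dataframe, both returned sets should be empty.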
Set up working environment¶
Make sure the repo_root is set correctly; it should be your_root_dir/dod2k. This should be the working directory throughout this notebook (and all other notebooks).
In [1]:
%load_ext autoreload
%autoreload 2
import sys
import os
from pathlib import Path

# Add parent directory to path (works from any notebook in notebooks/);
# the repo_root should be the parent directory of the notebooks folder
current_dir = Path().resolve()

# Determine repo root
if current_dir.name == 'dod2k':
    repo_root = current_dir
elif current_dir.parent.name == 'dod2k':
    repo_root = current_dir.parent
else:
    raise Exception('Please review the repo root structure (see first cell).')

# Update cwd and path only if needed
if os.getcwd() != str(repo_root):
    os.chdir(repo_root)
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))

print(f"Repo root: {repo_root}")
if os.getcwd() == str(repo_root):
    print("Working directory matches repo root.")
Repo root: /home/jupyter-lluecke/dod2k
Working directory matches repo root.
In [2]:
import pandas as pd
import numpy as np
from dod2k_utilities import ut_functions as utf # contains utility functions
Read dataframe¶
Read in a compact dataframe.
{db_name} refers to the database name, e.g.:
- database of databases:
- dod2k_v2.0 (dod2k: duplicate-free, merged database)
- dod2k_v2.0_filtered_M (filtered for M-sensitive proxies only)
- dod2k_v2.0_filtered_M_TM (filtered for M- and TM-sensitive proxies only)
- dod2k_v2.0_filtered_speleo (filtered for speleothem proxies only)
- all_merged (NOT filtered for duplicates; only the fusion of the input databases)
- original databases:
- fe23
- ch2k
- sisal
- pages2k
- iso2k
All compact dataframes are saved in {repo_root}/data/{db_name} as {db_name}_compact.csv.
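The utility loader additionally restores array-valued columns (e.g. paleoData_values, year), which a plain pandas.read_csv leaves as strings. A minimal sketch of what that involves; the path follows the convention above, and the tiny inline CSV is purely illustrative:

```python
import ast
import io
import pandas as pd

db_name = 'dod2k_v2.0'
# {repo_root}/data/{db_name}/{db_name}_compact.csv
csv_path = f'data/{db_name}/{db_name}_compact.csv'

# Illustrative stand-in for the CSV contents:
csv_text = 'datasetId,paleoData_values\npages2k_5,"[1.0, 2.0]"\n'
df_raw = pd.read_csv(io.StringIO(csv_text))

# read_csv yields strings; literal_eval restores the lists
values = df_raw['paleoData_values'].map(ast.literal_eval)
print(values.iloc[0])  # [1.0, 2.0]
```

In practice, prefer utf.load_compact_dataframe_from_csv (used below), which handles this for all relevant columns.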
In [3]:
# read dataframe, choose from the list below, or specify your own
db_name = 'dod2k_v2.0'
# load dataframe
df = utf.load_compact_dataframe_from_csv(db_name)
print(df.info())
df.name = db_name
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4781 entries, 0 to 4780
Data columns (total 22 columns):
 #   Column                         Non-Null Count  Dtype
---  ------                         --------------  -----
 0   archiveType                    4781 non-null   object
 1   dataSetName                    4781 non-null   object
 2   datasetId                      4781 non-null   object
 3   duplicateDetails               4781 non-null   object
 4   geo_meanElev                   4699 non-null   float32
 5   geo_meanLat                    4781 non-null   float32
 6   geo_meanLon                    4781 non-null   float32
 7   geo_siteName                   4781 non-null   object
 8   interpretation_direction       4781 non-null   object
 9   interpretation_seasonality     4781 non-null   object
 10  interpretation_variable        4781 non-null   object
 11  interpretation_variableDetail  4781 non-null   object
 12  originalDataURL                4781 non-null   object
 13  originalDatabase               4781 non-null   object
 14  paleoData_notes                4781 non-null   object
 15  paleoData_proxy                4781 non-null   object
 16  paleoData_sensorSpecies        4781 non-null   object
 17  paleoData_units                4781 non-null   object
 18  paleoData_values               4781 non-null   object
 19  paleoData_variableName         4781 non-null   object
 20  year                           4781 non-null   object
 21  yearUnits                      4781 non-null   object
dtypes: float32(3), object(19)
memory usage: 765.8+ KB
None
Display dataframe¶
Display identification metadata: dataSetName, datasetId, originalDataURL, originalDatabase¶
index¶
In [4]:
# check index
print(df.index)
RangeIndex(start=0, stop=4781, step=1)
dataSetName (associated with each record, may not be unique)¶
In [5]:
# check dataSetName
key = 'dataSetName'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
dataSetName: ['NAm-MtLemon.Briffa.2002' 'NAm-MtLemon.Briffa.2002' 'NAm-MtLemon.Briffa.2002' ... 'Sahiya cave' 'Sahiya cave' 'europe_swed019w, europe_swed021w'] ["<class 'str'>"] No. of unique values: 3843/4781
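Since dataSetName is not guaranteed unique, records sharing a name can be listed with value_counts. A small illustrative sketch on synthetic data (the names are taken from the output above):

```python
import pandas as pd

# Synthetic stand-in: dataSetName may repeat across records
df = pd.DataFrame({'dataSetName': ['NAm-MtLemon.Briffa.2002',
                                   'NAm-MtLemon.Briffa.2002',
                                   'Sahiya cave']})
counts = df['dataSetName'].value_counts()
repeated = counts[counts > 1].index.tolist()
print(repeated)  # ['NAm-MtLemon.Briffa.2002']
```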
datasetId (unique identifier, as given by original authors, includes original database token)¶
In [6]:
# check datasetId
print(len(df.datasetId.unique()))
print(len(df))
key = 'datasetId'
print('%s (starts with): '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print('datasetId starts with: ', np.unique([str(dd.split('_')[0]) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
4781 4781 datasetId (starts with): ['pages2k_5' 'pages2k_8' 'pages2k_18' ... 'sisal_901.0_544' 'sisal_901.0_545' 'dod2k_composite_z_FE23_europe_swed019w_FE23_europe_swed021w'] ["<class 'str'>"] datasetId starts with: ['FE23' 'ch2k' 'dod2k' 'iso2k' 'pages2k' 'sisal'] No. of unique values: 4781/4781
originalDataURL (URL/DOI of original published record where available)¶
In [7]:
# originalDataURL
key = 'originalDataURL'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([kk for kk in df[key] if 'this' in kk]))
print(np.unique([str(type(dd)) for dd in df[key]]))
# 'this study' should point to the correct URL (PAGES2k)
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
originalDataURL: ['FE23_europe_swed019w: https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/europe/swed019w-noaa.rwl, FE23_europe_swed021w: https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/europe/swed021w-noaa.rwl' 'This compilation' "['10.1002/2015GL063826']" ... 'this compilation' 'www.ncdc.noaa.gov/paleo-search/study/27330' 'www.ncdc.noaa.gov/paleo/study/2474'] ['this compilation'] ["<class 'str'>"] No. of unique values: 3776/4781
originalDatabase (original database used as input for dataframe)¶
In [8]:
# originalDatabase
key = 'originalDatabase'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
# Note: the last two records have missing URLs
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
originalDatabase: ['CoralHydro2k v1.0.1' 'FE23 (Breitenmoser et al. (2014))' 'Iso2k v1.1.2' 'PAGES 2k v2.2.0' 'SISAL v3' 'dod2k_composite_z'] ["<class 'str'>"] No. of unique values: 6/4781
geographical metadata: elevation, latitude, longitude, site name¶
geo_meanElev (mean elevation in m)¶
In [9]:
# check Elevation
key = 'geo_meanElev'
print('%s: '%key)
print(df[key])
print(np.unique(['%d'%kk for kk in df[key] if np.isfinite(kk)]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
geo_meanElev:
0 2700.0
1 2700.0
2 2700.0
3 2700.0
4 2700.0
...
4776 1190.0
4777 1190.0
4778 1190.0
4779 1190.0
4780 400.0
Name: geo_meanElev, Length: 4781, dtype: float32
['-1' '-10' '-1011' ... '991' '994' '995']
["<class 'float'>"]
No. of unique values: 1091/4781
geo_meanLat (mean latitude in degrees N)¶
In [10]:
# Latitude
key = 'geo_meanLat'
print('%s: '%key)
print(np.unique(['%d'%kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
geo_meanLat: ['-1' '-10' '-11' '-12' '-13' '-14' '-15' '-16' '-17' '-18' '-19' '-20' '-21' '-22' '-23' '-24' '-25' '-26' '-27' '-28' '-29' '-3' '-31' '-32' '-33' '-34' '-35' '-36' '-37' '-38' '-39' '-4' '-40' '-41' '-42' '-43' '-44' '-45' '-46' '-47' '-5' '-50' '-51' '-53' '-54' '-6' '-64' '-66' '-69' '-7' '-70' '-71' '-72' '-73' '-74' '-75' '-76' '-77' '-78' '-79' '-8' '-82' '-84' '-89' '-9' '0' '1' '10' '11' '12' '13' '15' '16' '17' '18' '19' '2' '20' '21' '22' '23' '24' '25' '26' '27' '28' '29' '3' '30' '31' '32' '33' '34' '35' '36' '37' '38' '39' '4' '40' '41' '42' '43' '44' '45' '46' '47' '48' '49' '5' '50' '51' '52' '53' '54' '55' '56' '57' '58' '59' '6' '60' '61' '62' '63' '64' '65' '66' '67' '68' '69' '7' '70' '71' '72' '73' '75' '76' '77' '78' '79' '8' '80' '81' '82' '9'] ["<class 'float'>"] No. of unique values: 2168/4781
geo_meanLon (mean longitude in degrees E)¶
In [11]:
# Longitude
key = 'geo_meanLon'
print('%s: '%key)
print(np.unique(['%d'%kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
geo_meanLon: ['-1' '-10' '-100' '-101' '-102' '-103' '-104' '-105' '-106' '-107' '-108' '-109' '-110' '-111' '-112' '-113' '-114' '-115' '-116' '-117' '-118' '-119' '-12' '-120' '-121' '-122' '-123' '-124' '-125' '-126' '-127' '-128' '-129' '-13' '-130' '-131' '-132' '-133' '-134' '-135' '-136' '-137' '-138' '-139' '-140' '-141' '-142' '-143' '-144' '-145' '-146' '-147' '-148' '-149' '-150' '-151' '-152' '-153' '-154' '-157' '-159' '-16' '-160' '-161' '-162' '-163' '-169' '-17' '-174' '-18' '-19' '-2' '-22' '-24' '-26' '-27' '-3' '-33' '-35' '-36' '-37' '-38' '-39' '-4' '-41' '-42' '-43' '-44' '-45' '-46' '-47' '-49' '-5' '-50' '-51' '-54' '-55' '-56' '-57' '-58' '-6' '-60' '-61' '-62' '-63' '-64' '-65' '-66' '-67' '-68' '-69' '-7' '-70' '-71' '-72' '-73' '-74' '-75' '-76' '-77' '-78' '-79' '-8' '-80' '-81' '-82' '-83' '-84' '-85' '-86' '-87' '-88' '-89' '-9' '-90' '-91' '-92' '-93' '-94' '-95' '-96' '-97' '-98' '-99' '0' '1' '10' '100' '101' '102' '103' '104' '105' '106' '107' '108' '109' '11' '110' '111' '112' '113' '114' '115' '116' '117' '118' '119' '12' '120' '121' '122' '123' '124' '125' '126' '127' '128' '129' '13' '130' '132' '133' '134' '136' '137' '138' '14' '141' '142' '143' '144' '145' '146' '147' '148' '149' '15' '150' '151' '152' '153' '154' '155' '158' '159' '16' '160' '162' '163' '165' '166' '167' '168' '169' '17' '170' '171' '172' '173' '174' '175' '176' '177' '179' '18' '19' '2' '20' '21' '22' '23' '24' '25' '26' '27' '28' '29' '3' '30' '31' '32' '33' '34' '35' '36' '37' '38' '39' '4' '40' '41' '42' '43' '44' '45' '46' '49' '5' '50' '51' '53' '54' '55' '56' '57' '58' '59' '6' '60' '63' '64' '65' '68' '69' '7' '70' '71' '72' '74' '75' '76' '77' '78' '79' '8' '80' '81' '82' '83' '84' '85' '86' '87' '88' '89' '9' '90' '91' '92' '93' '94' '95' '96' '97' '98' '99'] ["<class 'float'>"] No. of unique values: 2640/4781
geo_siteName (name of collection site)¶
In [12]:
# Site Name
key = 'geo_siteName'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
geo_siteName: ['Mt. Lemon' 'Mt. Lemon' 'Mt. Lemon' ... 'Sahiya cave' 'Sahiya cave' 'COMPOSITE: Torneträskr+f.,Bartoli + Torneträskfos.,Bartoli'] ["<class 'str'>"] No. of unique values: 3527/4781
proxy metadata: archive type, proxy type, interpretation¶
archiveType (archive type)¶
In [13]:
# archiveType
key = 'archiveType'
print('%s: '%key)
print(np.unique(df[key]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
archiveType: ['Borehole' 'Coral' 'Documents' 'GlacierIce' 'GroundIce' 'LakeSediment' 'MarineSediment' 'MolluskShell' 'Other' 'Sclerosponge' 'Speleothem' 'Wood'] ["<class 'str'>"] No. of unique values: 12/4781
In [14]:
df[(df['archiveType']=='MarineSediment')&(df['paleoData_proxy']=='d18O')][['paleoData_proxy', 'interpretation_variable', 'originalDatabase', 'originalDataURL']]
Out[14]:
| | paleoData_proxy | interpretation_variable | originalDatabase | originalDataURL |
|---|---|---|---|---|
| 317 | d18O | temperature | PAGES 2k v2.2.0 | https://www1.ncdc.noaa.gov/pub/data/paleo/page... |
| 871 | d18O | temperature | PAGES 2k v2.2.0 | https://www1.ncdc.noaa.gov/pub/data/paleo/page... |
| 3844 | d18O | temperature | Iso2k v1.1.2 | https://doi.pangaea.de/10.1594/PANGAEA.857573 |
| 3874 | d18O | temperature | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo-search/study/2519 |
| 3875 | d18O | temperature | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo-search/study/2519 |
| 3879 | d18O | temperature | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo/study/18315 |
| 3881 | d18O | moisture | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo/study/8725 |
| 3882 | d18O | temperature | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo/study/8725 |
| 3979 | d18O | moisture | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo/study/5968 |
| 3980 | d18O | temperature | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo/study/5968 |
| 3988 | d18O | temperature | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo/study/22592 |
| 4002 | d18O | moisture | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo/study/16155 |
| 4003 | d18O | temperature | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo/study/16155 |
| 4085 | d18O | N/A | Iso2k v1.1.2 | http://doi.pangaea.de/10.1594/PANGAEA.780423 |
| 4118 | d18O | temperature | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo-search/study/5966 |
| 4120 | d18O | temperature | Iso2k v1.1.2 | http://doi.pangaea.de/10.1594/PANGAEA.776444 |
| 4122 | d18O | temperature | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo/study/6242 |
| 4127 | d18O | moisture | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo/study/5968 |
| 4128 | d18O | temperature | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo/study/5968 |
| 4133 | d18O | N/A | Iso2k v1.1.2 | http://doi.pangaea.de/10.1594/PANGAEA.735717 |
| 4134 | d18O | moisture | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo/study/5968 |
| 4135 | d18O | temperature | Iso2k v1.1.2 | https://www.ncdc.noaa.gov/paleo/study/5968 |
| 4204 | d18O | temperature | Iso2k v1.1.2 | https://doi.pangaea.de/10.1594/PANGAEA.760166 |
In [15]:
df[df['archiveType']=='MarineSediment'][['paleoData_proxy', 'interpretation_variable']]
Out[15]:
| | paleoData_proxy | interpretation_variable |
|---|---|---|
| 47 | Mg/Ca | temperature |
| 48 | temperature | temperature |
| 63 | alkenone | temperature |
| 74 | alkenone | temperature |
| 75 | Uk37 | N/A |
| ... | ... | ... |
| 4133 | d18O | N/A |
| 4134 | d18O | moisture |
| 4135 | d18O | temperature |
| 4147 | dD | N/A |
| 4204 | d18O | temperature |
125 rows × 2 columns
paleoData_proxy (proxy type)¶
In [16]:
# paleoData_proxy
key = 'paleoData_proxy'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
paleoData_proxy: ['ARSTAN' 'Mg/Ca' 'Sr/Ca' 'TEX86' 'Uk37' 'accumulation rate' 'alkenone' 'borehole' 'calcification rate' 'chironomid' 'chloride' 'chrysophyte assemblage' 'concentration' 'count' 'd13C' 'd18O' 'dD' 'diatom' 'dinocyst' 'dust' 'effective precipitation' 'foraminifera' 'growth rate' 'historical' 'humidification index' 'ice melt' 'maximum latewood density' 'multiproxy' 'nitrate' 'pollen' 'reflectance' 'ring width' 'sodium' 'sulfate' 'temperature' 'thickness' 'varve thickness'] ["<class 'str'>"] No. of unique values: 37/4781
paleoData_sensorSpecies (further information on proxy type: species)¶
In [17]:
# paleoData_sensorSpecies
key = 'paleoData_sensorSpecies'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
paleoData_sensorSpecies: ['ABAL' 'ABAM' 'ABBA' 'ABBO' 'ABCE' 'ABCI' 'ABCO' 'ABLA' 'ABMA' 'ABPI' 'ABPN' 'ABPR' 'ABSB' 'ABSP' 'ACRU' 'ACSH' 'ADHO' 'ADUS' 'AGAU' 'ARAR' 'ATCU' 'ATSE' 'AUCH' 'BEPU' 'CABU' 'CADE' 'CADN' 'CARO' 'CDAT' 'CDBR' 'CDDE' 'CDLI' 'CEAN' 'CESP' 'CHLA' 'CHNO' 'Ceratoporella nicholsoni' 'DABI' 'DACO' 'Diploria labyrinthiformis' 'Diploria strigosa' 'FAGR' 'FASY' 'FICU' 'FRNI' 'HABI' 'Hydnophora microconos, Porites lobata' 'JGAU' 'JUEX' 'JUFO' 'JUOC' 'JUPH' 'JUPR' 'JURE' 'JUSC' 'JUSP' 'JUVI' 'LADE' 'LAGM' 'LALA' 'LALY' 'LAOC' 'LASI' 'LGFR' 'LIBI' 'LITU' 'Montastraea faveolata' 'N/A' 'NA' 'NOBE' 'NOGU' 'NOME' 'NOPU' 'NOSO' 'NaN' 'Orbicella faveolata' 'P. australiensis, possibly P. lobata' 'PCAB' 'PCEN' 'PCGL' 'PCGN' 'PCMA' 'PCOB' 'PCOM' 'PCPU' 'PCRU' 'PCSH' 'PCSI' 'PCSM' 'PCSP' 'PHAL' 'PHAS' 'PHGL' 'PHTR' 'PIAL' 'PIAM' 'PIAR' 'PIBA' 'PIBN' 'PIBR' 'PICE' 'PICL' 'PICO' 'PIEC' 'PIED' 'PIFL' 'PIHA' 'PIHR' 'PIJE' 'PIKO' 'PILA' 'PILE' 'PILO' 'PIMO' 'PIMU' 'PIMZ' 'PINI' 'PIPA' 'PIPE' 'PIPI' 'PIPN' 'PIPO' 'PIPU' 'PIRE' 'PIRI' 'PIRO' 'PISF' 'PISI' 'PISP' 'PIST' 'PISY' 'PITA' 'PITO' 'PIUN' 'PIVI' 'PIWA' 'PLRA' 'PLUV' 'PPDE' 'PPSP' 'PRMA' 'PSMA' 'PSME' 'PTAN' 'Pavona clavus' 'Porites' 'Porites austraiensis' 'Porites lobata' 'Porites lutea' 'Porites solida' 'Porites sp.' 'Pseudodiploria strigosa' 'QUAL' 'QUDG' 'QUFR' 'QUHA' 'QUKE' 'QULO' 'QULY' 'QUMA' 'QUMC' 'QUPE' 'QUPR' 'QURO' 'QURU' 'QUSP' 'QUST' 'QUVE' 'Siderastrea radians' 'Siderastrea siderea' 'Siderastrea stellata' 'Solenastrea bournoni' 'TABA' 'TADI' 'TAMU' 'TEGR' 'THOC' 'THPL' 'TSCA' 'TSCR' 'TSDU' 'TSHE' 'TSME' 'ULSP' 'VIKE' 'WICE' 'bournoni' 'labyrinthiformis' 'lamellina' 'lobata' 'lutea' 'nan' 'siderea'] ["<class 'str'>"] No. of unique values: 193/4781
paleoData_notes (notes)¶
In [18]:
# paleoData_notes
key = 'paleoData_notes'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
paleoData_notes: ['nan' 'nan' 'nan' ... 'calcite' 'calcite' 'FE23_europe_swed019w: Investigator: Schweingruber, FE23_europe_swed021w: Investigator: Schweingruber'] ["<class 'str'>"] No. of unique values: 426/4781
paleoData_variableName¶
In [19]:
# paleoData_variableName
key = 'paleoData_variableName'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
paleoData_variableName: ['ARSTAN' 'MAR' 'Mg/Ca' 'R650/R700' 'RABD660670' 'Sr/Ca' 'TEX86' 'Uk37' 'calcification rate' 'chloride' 'composite' 'concentration' 'count' 'd13C' 'd18O' 'd2H' 'dD' 'dust' 'effective precipitation' 'growth rate' 'humidification index' 'ice melt' 'maximum latewood density' 'nan' 'nitrate' 'precipitation' 'reflectance' 'ring width' 'sodium' 'sulfate' 'temperature' 'thickness' 'year'] ["<class 'str'>"]
climate metadata: interpretation variable, direction, seasonality¶
interpretation_direction¶
In [20]:
# interpretation_direction
key = 'interpretation_direction'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
interpretation_direction: ['Increase' 'N/A' 'NaN' 'None' 'T_air (positive), P_amount (negative)' 'T_air (positive), P_amount (negative), SPEI (negative)' 'decrease' 'decrease/increase' 'depends (orbital timescale: More Indian Monsoon moisture-->more enriched. Since 3ka: Indian source has been stable, so amount effect dominates: more rainfall, more intense hydrological cycle -->More depleted)' 'increase' 'negaitive' 'negative' 'positive' 'positive for d18O-temperature relation, negative for d13C-precipiation amount'] No. of unique values: 14/4781
interpretation_seasonality¶
In [21]:
# interpretation_seasonality
key = 'interpretation_seasonality'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
interpretation_seasonality: ['Annual' 'Apr' 'Apr-Jul' 'Apr-Jun' 'Apr-Sep' 'Aug' 'Aug-Jul' 'Dec-Feb' 'Dec-Mar' 'Dec-May' 'Feb' 'Feb-Aug' 'Growing Season' 'Jan' 'Jan-Apr' 'Jan-Jun' 'Jan-Mar' 'Jul' 'Jul-Dec' 'Jul-Sep' 'Jun' 'Jun-Aug' 'Jun-Jul' 'Jun-Sep' 'Mar' 'Mar-Aug' 'Mar-May' 'Mar-Nov' 'Mar-Oct' 'May' 'May-Apr' 'May-Dec' 'May-Jul' 'May-Oct' 'May-Sep' 'N/A' 'None' 'Nov-Apr' 'Nov-Feb' 'Nov-Jan' 'Oct-Apr' 'Oct-Dec' 'Oct-Sep' 'Sep-Apr' 'Sep-Aug' 'Sep-Nov' 'Sep-Oct' 'Spr-Sum' 'Summer' 'Wet Season' 'Winter' 'deleteMe' 'subannual'] No. of unique values: 53/4781
interpretation_variable¶
In [22]:
# interpretation_variable
key = 'interpretation_variable'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
interpretation_variable: ['N/A' 'NOT temperature NOT moisture' 'moisture' 'temperature' 'temperature+moisture'] No. of unique values: 5/4781
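The filtered database variants listed at the top (e.g. dod2k_v2.0_filtered_M) restrict records by this column. A hedged sketch of such a selection, using the category names printed above on a synthetic stand-in (the actual filtering logic lives in the dod2k utilities and may differ):

```python
import pandas as pd

# Synthetic stand-in with the five categories printed above
df = pd.DataFrame({'interpretation_variable': [
    'temperature', 'moisture', 'temperature+moisture',
    'NOT temperature NOT moisture', 'N/A']})

# Keep moisture-sensitive records
moisture_mask = df['interpretation_variable'].isin(['moisture', 'temperature+moisture'])
df_moisture = df[moisture_mask]
print(len(df_moisture))  # 2
```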
interpretation_variableDetail¶
In [23]:
# interpretation_variableDetail
key = 'interpretation_variableDetail'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
interpretation_variableDetail: ['0.58 +- 0.11ppt/degrees C' 'FE23_europe_swed019w: N/A, FE23_europe_swed021w: N/A' 'Maximum air temperature, seasonal' 'Maximum temperature' 'N/A' 'NaN' 'None' 'Original interpretation_variable: circulationIndex, interpretation_variableDetail: lake water' 'Original interpretation_variable: circulationVariable, interpretation_variableDetail: Indian monsoon' 'Original interpretation_variable: circulationVariable, interpretation_variableDetail: More negative d18O values correspond to stronger amount' 'Original interpretation_variable: circulationVariable, interpretation_variableDetail: N/A' 'Original interpretation_variable: circulationVariable, interpretation_variableDetail: tropical or North Pacific moisture' 'Original interpretation_variable: deleteMe, interpretation_variableDetail: N/A' 'Original interpretation_variable: deleteMe, interpretation_variableDetail: more positive values of d13C indicate a spread of C4 prairy grasses and decline of C3 forest plants, more positive d18O indicates evaporation of soil water which is stronger in the prairy environment than in the forsest' 'Original interpretation_variable: effectivePrecipitation, interpretation_variableDetail: E:P lake water' 'Original interpretation_variable: effectivePrecipitation, interpretation_variableDetail: N/A' 'Original interpretation_variable: effectivePrecipitation, interpretation_variableDetail: Seasonal' 'Original interpretation_variable: effectivePrecipitation, interpretation_variableDetail: air@surface' 'Original interpretation_variable: effectivePrecipitation, interpretation_variableDetail: eff' 'Original interpretation_variable: effectivePrecipitation, interpretation_variableDetail: lake level' 'Original interpretation_variable: effectivePrecipitation, interpretation_variableDetail: lake water' 'Original interpretation_variable: effectivePrecipitation, interpretation_variableDetail: lake, winds in eastern Patagonia' 'Original interpretation_variable: 
effectivePrecipitation, interpretation_variableDetail: soil moisture' 'Original interpretation_variable: evaporation, interpretation_variableDetail: Aleutian Low/westerly storm trajectories' 'Original interpretation_variable: evaporation, interpretation_variableDetail: Indian Monsoon Strength' 'Original interpretation_variable: evaporation, interpretation_variableDetail: N/A' 'Original interpretation_variable: hydrologicBalance, interpretation_variableDetail: groundwater' 'Original interpretation_variable: hydrologicBalance, interpretation_variableDetail: lake water' 'Original interpretation_variable: precipitation, interpretation_variableDetail: Amount of rainfall change' 'Original interpretation_variable: precipitation, interpretation_variableDetail: Australian-Indonesian Summer monsoon; More negative d18O values correspond to stronger amount' 'Original interpretation_variable: precipitation, interpretation_variableDetail: Australian-Indonesian monsoon rainfall' 'Original interpretation_variable: precipitation, interpretation_variableDetail: Continental Sweden' 'Original interpretation_variable: precipitation, interpretation_variableDetail: ENSO/PDO' 'Original interpretation_variable: precipitation, interpretation_variableDetail: East Asian Monsoon Strength; more negative values of d18O are interpreted as indicative of increased monsoon strength' 'Original interpretation_variable: precipitation, interpretation_variableDetail: Indian Summer Monsoon; more negative values of d18O are interpreted as indicative of increased monsoon strength' 'Original interpretation_variable: precipitation, interpretation_variableDetail: Lower precipitation produces higher d13C and Sr/Ca values' 'Original interpretation_variable: precipitation, interpretation_variableDetail: Monsoon strength' 'Original interpretation_variable: precipitation, interpretation_variableDetail: More negative d18O values correspond to stronger amount' 'Original interpretation_variable: precipitation, 
interpretation_variableDetail: N/A' 'Original interpretation_variable: precipitation, interpretation_variableDetail: Precipitation' 'Original interpretation_variable: precipitation, interpretation_variableDetail: SAM' 'Original interpretation_variable: precipitation, interpretation_variableDetail: Seasonal' 'Original interpretation_variable: precipitation, interpretation_variableDetail: Seasonal, annual' 'Original interpretation_variable: precipitation, interpretation_variableDetail: South China Sea' 'Original interpretation_variable: precipitation, interpretation_variableDetail: Southern Tibet' 'Original interpretation_variable: precipitation, interpretation_variableDetail: The interpretation is made for an older section of the sample. Last 2k data was not the focus of the manuscript' 'Original interpretation_variable: precipitation, interpretation_variableDetail: Variations in NAO (related to the amount of rainfall. Season not specified)' 'Original interpretation_variable: precipitation, interpretation_variableDetail: air@surface' 'Original interpretation_variable: precipitation, interpretation_variableDetail: amount of rainfall' 'Original interpretation_variable: precipitation, interpretation_variableDetail: d18O changes of speleothems reflect effects of temperature on raifnall d18O, rainfall amounts affect cave hydrology and biomass density above the cave, which is recorded in d13C of speleothems' 'Original interpretation_variable: precipitation, interpretation_variableDetail: higher values are related to less rainfall - this can be realted to less moisture influex from the Caribbean due to a southward shift of the ITCZ in phases when high amounts of meltwater enter the cooling north Atlantic Ocean; after ~4.3 ka the connection to the north Atalatic is lost and ENSO becomes more important with warm ENSO events (El Nino) causing higher d18O' 'Original interpretation_variable: precipitation, interpretation_variableDetail: in the southern tropical Andes' 'Original 
interpretation_variable: precipitation, interpretation_variableDetail: more negative values of d18O are interpreted as indicative of increased summer monsoon precipitation' 'Original interpretation_variable: precipitation, interpretation_variableDetail: more positive d18O values are interpreted to represent wetter conditions' 'Original interpretation_variable: precipitation, interpretation_variableDetail: precipitation' 'Original interpretation_variable: precipitation, interpretation_variableDetail: relative portion of summer (SAM) vs winter rainfall' 'Original interpretation_variable: precipitation, interpretation_variableDetail: surface' 'Original interpretation_variable: precipitation, interpretation_variableDetail: variations in paleoprecipitation amount on a multi-annual timescale (On longer timescales, however, the flowstone?s growth dynamics have to be considered)' 'Original interpretation_variable: precipitationIsotope, interpretation_variableDetail: Competing influence of polar and maritime airmasses' 'Original interpretation_variable: precipitationIsotope, interpretation_variableDetail: East Asian Monsoon rainfall' 'Original interpretation_variable: precipitationIsotope, interpretation_variableDetail: minimum temperature' 'Original interpretation_variable: precipitationIsotope, interpretation_variableDetail: moisture' 'Original interpretation_variable: precipitationIsotope, interpretation_variableDetail: of precipitation' 'Original interpretation_variable: precipitationIsotope, interpretation_variableDetail: precipitation' 'Original interpretation_variable: precipitationIsotope, interpretation_variableDetail: precipitation amount' 'Original interpretation_variable: precipitationIsotope, interpretation_variableDetail: rain' 'Original interpretation_variable: precipitationIsotope, interpretation_variableDetail: relative humidity' 'Original interpretation_variable: precipitationIsotope, interpretation_variableDetail: summer monsoon' 'Original 
interpretation_variable: salinity, interpretation_variableDetail: N/A' 'Original interpretation_variable: salinity, interpretation_variableDetail: sea surface' 'Original interpretation_variable: salinity, interpretation_variableDetail: surface' 'Original interpretation_variable: seaIce, interpretation_variableDetail: N/A' 'Original interpretation_variable: seasonality, interpretation_variableDetail: N/A' 'Original interpretation_variable: seasonality, interpretation_variableDetail: changes of d18O in speleothems reflect changes of the average d18O of rainfall in the region related to rainfall seasonality' 'Original interpretation_variable: seasonality, interpretation_variableDetail: relative amount of winter snowfall' 'Original interpretation_variable: streamflow, interpretation_variableDetail: N/A' 'Original interpretation_variable: streamflow, interpretation_variableDetail: lake water' 'Seasonal, annual' 'air' 'air-surface' 'air@600m' 'air@condensationLevel' 'air@surface' 'ground@surface' 'ice@surface' 'lake surface' 'lake water' 'lake@surface' 'near sea surface' 'regional and hemispheric temperature' 'sea surface' 'sea@surface' 'sea_surface' 'sub surface (30m)' 'sub surface (~50 m)' 'subsurface (60-80m)' 'subsurface, 136 m' 'subsurface, 143 m' 'surface' 'temperature - manually assigned by DoD2k authors for paleoData_proxy = Mg/Ca' 'temperature - manually assigned by DoD2k authors for paleoData_proxy = Sr/Ca' 'temperature+moisture - manually assigned by DoD2k authors for paleoData_proxy = d18O' 'temperature+moisture - manually assigned by DoD2k authors for paleoData_proxy = d18O.' 'variations in air temperature due to large-scale atmospheric patterns' 'variations in winter temperature in the Alps'] No. of unique values: 105/4781
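Several of the `interpretation_variableDetail` entries above were assigned manually by the DoD2k authors and are flagged as such in the string itself. A minimal sketch of how those records could be counted, using a hypothetical three-row stand-in for the compact dataframe (the real `df` is loaded earlier in the notebook):

```python
import pandas as pd

# Toy stand-in for the compact dataframe; values are hypothetical
# examples drawn from the unique-values list printed above.
df = pd.DataFrame({
    'interpretation_variableDetail': [
        'air@surface',
        'temperature - manually assigned by DoD2k authors for paleoData_proxy = Mg/Ca',
        'sea surface',
    ],
})

# Flag rows whose detail string marks a manual assignment.
manual = df['interpretation_variableDetail'].str.contains('manually assigned', na=False)
print(f'{manual.sum()} of {len(df)} records carry a manually assigned interpretation')
```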
data¶
paleoData_values¶
In [24]:
# paleoData_values: value range of the first 20 records
key = 'paleoData_values'
print('%s: ' % key)
for ii, vv in enumerate(df[key][:20]):
    try:
        print('%-30s: %s -- %s' % (df['dataSetName'].iloc[ii][:30], str(np.nanmin(vv)), str(np.nanmax(vv))))
        print(type(vv))
    except Exception:
        print(df['dataSetName'].iloc[ii], 'NaNs detected.')
print(np.unique([str(type(dd)) for dd in df[key]]))
paleoData_values:
NAm-MtLemon.Briffa.2002 : 0.154 -- 2.91 <class 'numpy.ndarray'>
NAm-MtLemon.Briffa.2002 : 0.283 -- 1.666 <class 'numpy.ndarray'>
NAm-MtLemon.Briffa.2002 : 0.574 -- 0.951 <class 'numpy.ndarray'>
NAm-MtLemon.Briffa.2002 : 0.707 -- 1.118 <class 'numpy.ndarray'>
NAm-MtLemon.Briffa.2002 : 0.757 -- 1.114 <class 'numpy.ndarray'>
Arc-Arjeplog.Bjorklund.2014 : -3.532171 -- 2.5670047 <class 'numpy.ndarray'>
Arc-Arjeplog.Bjorklund.2014 : -4.1141653 -- 2.6139 <class 'numpy.ndarray'>
Asi-CHIN019.Li.2010 : 0.298 -- 1.664 <class 'numpy.ndarray'>
NAm-Landslide.Luckman.2006 : 0.057 -- 0.76 <class 'numpy.ndarray'>
NAm-Landslide.Luckman.2006 : 0.164 -- 1.781 <class 'numpy.ndarray'>
NAm-Landslide.Luckman.2006 : 0.116 -- 1.889 <class 'numpy.ndarray'>
NAm-SmithersSkiArea.Schweingru: 0.319 -- 1.73 <class 'numpy.ndarray'>
NAm-SmithersSkiArea.Schweingru: 0.472 -- 1.576 <class 'numpy.ndarray'>
NAm-SmithersSkiArea.Schweingru: 0.44 -- 0.83 <class 'numpy.ndarray'>
NAm-SmithersSkiArea.Schweingru: 0.636 -- 1.179 <class 'numpy.ndarray'>
Asi-GANGCD.PAGES2k.2013 : 0.102 -- 2.109 <class 'numpy.ndarray'>
Ocn-Mayotte.Zinke.2008 : -5.49 -- -4.35 <class 'numpy.ndarray'>
Ocn-Mayotte.Zinke.2008 : -2.98 -- -0.29 <class 'numpy.ndarray'>
Ocn-Mayotte.Zinke.2008 : 8.537 -- 9.087 <class 'numpy.ndarray'>
Ocn-Mayotte.Zinke.2008 : -0.159 -- 1.305 <class 'numpy.ndarray'>
["<class 'numpy.ndarray'>"]
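The `try`/`except` in the cell above hides which records are actually empty. A sketch of an equivalent, more explicit check, using a toy two-record stand-in for the compact dataframe (column names as in the source, values hypothetical):

```python
import numpy as np
import pandas as pd

# Toy stand-in: each row holds one numpy array of measurements,
# mirroring the structure of the real 'paleoData_values' column.
df = pd.DataFrame({
    'dataSetName': ['rec1', 'rec2'],
    'paleoData_values': [np.array([0.1, 0.5, np.nan]),
                         np.array([np.nan, np.nan])],
})

ranges = {}
for name, vals in zip(df['dataSetName'], df['paleoData_values']):
    if np.all(np.isnan(vals)):
        ranges[name] = None  # record is empty: all values are NaN
    else:
        ranges[name] = (np.nanmin(vals), np.nanmax(vals))

print(ranges)
```

Distinguishing the all-NaN case explicitly avoids catching unrelated errors the way a broad `except` does.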
paleoData_units¶
In [25]:
# paleoData_units
key = 'paleoData_units'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
paleoData_units:
['cm' 'cm/yr' 'count' 'count/mL' 'degC' 'g/cm/yr' 'g/cm2/yr' 'g/cm3' 'mm' 'mm/year' 'mm/yr' 'mmol/mol' 'nan' 'needsToBeChanged' 'ng/g' 'percent' 'permil' 'ppb' 'standardized_anomalies' 'unitless' 'yr AD' 'z score' 'z-scores']
["<class 'str'>"]
No. of unique values: 23/4781
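The unique-units list above includes the placeholder strings 'nan' and 'needsToBeChanged'. A sketch of how records with placeholder units could be flagged for follow-up, on a hypothetical three-row stand-in dataframe:

```python
import pandas as pd

# Toy stand-in; the real dataframe has 4781 rows and the unit
# values shown in the unique-values list above.
df = pd.DataFrame({
    'dataSetName': ['A', 'B', 'C'],
    'paleoData_units': ['degC', 'needsToBeChanged', 'nan'],
})

# Units that are placeholders rather than physical units.
placeholders = {'nan', 'needsToBeChanged'}
flagged = df[df['paleoData_units'].isin(placeholders)]
print(flagged['dataSetName'].tolist())
```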
year¶
In [26]:
# year: time span of the first 20 records
key = 'year'
print('%s: ' % key)
for ii, vv in enumerate(df[key][:20]):
    try:
        print('%-30s: %s -- %s' % (df['dataSetName'].iloc[ii][:30], str(np.nanmin(vv)), str(np.nanmax(vv))))
    except Exception:
        print('NaNs detected.', vv)
print(np.unique([str(type(dd)) for dd in df[key]]))
year:
NAm-MtLemon.Briffa.2002 : 1568.0 -- 1983.0
NAm-MtLemon.Briffa.2002 : 1568.0 -- 1983.0
NAm-MtLemon.Briffa.2002 : 1568.0 -- 1983.0
NAm-MtLemon.Briffa.2002 : 1568.0 -- 1983.0
NAm-MtLemon.Briffa.2002 : 1568.0 -- 1983.0
Arc-Arjeplog.Bjorklund.2014 : 1200.0 -- 2010.0
Arc-Arjeplog.Bjorklund.2014 : 1200.0 -- 2010.0
Asi-CHIN019.Li.2010 : 1509.0 -- 2006.0
NAm-Landslide.Luckman.2006 : 913.0 -- 2001.0
NAm-Landslide.Luckman.2006 : 913.0 -- 2001.0
NAm-Landslide.Luckman.2006 : 913.0 -- 2001.0
NAm-SmithersSkiArea.Schweingru: 1680.0 -- 1983.0
NAm-SmithersSkiArea.Schweingru: 1680.0 -- 1983.0
NAm-SmithersSkiArea.Schweingru: 1680.0 -- 1983.0
NAm-SmithersSkiArea.Schweingru: 1680.0 -- 1983.0
Asi-GANGCD.PAGES2k.2013 : 1567.0 -- 1999.0
Ocn-Mayotte.Zinke.2008 : 1865.62 -- 1993.62
Ocn-Mayotte.Zinke.2008 : 1865.62 -- 1993.62
Ocn-Mayotte.Zinke.2008 : 1881.62 -- 1994.29
Ocn-Mayotte.Zinke.2008 : 1881.62 -- 1993.62
["<class 'numpy.ndarray'>"]
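Beyond inspecting records one by one, the per-record time axes can be aggregated into an overall coverage window. A sketch on a toy two-record stand-in (the real column stores one numpy array per row, as the type check above confirms):

```python
import numpy as np
import pandas as pd

# Toy stand-in; each row stores its time axis as a numpy array,
# with hypothetical spans taken from the output above.
df = pd.DataFrame({
    'dataSetName': ['rec1', 'rec2'],
    'year': [np.array([1568.0, 1983.0]), np.array([1200.0, 2010.0])],
})

# Earliest start and latest end across all records.
first = df['year'].map(np.nanmin).min()
last = df['year'].map(np.nanmax).max()
print(f'Overall coverage: {first:.0f}-{last:.0f} CE')
```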
yearUnits¶
In [27]:
# yearUnits
key = 'yearUnits'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
yearUnits: ['CE'] ["<class 'str'>"] No. of unique values: 1/4781
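Since the output shows a single unique value, 'CE', a one-line sanity check could guard any downstream code that assumes a uniform calendar. Sketched on a toy dataframe:

```python
import pandas as pd

# Toy stand-in; the real column holds 'CE' for all 4781 records.
df = pd.DataFrame({'yearUnits': ['CE', 'CE', 'CE']})

all_ce = (df['yearUnits'] == 'CE').all()
assert all_ce, 'unexpected year units present'
print('All records use CE year units.')
```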