Load CoralHydro 2k¶

load data from Coral Hydro 2k v1.0.1 (https://essd.copernicus.org/articles/15/2081/2023/)

Dataset downloaded from LiPDverse: https://lipdverse.org/CoralHydro2k/current_version/

Created by Kevin Fan and Lucie Luecke (LL). Base on Feng Zhu and Julien Emile-Geay's template (lipd2df notebook)

Update 24/10/25 by LL: streamline and tidy up for publication and documentation Update 21/11/24 by LL : added option to save as csv Update 30/10/24 by LL (v4): added check for empty paleoData_values row Update 29/10/2024 by LL (v4): modified datasetId to create unique identifier for each record.

Here we extract a dataframe with the following columns:

archiveType
dataSetName
datasetId
geo_meanElev
geo_meanLat
geo_meanLon
geo_siteName
interpretation_direction (new in v2.0)
interpretation_variable
interpretation_variableDetail
interpretation_seasonality (new in v2.0)
originalDataURL
originalDatabase
paleoData_notes
paleoData_proxy
paleoData_sensorSpecies
paleoData_units
paleoData_values
paleoData_variableName
year
yearUnits

We save a standardised compact dataframe for concatenation to DoD2k

Set up working environment¶

Make sure the repo_root is added correctly, it should be: your_root_dir/dod2k This should be the working directory throughout this notebook (and all other notebooks).

In [1]:

Copied!





%load_ext autoreload
%autoreload 2

import sys
import os
from pathlib import Path

# Add parent directory to path (works from any notebook in notebooks/)
# the repo_root should be the parent directory of the notebooks folder
init_dir = Path().resolve()
# Determine repo root
if init_dir.name == 'dod2k': repo_root = init_dir
elif init_dir.parent.name == 'dod2k': repo_root = init_dir.parent
else: raise Exception('Please review the repo root structure (see first cell).')

# Update cwd and path only if needed
if os.getcwd() != str(repo_root):
    os.chdir(repo_root)
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))

print(f"Repo root: {repo_root}")
if str(os.getcwd())==str(repo_root):
    print(f"Working directory matches repo root. ")
%load_ext autoreload
%autoreload 2

import sys
import os
from pathlib import Path

# Add parent directory to path (works from any notebook in notebooks/)
# the repo_root should be the parent directory of the notebooks folder
init_dir = Path().resolve()
# Determine repo root
if init_dir.name == 'dod2k': repo_root = init_dir
elif init_dir.parent.name == 'dod2k': repo_root = init_dir.parent
else: raise Exception('Please review the repo root structure (see first cell).')

# Update cwd and path only if needed
if os.getcwd() != str(repo_root):
    os.chdir(repo_root)
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))

print(f"Repo root: {repo_root}")
if str(os.getcwd())==str(repo_root):
    print(f"Working directory matches repo root. ")

Repo root: /home/jupyter-lluecke/dod2k_v2.0/dod2k
Working directory matches repo root.

In [2]:

Copied!





# Import packages
import lipd
import pandas as pd
import numpy as np

from dod2k_utilities import ut_functions as utf # contains utility functions
from dod2k_utilities import ut_plot as uplt # contains plotting functions
# Import packages
import lipd
import pandas as pd
import numpy as np

from dod2k_utilities import ut_functions as utf # contains utility functions
from dod2k_utilities import ut_plot as uplt # contains plotting functions

Load source data¶

In order to get the source data, run the cell below. This will download a series of LiPD files into the directory lipdfiles

Alternatively skip the cell and directly use the files as provided in this directory (see cell below next).

In [3]:

Copied!

# # Download the file (use -O to specify output filename)
# !wget -O data/ch2k/CoralHydro2k1_0_1.zip https://lipdverse.org/CoralHydro2k/current_version/CoralHydro2k1_0_1.zip

# # Unzip to the correct destination
# !unzip data/ch2k/CoralHydro2k1_0_1.zip -d data/ch2k/ch2k_101
# # Download the file (use -O to specify output filename)
# !wget -O data/ch2k/CoralHydro2k1_0_1.zip https://lipdverse.org/CoralHydro2k/current_version/CoralHydro2k1_0_1.zip

# # Unzip to the correct destination
# !unzip data/ch2k/CoralHydro2k1_0_1.zip -d data/ch2k/ch2k_101

In [4]:

Copied!





# load LiPD files from the given directory

D = lipd.readLipd(str(repo_root)+'/data/ch2k/ch2k_101/');
TS = lipd.extractTs(D);
len(TS)
os.chdir(repo_root)
# load LiPD files from the given directory

D = lipd.readLipd(str(repo_root)+'/data/ch2k/ch2k_101/');
TS = lipd.extractTs(D);
len(TS)
os.chdir(repo_root)

Disclaimer: LiPD files may be updated and modified to adhere to standards

Found: 179 LiPD file(s)
reading: CH03BUN01.lpd
reading: ZI15MER01.lpd
reading: CO03PAL03.lpd
reading: CO03PAL02.lpd
reading: CA13PEL01.lpd
reading: LI06RAR01.lpd
reading: CO03PAL07.lpd
reading: FL18DTO03.lpd
reading: SM06LKF02.lpd
reading: UR00MAI01.lpd
reading: TU95MAD01.lpd
reading: ZI04IFR01.lpd
reading: RE18CAY01.lpd
reading: KU99HOU01.lpd
reading: OS13NLP01.lpd
reading: EV98KIR01.lpd
reading: LI00RAR01.lpd
reading: NU11PAL01.lpd
reading: MA08DTO01.lpd
reading: AB20MEN03.lpd
reading: CA14TIM01.lpd
reading: KA17RYU01.lpd
reading: MC11KIR01.lpd
reading: AB20MEN09.lpd
reading: HE08LRA01.lpd
reading: DA06MAF01.lpd
reading: SM06LKF01.lpd
reading: NA09MAL01.lpd
reading: SW98STP01.lpd
reading: MU18GSI01.lpd
reading: ZI14HOU01.lpd
reading: FL17DTO02.lpd
reading: DA06MAF02.lpd
reading: SA19PAL02.lpd
reading: CO03PAL01.lpd
reading: ZI16ROD01.lpd
reading: OS13NGP01.lpd
reading: CH98PIR01.lpd
reading: RE19GBR02.lpd
reading: MU18RED04.lpd
reading: GR13MAD01.lpd
reading: XI17HAI01.lpd
reading: DE14DTO03.lpd
reading: KL97DAH01.lpd
reading: QU06RAB01.lpd
reading: DE14DTO01.lpd
reading: KU00NIN01.lpd
reading: TU01SIA01.lpd
reading: RE19GBR01.lpd
reading: GR13MAD02.lpd
reading: AB20MEN07.lpd
reading: BR19RED01.lpd
reading: NU09FAN01.lpd
reading: MU18RED01.lpd
reading: OS14RIP01.lpd
reading: DE14DTO02.lpd
reading: LI04FIJ01.lpd
reading: EV18ROC01.lpd
reading: CA13SAP01.lpd
reading: JI18GAL02.lpd
reading: TU01LAI01.lpd
reading: HE13MIS01.lpd
reading: GU99NAU02.lpd
reading: ZI15IMP02.lpd
reading: PF04PBA01.lpd
reading: SA20FAN02.lpd
reading: WE09ARR01.lpd
reading: CO03PAL05.lpd
reading: XU15BVI01.lpd
reading: HE18COC02.lpd
reading: MU18NPI01.lpd
reading: MO06PED01.lpd
reading: KR20SAR01.lpd
reading: SA18GBR01.lpd
reading: OS14UCP01.lpd
reading: AB20MEN08.lpd
reading: HE13MIS02.lpd
reading: HE10GUA01.lpd
reading: BO14HTI01.lpd
reading: DE12ANC01.lpd
reading: WA17BAN01.lpd
reading: DR99ABR01.lpd
reading: RO19MAR01.lpd
reading: LI06RAR02.lpd
reading: MU18RED03.lpd
reading: SW99LIG01.lpd
reading: SA16CLA01.lpd
reading: ZI15TAN01.lpd
reading: RE19GBR03.lpd
reading: DR00KSB01.lpd
reading: BO14HTI02.lpd
reading: MU17DOA01.lpd
reading: TA18TAS01.lpd
reading: XU15BVI03.lpd
reading: DU94URV02.lpd
reading: AS05GUA01.lpd
reading: FE09OGA01.lpd
reading: GU99NAU01.lpd
reading: SA20FAN01.lpd
reading: CA13DIA01.lpd
reading: AL16PUR02.lpd
reading: CO03PAL10.lpd
reading: RE19GBR05.lpd
reading: ZI15IMP01.lpd
reading: KR20SAR02.lpd
reading: RO19YUC01.lpd
reading: ST13MAL01.lpd
reading: DR00NBB01.lpd
reading: PF19LAR01.lpd
reading: AL16YUC01.lpd
reading: CO03PAL09.lpd
reading: ZI16ROD02.lpd
reading: AB20MEN05.lpd
reading: SH92PUN01.lpd
reading: KI04MCV01.lpd
reading: AL16PUR01.lpd
reading: CH18YOA02.lpd
reading: DE14DTO04.lpd
reading: AB20MEN04.lpd
reading: DE16RED01.lpd
reading: BA04FIJ02.lpd
reading: CO03PAL06.lpd
reading: JI18GAL01.lpd
reading: CH18YOA01.lpd
reading: RE19GBR04.lpd
reading: DO18DAV01.lpd
reading: GO12SBV01.lpd
reading: CA07FLI01.lpd
reading: SW99LIG02.lpd
reading: MC04PNG01.lpd
reading: CO93TAR01.lpd
reading: RO19PAR01.lpd
reading: CO00MAL01.lpd
reading: MO20WOA01.lpd
reading: AB20MEN01.lpd
reading: QU96ESV01.lpd
reading: DE13HAI01.lpd
reading: LI94SEC01.lpd
reading: ZI15CLE01.lpd
reading: MU18RED02.lpd
reading: ZI08MAY01.lpd
reading: TU01DEP01.lpd
reading: CO03PAL04.lpd
reading: RA19PAI01.lpd
reading: AB15BHB01.lpd
reading: FL18DTO01.lpd
reading: MO20KOI01.lpd
reading: DU94URV01.lpd
reading: CO03PAL08.lpd
reading: WU14CLI01.lpd
reading: ZI14TUR01.lpd
reading: AB20MEN02.lpd
reading: LI99CLI01.lpd
reading: ZI15BUN01.lpd
reading: FE18RUS01.lpd
reading: WU13TON01.lpd
reading: KI14PAR01.lpd
reading: ZI14IFR02.lpd
reading: XU15BVI02.lpd
reading: KI08PAR01.lpd
reading: AB20MEN06.lpd
reading: AB08MEN01.lpd
reading: NU09KIR01.lpd
reading: RI10PBL01.lpd
reading: CA14BUT01.lpd
reading: FL18DTO02.lpd
reading: FL18DTO04.lpd
reading: BA04FIJ01.lpd
reading: HE02GBR01.lpd
reading: GO08BER01.lpd
reading: CA13TUR01.lpd
reading: LI06FIJ01.lpd
reading: HE18COC01.lpd
reading: FL17DTO01.lpd
reading: BO99MOO01.lpd
reading: CH03LOM01.lpd
reading: SA19PAL01.lpd
reading: CH97BVB01.lpd
reading: RA20TAI01.lpd
Finished read: 179 records
extracting paleoData...
extracting: CH03BUN01
extracting: ZI15MER01
extracting: CO03PAL03
extracting: CO03PAL02
extracting: CA13PEL01
extracting: LI06RAR01
extracting: CO03PAL07
extracting: FL18DTO03
extracting: SM06LKF02
extracting: UR00MAI01
extracting: TU95MAD01
extracting: ZI04IFR01
extracting: RE18CAY01
extracting: KU99HOU01
extracting: OS13NLP01
extracting: EV98KIR01
extracting: LI00RAR01
extracting: NU11PAL01
extracting: MA08DTO01
extracting: AB20MEN03
extracting: CA14TIM01
extracting: KA17RYU01
extracting: MC11KIR01
extracting: AB20MEN09
extracting: HE08LRA01
extracting: DA06MAF01
extracting: SM06LKF01
extracting: NA09MAL01
extracting: SW98STP01
extracting: MU18GSI01
extracting: ZI14HOU01
extracting: FL17DTO02
extracting: DA06MAF02
extracting: SA19PAL02
extracting: CO03PAL01
extracting: ZI16ROD01
extracting: OS13NGP01
extracting: CH98PIR01
extracting: RE19GBR02
extracting: MU18RED04
extracting: GR13MAD01
extracting: XI17HAI01
extracting: DE14DTO03
extracting: KL97DAH01
extracting: QU06RAB01
extracting: DE14DTO01
extracting: KU00NIN01
extracting: TU01SIA01
extracting: RE19GBR01
extracting: GR13MAD02
extracting: AB20MEN07
extracting: BR19RED01
extracting: NU09FAN01
extracting: MU18RED01
extracting: OS14RIP01
extracting: DE14DTO02
extracting: LI04FIJ01
extracting: EV18ROC01
extracting: CA13SAP01
extracting: JI18GAL02
extracting: TU01LAI01
extracting: HE13MIS01
extracting: GU99NAU02
extracting: ZI15IMP02
extracting: PF04PBA01
extracting: SA20FAN02
extracting: WE09ARR01
extracting: CO03PAL05
extracting: XU15BVI01
extracting: HE18COC02
extracting: MU18NPI01
extracting: MO06PED01
extracting: KR20SAR01
extracting: SA18GBR01
extracting: OS14UCP01
extracting: AB20MEN08
extracting: HE13MIS02
extracting: HE10GUA01
extracting: BO14HTI01
extracting: DE12ANC01
extracting: WA17BAN01
extracting: DR99ABR01
extracting: RO19MAR01
extracting: LI06RAR02
extracting: MU18RED03
extracting: SW99LIG01
extracting: SA16CLA01
extracting: ZI15TAN01
extracting: RE19GBR03
extracting: DR00KSB01
extracting: BO14HTI02
extracting: MU17DOA01
extracting: TA18TAS01
extracting: XU15BVI03
extracting: DU94URV02
extracting: AS05GUA01
extracting: FE09OGA01
extracting: GU99NAU01
extracting: SA20FAN01
extracting: CA13DIA01
extracting: AL16PUR02
extracting: CO03PAL10
extracting: RE19GBR05
extracting: ZI15IMP01
extracting: KR20SAR02
extracting: RO19YUC01
extracting: ST13MAL01
extracting: DR00NBB01
extracting: PF19LAR01
extracting: AL16YUC01
extracting: CO03PAL09
extracting: ZI16ROD02
extracting: AB20MEN05
extracting: SH92PUN01
extracting: KI04MCV01
extracting: AL16PUR01
extracting: CH18YOA02
extracting: DE14DTO04
extracting: AB20MEN04
extracting: DE16RED01
extracting: BA04FIJ02
extracting: CO03PAL06
extracting: JI18GAL01
extracting: CH18YOA01
extracting: RE19GBR04
extracting: DO18DAV01
extracting: GO12SBV01
extracting: CA07FLI01
extracting: SW99LIG02
extracting: MC04PNG01
extracting: CO93TAR01
extracting: RO19PAR01
extracting: CO00MAL01
extracting: MO20WOA01
extracting: AB20MEN01
extracting: QU96ESV01
extracting: DE13HAI01
extracting: LI94SEC01
extracting: ZI15CLE01
extracting: MU18RED02
extracting: ZI08MAY01
extracting: TU01DEP01
extracting: CO03PAL04
extracting: RA19PAI01
extracting: AB15BHB01
extracting: FL18DTO01
extracting: MO20KOI01
extracting: DU94URV01
extracting: CO03PAL08
extracting: WU14CLI01
extracting: ZI14TUR01
extracting: AB20MEN02
extracting: LI99CLI01
extracting: ZI15BUN01
extracting: FE18RUS01
extracting: WU13TON01
extracting: KI14PAR01
extracting: ZI14IFR02
extracting: XU15BVI02
extracting: KI08PAR01
extracting: AB20MEN06
extracting: AB08MEN01
extracting: NU09KIR01
extracting: RI10PBL01
extracting: CA14BUT01
extracting: FL18DTO02
extracting: FL18DTO04
extracting: BA04FIJ01
extracting: HE02GBR01
extracting: GO08BER01
extracting: CA13TUR01
extracting: LI06FIJ01
extracting: HE18COC01
extracting: FL17DTO01
extracting: BO99MOO01
extracting: CH03LOM01
extracting: SA19PAL01
extracting: CH97BVB01
extracting: RA20TAI01
Created time series: 608 entries

In [5]:

Copied!

# for ii in range(len(TS)):
#     if np.any(['climate' in key for key in TS[ii].keys()]):
#         print(TS[ii].keys())
# for ii in range(len(TS)):
#     if np.any(['climate' in key for key in TS[ii].keys()]):
#         print(TS[ii].keys())

Create compact dataframe¶

Create empty dataframe with set of columns for compact dataframe, and populate with the LiPD data

In [6]:

Copied!

col_str=['archiveType', 'dataSetName', 'datasetId', 'geo_meanElev', 'geo_meanLat', 'geo_meanLon', 'geo_siteName', 
         'originalDataUrl', 'paleoData_notes', 'paleoData_variableName',
         'paleoData_archiveSpecies','paleoData_units', 'paleoData_values', 'year']

df_tmp = pd.DataFrame(index=range(len(TS)), columns=col_str)
col_str=['archiveType', 'dataSetName', 'datasetId', 'geo_meanElev', 'geo_meanLat', 'geo_meanLon', 'geo_siteName', 
         'originalDataUrl', 'paleoData_notes', 'paleoData_variableName',
         'paleoData_archiveSpecies','paleoData_units', 'paleoData_values', 'year']

df_tmp = pd.DataFrame(index=range(len(TS)), columns=col_str)

populate dataframe¶

Start by populating paleoData_variableName (paleoData_proxy in dod2k standard terms)

In [7]:

Copied!





# loop over the timeseries and pick those for global temperature analysis
i = 0                                                                                                                
for ts in TS: #for every time series
    # need to filter these variables in the list
    if ts['paleoData_variableName'] not in ['year', 'd18OUncertainty', 'SrCaUncertainty']: #filter out all ts with thee three as the var name
        for name in col_str:     #for each of the 12 main keys, shove the wanted data into the df                                                                                    
            try:
                df_tmp.loc[i, name] = ts[name]                                                                       
            except:
                df_tmp.loc[i, name] = np.nan
    i += 1 
        
# drop the rows with all NaNs (those not for global temperature analysis)
df = df_tmp.dropna(how='all')
# loop over the timeseries and pick those for global temperature analysis
i = 0                                                                                                                
for ts in TS: #for every time series
    # need to filter these variables in the list
    if ts['paleoData_variableName'] not in ['year', 'd18OUncertainty', 'SrCaUncertainty']: #filter out all ts with thee three as the var name
        for name in col_str:     #for each of the 12 main keys, shove the wanted data into the df                                                                                    
            try:
                df_tmp.loc[i, name] = ts[name]                                                                       
            except:
                df_tmp.loc[i, name] = np.nan
    i += 1 
        
# drop the rows with all NaNs (those not for global temperature analysis)
df = df_tmp.dropna(how='all')

Now check that paleoData_variableName has been correctly populated and does not contain NaNs:

In [8]:

Copied!

# double check the variable names we have
set(df['paleoData_variableName'])
# double check the variable names we have
set(df['paleoData_variableName'])

Out[8]:

{'SrCa', 'SrCa_annual', 'd18O', 'd18O_annual', 'd18O_sw', 'd18O_sw_annual'}

Add more metadata to the dataframe, including originalDatabase, yearUnits, interpretation_variable (these are added manually and not from the LiPD files)

In [9]:

Copied!





# KF: adding original dataset name and yearUnits
df.insert(7, 'originalDatabase', ['CoralHydro2k v1.0.1']*len(df))
df.insert(len(df.columns), 'yearUnits', ['CE'] * len(df))
df.insert(1, 'interpretation_variable', ['N/A']*len(df))
df.insert(1, 'interpretation_variableDetail', ['N/A']*len(df))
df.insert(1, 'interpretation_seasonality', ['N/A']*len(df))
df.insert(1, 'interpretation_direction', ['N/A']*len(df))
df.insert(1, 'paleoData_proxy', df['paleoData_variableName'])
# KF: adding original dataset name and yearUnits
df.insert(7, 'originalDatabase', ['CoralHydro2k v1.0.1']*len(df))
df.insert(len(df.columns), 'yearUnits', ['CE'] * len(df))
df.insert(1, 'interpretation_variable', ['N/A']*len(df))
df.insert(1, 'interpretation_variableDetail', ['N/A']*len(df))
df.insert(1, 'interpretation_seasonality', ['N/A']*len(df))
df.insert(1, 'interpretation_direction', ['N/A']*len(df))
df.insert(1, 'paleoData_proxy', df['paleoData_variableName'])

Rename columns to fit naming conventions

In [10]:

Copied!

df = df.rename(columns={'originalDataUrl': 'originalDataURL', 'paleoData_archiveSpecies': 'paleoData_sensorSpecies'})
df = df.rename(columns={'originalDataUrl': 'originalDataURL', 'paleoData_archiveSpecies': 'paleoData_sensorSpecies'})

Assign interpretation_variable based on paleoData_proxy type:

d18O are temperature and moisture sensitive
d18O_sw are misture sensitive (sw=seawater)
SrCa are temperature sensitive

In [11]:

Copied!





# d18O is temperature and moisture
df.loc[np.isin(df['paleoData_proxy'], ['d18O', 'd18O_annual']), 'interpretation_variable']='temperature+moisture' 
df.loc[np.isin(df['paleoData_proxy'], ['d18O', 'd18O_annual']), 'interpretation_variableDetail']='temperature+moisture - manually assigned by DoD2k authors for paleoData_proxy = d18O' 
# d18O_sw is moisture
df.loc[np.isin(df['paleoData_proxy'], ['d18O_sw', 'd18O_sw_annual']), 'interpretation_variable']='moisture' 
df.loc[np.isin(df['paleoData_proxy'], ['d18O_sw', 'd18O_sw_annual']), 'interpretation_variableDetail']='moisture - manually assigned by DoD2k authors for paleoData_proxy = d18O_sw' 
# SrCa is temperature
df.loc[np.isin(df['paleoData_proxy'], ['SrCa', 'SrCa_annual']), 'interpretation_variable']='temperature' 
df.loc[np.isin(df['paleoData_proxy'], ['SrCa', 'SrCa_annual']), 'interpretation_variableDetail']='temperature - manually assigned by DoD2k authors for paleoData_proxy = Sr/Ca' 
# d18O is temperature and moisture
df.loc[np.isin(df['paleoData_proxy'], ['d18O', 'd18O_annual']), 'interpretation_variable']='temperature+moisture' 
df.loc[np.isin(df['paleoData_proxy'], ['d18O', 'd18O_annual']), 'interpretation_variableDetail']='temperature+moisture - manually assigned by DoD2k authors for paleoData_proxy = d18O' 
# d18O_sw is moisture
df.loc[np.isin(df['paleoData_proxy'], ['d18O_sw', 'd18O_sw_annual']), 'interpretation_variable']='moisture' 
df.loc[np.isin(df['paleoData_proxy'], ['d18O_sw', 'd18O_sw_annual']), 'interpretation_variableDetail']='moisture - manually assigned by DoD2k authors for paleoData_proxy = d18O_sw' 
# SrCa is temperature
df.loc[np.isin(df['paleoData_proxy'], ['SrCa', 'SrCa_annual']), 'interpretation_variable']='temperature' 
df.loc[np.isin(df['paleoData_proxy'], ['SrCa', 'SrCa_annual']), 'interpretation_variableDetail']='temperature - manually assigned by DoD2k authors for paleoData_proxy = Sr/Ca' 

Now filter for certain records:

exclude sw data
drop _annual tag
rename SrCa to Sr/Ca to match the standard terminology
rename coral to Coral to match the standard terminology
drop rows with no data

Rename entries according to standard terminology¶

In [12]:

Copied!





import re
# KF: Extract and exclude sw values
df_sw = df[df['paleoData_proxy'].isin(['d18O_sw', 'd18O_sw_annual'])]
df = df[df['paleoData_proxy'].isin(['d18O_sw', 'd18O_sw_annual']) == False]

# KF: Turn annual measurements into regular
df_annual = df[df['paleoData_proxy'].isin(['SrCa_annual', 'd18O_annual'])]
for key in ['paleoData_proxy', 'paleoData_variableName']:
    df [key] = df[key].apply(lambda x: re.match(r'(.*)_annual', x).group(1) if re.match(r'(.*)_annual', x) else x)

    # KF: Replace SrCa with Sr/Ca for concat consistency
    df[key] = df[key].apply(lambda x: 'Sr/Ca' if re.match('SrCa', x) else x)

# KF: Temp cleaning rows with NAN in year
length = len(df['year'])
df = df[df['year'].notna()]
df = df[df['year'].map(lambda x: len(x) > 0)]
df = df[df['paleoData_values'].map(lambda x: not any(pd.isnull(x)))]
print('Number of rows discarded: ', (length - len(df['year'])))


df['archiveType'] = df['archiveType'].replace({'coral': 'Coral'})

# # KF: Make datasetIds unique
# df['datasetId'] = df['datasetId'] + np.array(df.index, dtype = str)
import re
# KF: Extract and exclude sw values
df_sw = df[df['paleoData_proxy'].isin(['d18O_sw', 'd18O_sw_annual'])]
df = df[df['paleoData_proxy'].isin(['d18O_sw', 'd18O_sw_annual']) == False]

# KF: Turn annual measurements into regular
df_annual = df[df['paleoData_proxy'].isin(['SrCa_annual', 'd18O_annual'])]
for key in ['paleoData_proxy', 'paleoData_variableName']:
    df [key] = df[key].apply(lambda x: re.match(r'(.*)_annual', x).group(1) if re.match(r'(.*)_annual', x) else x)

    # KF: Replace SrCa with Sr/Ca for concat consistency
    df[key] = df[key].apply(lambda x: 'Sr/Ca' if re.match('SrCa', x) else x)

# KF: Temp cleaning rows with NAN in year
length = len(df['year'])
df = df[df['year'].notna()]
df = df[df['year'].map(lambda x: len(x) > 0)]
df = df[df['paleoData_values'].map(lambda x: not any(pd.isnull(x)))]
print('Number of rows discarded: ', (length - len(df['year'])))


df['archiveType'] = df['archiveType'].replace({'coral': 'Coral'})

# # KF: Make datasetIds unique
# df['datasetId'] = df['datasetId'] + np.array(df.index, dtype = str)

Number of rows discarded:  0

In [13]:

Copied!





# KF: Type-checking

df = df.astype({'archiveType': str, 'dataSetName': str, 'datasetId': str, 'geo_meanElev': np.float32, 'geo_meanLat': np.float32, 'geo_meanLon': np.float32, 'geo_siteName': str, 
                    'originalDatabase': str, 'originalDataURL': str, 'paleoData_notes': str, 'paleoData_proxy': str, 'paleoData_units': str, 'yearUnits': str})
df['year']             = df['year'].map(lambda x: np.array(x, dtype = np.float32))
df['paleoData_values'] = df['paleoData_values'].map(lambda x: np.array(x, dtype = np.float32))
# KF: Type-checking

df = df.astype({'archiveType': str, 'dataSetName': str, 'datasetId': str, 'geo_meanElev': np.float32, 'geo_meanLat': np.float32, 'geo_meanLon': np.float32, 'geo_siteName': str, 
                    'originalDatabase': str, 'originalDataURL': str, 'paleoData_notes': str, 'paleoData_proxy': str, 'paleoData_units': str, 'yearUnits': str})
df['year']             = df['year'].map(lambda x: np.array(x, dtype = np.float32))
df['paleoData_values'] = df['paleoData_values'].map(lambda x: np.array(x, dtype = np.float32))

Include Common Era data only

In [14]:

Copied!





for ii in df.index:
    year = np.array(df.at[ii, 'year'], dtype=float)
    vals = np.array(df.at[ii, 'paleoData_values'], dtype=float)
    df.at[ii, 'year']             = year[year>=1]
    df.at[ii, 'paleoData_values'] = vals[year>=1]
for ii in df.index:
    year = np.array(df.at[ii, 'year'], dtype=float)
    vals = np.array(df.at[ii, 'paleoData_values'], dtype=float)
    df.at[ii, 'year']             = year[year>=1]
    df.at[ii, 'paleoData_values'] = vals[year>=1]

Note that the datasetId is not unique for each record and thus we added an additional array of strings to make the datasetId unique.

In [15]:

Copied!





#  check that the datasetId is unique 
print(len(df.datasetId.unique()))
# make datasetId unique by simply adding index number
df.datasetId=df.apply(lambda x: x.datasetId.replace('ch2k','ch2k_')+'_'+str(x.name), axis=1)
# check uniqueness - problem solved.
print(len(df.datasetId.unique()))
#  check that the datasetId is unique 
print(len(df.datasetId.unique()))
# make datasetId unique by simply adding index number
df.datasetId=df.apply(lambda x: x.datasetId.replace('ch2k','ch2k_')+'_'+str(x.name), axis=1)
# check uniqueness - problem solved.
print(len(df.datasetId.unique()))

179
272

mask out nans and set fill value, then later drop

Drop missing entries and standardize missing data format¶

In [16]:

Copied!





drop_inds = []
for ii in df.index:
    try:
        year = np.array(df.at[ii, 'year'], dtype=float)
        vals = np.array(df.at[ii, 'paleoData_values'], dtype=float)
        df.at[ii, 'year']             = year[year>=1]
        df.at[ii, 'paleoData_values'] = vals[year>=1]
    except:
        # print
        df.at[ii, 'paleoData_values'] = np.array([utf.convert_to_float(y) for y in df.at[ii, 'paleoData_values']], dtype=float)
        df.at[ii, 'year'] = np.array([utf.convert_to_float(y) for y in df.at[ii, 'year']], dtype=float)
        
        print(f'Converted values in paleoData_values and/or year for {ii}.')
        # drop_inds.append(ii)
# df_compact = df_compact.drop(drop_inds)

# drop all missing values and exclude all-missing-values-rows

for ii in df.index:
    dd   = np.array(df.at[ii, 'paleoData_values'])
    mask = dd==-9999.99
    df.at[ii, 'paleoData_values']=dd[~mask]
    df.at[ii, 'year']=np.array(df.at[ii, 'year'])[~mask]
    
drop_inds = []
for ii, row in enumerate(df.paleoData_values):
    try:
        if len(row)==0:
            print(ii, 'empty row for paleodata_values')
        elif len(df.iloc[ii]['year'])==0:
            print(ii, 'empty row for year')
        elif np.std(row)==0: 
            print(ii, 'std=0')
        elif np.sum(np.diff(row)**2)==0: 
            print(ii, 'diff=0')
        elif np.isnan(np.std(row)):
            print(ii, 'std nan')
        else:
            continue
        if df.index[ii] not in drop_inds: 
            drop_inds += [df.index[ii]]
    except:
        drop_inds+=[df.index[ii]]
    
print(drop_inds)
df = df.drop(index=drop_inds)
drop_inds = []
for ii in df.index:
    try:
        year = np.array(df.at[ii, 'year'], dtype=float)
        vals = np.array(df.at[ii, 'paleoData_values'], dtype=float)
        df.at[ii, 'year']             = year[year>=1]
        df.at[ii, 'paleoData_values'] = vals[year>=1]
    except:
        # print
        df.at[ii, 'paleoData_values'] = np.array([utf.convert_to_float(y) for y in df.at[ii, 'paleoData_values']], dtype=float)
        df.at[ii, 'year'] = np.array([utf.convert_to_float(y) for y in df.at[ii, 'year']], dtype=float)
        
        print(f'Converted values in paleoData_values and/or year for {ii}.')
        # drop_inds.append(ii)
# df_compact = df_compact.drop(drop_inds)

# drop all missing values and exclude all-missing-values-rows

for ii in df.index:
    dd   = np.array(df.at[ii, 'paleoData_values'])
    mask = dd==-9999.99
    df.at[ii, 'paleoData_values']=dd[~mask]
    df.at[ii, 'year']=np.array(df.at[ii, 'year'])[~mask]
    
drop_inds = []
for ii, row in enumerate(df.paleoData_values):
    try:
        if len(row)==0:
            print(ii, 'empty row for paleodata_values')
        elif len(df.iloc[ii]['year'])==0:
            print(ii, 'empty row for year')
        elif np.std(row)==0: 
            print(ii, 'std=0')
        elif np.sum(np.diff(row)**2)==0: 
            print(ii, 'diff=0')
        elif np.isnan(np.std(row)):
            print(ii, 'std nan')
        else:
            continue
        if df.index[ii] not in drop_inds: 
            drop_inds += [df.index[ii]]
    except:
        drop_inds+=[df.index[ii]]
    
print(drop_inds)
df = df.drop(index=drop_inds)

5 std nan
9 std nan
10 std nan
14 std nan
26 std nan
28 std nan
36 std nan
37 std nan
42 std nan
43 std nan
44 std nan
45 std nan
46 std nan
58 std nan
76 std nan
87 std nan
91 std nan
103 std nan
116 std nan
117 std nan
123 std nan
138 std nan
139 std nan
147 std nan
166 std nan
169 std nan
171 std nan
172 std nan
175 std nan
177 std nan
186 std nan
202 std nan
206 std nan
215 std nan
219 std nan
227 std nan
230 std nan
238 std nan
239 std nan
240 std nan
241 std nan
242 std nan
243 std nan
250 std nan
252 std nan
253 std nan
254 std nan
255 std nan
256 std nan
259 std nan
270 std nan
[10, 18, 20, 28, 58, 62, 80, 82, 92, 94, 96, 98, 100, 124, 164, 190, 198, 224, 252, 254, 268, 298, 300, 318, 364, 370, 376, 378, 384, 388, 406, 444, 454, 474, 484, 502, 508, 528, 530, 532, 534, 536, 538, 556, 560, 562, 564, 568, 570, 580, 604]

Now show the final compact dataframe

In [17]:

Copied!

df = df[sorted(df.columns)]
df.reset_index(drop= True, inplace= True)
print(df.info())
df = df[sorted(df.columns)]
df.reset_index(drop= True, inplace= True)
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 221 entries, 0 to 220
Data columns (total 21 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   archiveType                    221 non-null    object 
 1   dataSetName                    221 non-null    object 
 2   datasetId                      221 non-null    object 
 3   geo_meanElev                   186 non-null    float32
 4   geo_meanLat                    221 non-null    float32
 5   geo_meanLon                    221 non-null    float32
 6   geo_siteName                   221 non-null    object 
 7   interpretation_direction       221 non-null    object 
 8   interpretation_seasonality     221 non-null    object 
 9   interpretation_variable        221 non-null    object 
 10  interpretation_variableDetail  221 non-null    object 
 11  originalDataURL                221 non-null    object 
 12  originalDatabase               221 non-null    object 
 13  paleoData_notes                221 non-null    object 
 14  paleoData_proxy                221 non-null    object 
 15  paleoData_sensorSpecies        221 non-null    object 
 16  paleoData_units                221 non-null    object 
 17  paleoData_values               221 non-null    object 
 18  paleoData_variableName         221 non-null    object 
 19  year                           221 non-null    object 
 20  yearUnits                      221 non-null    object 
dtypes: float32(3), object(18)
memory usage: 33.8+ KB
None

save compact dataframe¶

save pickle¶

In [18]:

Copied!

# save to a pickle file
df_compact = df[sorted(df.columns)]
df_compact.to_pickle('data/ch2k/ch2k_compact.pkl')
# save to a pickle file
df_compact = df[sorted(df.columns)]
df_compact.to_pickle('data/ch2k/ch2k_compact.pkl')

save csv¶

In [19]:

Copied!

# save to a list of csv files (metadata, data, year)
df_compact.name='ch2k'
utf.write_compact_dataframe_to_csv(df_compact)
# save to a list of csv files (metadata, data, year)
df_compact.name='ch2k'
utf.write_compact_dataframe_to_csv(df_compact)

METADATA: datasetId, archiveType, dataSetName, geo_meanElev, geo_meanLat, geo_meanLon, geo_siteName, interpretation_direction, interpretation_seasonality, interpretation_variable, interpretation_variableDetail, originalDataURL, originalDatabase, paleoData_notes, paleoData_proxy, paleoData_sensorSpecies, paleoData_units, paleoData_variableName, yearUnits
Saved to /home/jupyter-lluecke/dod2k_v2.0/dod2k/data/ch2k/ch2k_compact_%s.csv

In [20]:

Copied!

# load dataframe
df = utf.load_compact_dataframe_from_csv('ch2k')
# load dataframe
df = utf.load_compact_dataframe_from_csv('ch2k')

Visualise dataframe¶

Show spatial distribution of records, show archive and proxy types

In [21]:

Copied!





# count archive types
archive_count = {}
for ii, at in enumerate(set(df['archiveType'])):
    archive_count[at] = df.loc[df['archiveType']==at, 'archiveType'].count()

sort = np.argsort([cc for cc in archive_count.values()])
archives_sorted = np.array([cc for cc in archive_count.keys()])[sort][::-1]

# Specify colour for each archive (smaller archives get grouped into the same colour)
archive_colour, major_archives, other_archives = uplt.get_archive_colours(archives_sorted, archive_count)

fig = uplt.plot_geo_archive_proxy(df, archive_colour)
utf.save_fig(fig, f'geo_{df.name}', dir=df.name)
# count archive types
archive_count = {}
for ii, at in enumerate(set(df['archiveType'])):
    archive_count[at] = df.loc[df['archiveType']==at, 'archiveType'].count()

sort = np.argsort([cc for cc in archive_count.values()])
archives_sorted = np.array([cc for cc in archive_count.keys()])[sort][::-1]

# Specify colour for each archive (smaller archives get grouped into the same colour)
archive_colour, major_archives, other_archives = uplt.get_archive_colours(archives_sorted, archive_count)

fig = uplt.plot_geo_archive_proxy(df, archive_colour)
utf.save_fig(fig, f'geo_{df.name}', dir=df.name)

0 Coral 221
saved figure in /home/jupyter-lluecke/dod2k_v2.0/dod2k/figs/ch2k/geo_ch2k.pdf

No description has been provided for this image

Now plot the coverage over the Common Era

In [22]:

Copied!

fig = uplt.plot_coverage(df, archives_sorted, major_archives, other_archives, archive_colour)
utf.save_fig(fig, f'time_{df.name}', dir=df.name)
fig = uplt.plot_coverage(df, archives_sorted, major_archives, other_archives, archive_colour)
utf.save_fig(fig, f'time_{df.name}', dir=df.name)

saved figure in /home/jupyter-lluecke/dod2k_v2.0/dod2k/figs/ch2k/time_ch2k.pdf

Display dataframe¶

Display identification metadata: dataSetName, datasetId, originalDataURL, originalDatabase¶

index¶

In [23]:

Copied!

# # check index
print(df.index)
# # check index
print(df.index)

RangeIndex(start=0, stop=221, step=1)

dataSetName (associated with each record, may not be unique)¶

In [24]:

Copied!





# # check dataSetName
key = 'dataSetName'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# # check dataSetName
key = 'dataSetName'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

dataSetName: 
['CH03BUN01' 'ZI15MER01' 'ZI15MER01' 'CO03PAL03' 'CO03PAL02' 'LI06RAR01'
 'CO03PAL07' 'FL18DTO03' 'UR00MAI01' 'TU95MAD01' 'ZI04IFR01' 'RE18CAY01'
 'RE18CAY01' 'RE18CAY01' 'RE18CAY01' 'KU99HOU01' 'OS13NLP01' 'EV98KIR01'
 'LI00RAR01' 'LI00RAR01' 'NU11PAL01' 'NU11PAL01' 'MA08DTO01' 'CA14TIM01'
 'CA14TIM01' 'KA17RYU01' 'MC11KIR01' 'AB20MEN09' 'HE08LRA01' 'DA06MAF01'
 'NA09MAL01' 'SW98STP01' 'MU18GSI01' 'MU18GSI01' 'FL17DTO02' 'DA06MAF02'
 'SA19PAL02' 'SA19PAL02' 'CO03PAL01' 'ZI16ROD01' 'OS13NGP01' 'CH98PIR01'
 'RE19GBR02' 'RE19GBR02' 'MU18RED04' 'GR13MAD01' 'XI17HAI01' 'XI17HAI01'
 'XI17HAI01' 'XI17HAI01' 'DE14DTO03' 'KL97DAH01' 'QU06RAB01' 'QU06RAB01'
 'DE14DTO01' 'KU00NIN01' 'TU01SIA01' 'RE19GBR01' 'RE19GBR01' 'GR13MAD02'
 'AB20MEN07' 'BR19RED01' 'NU09FAN01' 'NU09FAN01' 'MU18RED01' 'OS14RIP01'
 'DE14DTO02' 'LI04FIJ01' 'LI04FIJ01' 'EV18ROC01' 'EV18ROC01' 'CA13SAP01'
 'TU01LAI01' 'HE13MIS01' 'HE13MIS01' 'ZI15IMP02' 'ZI15IMP02' 'PF04PBA01'
 'SA20FAN02' 'WE09ARR01' 'WE09ARR01' 'CO03PAL05' 'XU15BVI01' 'HE18COC02'
 'HE18COC02' 'MU18NPI01' 'MO06PED01' 'KR20SAR01' 'KR20SAR01' 'SA18GBR01'
 'OS14UCP01' 'AB20MEN08' 'HE13MIS02' 'HE13MIS02' 'HE10GUA01' 'HE10GUA01'
 'HE10GUA01' 'HE10GUA01' 'DE12ANC01' 'WA17BAN01' 'WA17BAN01' 'DR99ABR01'
 'DR99ABR01' 'LI06RAR02' 'MU18RED03' 'SW99LIG01' 'SA16CLA01' 'ZI15TAN01'
 'ZI15TAN01' 'RE19GBR03' 'RE19GBR03' 'DR00KSB01' 'BO14HTI02' 'BO14HTI02'
 'MU17DOA01' 'TA18TAS01' 'XU15BVI03' 'AS05GUA01' 'FE09OGA01' 'FE09OGA01'
 'FE09OGA01' 'FE09OGA01' 'GU99NAU01' 'SA20FAN01' 'AL16PUR02' 'CO03PAL10'
 'RE19GBR05' 'ZI15IMP01' 'ZI15IMP01' 'KR20SAR02' 'KR20SAR02' 'RO19YUC01'
 'RO19YUC01' 'ST13MAL01' 'ST13MAL01' 'DR00NBB01' 'PF19LAR01' 'PF19LAR01'
 'AL16YUC01' 'CO03PAL09' 'ZI16ROD02' 'AB20MEN05' 'KI04MCV01' 'KI04MCV01'
 'CH18YOA02' 'DE16RED01' 'BA04FIJ02' 'CO03PAL06' 'CH18YOA01' 'RE19GBR04'
 'DO18DAV01' 'GO12SBV01' 'GO12SBV01' 'CA07FLI01' 'CA07FLI01' 'SW99LIG02'
 'CO93TAR01' 'RO19PAR01' 'CO00MAL01' 'MO20WOA01' 'MO20WOA01' 'AB20MEN01'
 'QU96ESV01' 'DE13HAI01' 'DE13HAI01' 'DE13HAI01' 'DE13HAI01' 'LI94SEC01'
 'ZI15CLE01' 'ZI15CLE01' 'MU18RED02' 'ZI08MAY01' 'TU01DEP01' 'CO03PAL04'
 'RA19PAI01' 'AB15BHB01' 'FL18DTO01' 'MO20KOI01' 'MO20KOI01' 'DU94URV01'
 'DU94URV01' 'CO03PAL08' 'WU14CLI01' 'ZI14TUR01' 'ZI14TUR01' 'LI99CLI01'
 'ZI15BUN01' 'ZI15BUN01' 'FE18RUS01' 'FE18RUS01' 'FE18RUS01' 'FE18RUS01'
 'WU13TON01' 'WU13TON01' 'KI14PAR01' 'KI14PAR01' 'KI14PAR01' 'KI14PAR01'
 'ZI14IFR02' 'ZI14IFR02' 'XU15BVI02' 'NU09KIR01' 'NU09KIR01' 'RI10PBL01'
 'CA14BUT01' 'CA14BUT01' 'FL18DTO02' 'BA04FIJ01' 'GO08BER01' 'GO08BER01'
 'LI06FIJ01' 'HE18COC01' 'HE18COC01' 'FL17DTO01' 'FL17DTO01' 'BO99MOO01'
 'CH03LOM01' 'SA19PAL01' 'SA19PAL01' 'CH97BVB01' 'RA20TAI01']
["<class 'str'>"]
No. of unique values: 155/221

datasetId (unique identifier, as given by original authors, includes original database token)¶

In [25]:

Copied!





# # check datasetId

print(len(df.datasetId.unique()))
print(len(df))
key = 'datasetId'
print('%s (starts with): '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print('datasetId starts with: ', np.unique([str(dd.split('_')[0]) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# # check datasetId

print(len(df.datasetId.unique()))
print(len(df))
key = 'datasetId'
print('%s (starts with): '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print('datasetId starts with: ', np.unique([str(dd.split('_')[0]) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

221
221
datasetId (starts with): 
['ch2k_CH03BUN01_0' 'ch2k_ZI15MER01_2' 'ch2k_ZI15MER01_4'
 'ch2k_CO03PAL03_6' 'ch2k_CO03PAL02_8' 'ch2k_LI06RAR01_12'
 'ch2k_CO03PAL07_14' 'ch2k_FL18DTO03_16' 'ch2k_UR00MAI01_22'
 'ch2k_TU95MAD01_24' 'ch2k_ZI04IFR01_26' 'ch2k_RE18CAY01_30'
 'ch2k_RE18CAY01_32' 'ch2k_RE18CAY01_34' 'ch2k_RE18CAY01_36'
 'ch2k_KU99HOU01_40' 'ch2k_OS13NLP01_42' 'ch2k_EV98KIR01_44'
 'ch2k_LI00RAR01_46' 'ch2k_LI00RAR01_48' 'ch2k_NU11PAL01_52'
 'ch2k_NU11PAL01_54' 'ch2k_MA08DTO01_60' 'ch2k_CA14TIM01_64'
 'ch2k_CA14TIM01_66' 'ch2k_KA17RYU01_70' 'ch2k_MC11KIR01_72'
 'ch2k_AB20MEN09_74' 'ch2k_HE08LRA01_76' 'ch2k_DA06MAF01_78'
 'ch2k_NA09MAL01_84' 'ch2k_SW98STP01_86' 'ch2k_MU18GSI01_88'
 'ch2k_MU18GSI01_90' 'ch2k_FL17DTO02_102' 'ch2k_DA06MAF02_104'
 'ch2k_SA19PAL02_106' 'ch2k_SA19PAL02_108' 'ch2k_CO03PAL01_110'
 'ch2k_ZI16ROD01_112' 'ch2k_OS13NGP01_114' 'ch2k_CH98PIR01_116'
 'ch2k_RE19GBR02_118' 'ch2k_RE19GBR02_120' 'ch2k_MU18RED04_122'
 'ch2k_GR13MAD01_126' 'ch2k_XI17HAI01_128' 'ch2k_XI17HAI01_130'
 'ch2k_XI17HAI01_134' 'ch2k_XI17HAI01_136' 'ch2k_DE14DTO03_140'
 'ch2k_KL97DAH01_142' 'ch2k_QU06RAB01_144' 'ch2k_QU06RAB01_146'
 'ch2k_DE14DTO01_148' 'ch2k_KU00NIN01_150' 'ch2k_TU01SIA01_152'
 'ch2k_RE19GBR01_154' 'ch2k_RE19GBR01_156' 'ch2k_GR13MAD02_158'
 'ch2k_AB20MEN07_160' 'ch2k_BR19RED01_162' 'ch2k_NU09FAN01_166'
 'ch2k_NU09FAN01_168' 'ch2k_MU18RED01_172' 'ch2k_OS14RIP01_174'
 'ch2k_DE14DTO02_176' 'ch2k_LI04FIJ01_178' 'ch2k_LI04FIJ01_180'
 'ch2k_EV18ROC01_184' 'ch2k_EV18ROC01_186' 'ch2k_CA13SAP01_188'
 'ch2k_TU01LAI01_192' 'ch2k_HE13MIS01_194' 'ch2k_HE13MIS01_196'
 'ch2k_ZI15IMP02_200' 'ch2k_ZI15IMP02_202' 'ch2k_PF04PBA01_204'
 'ch2k_SA20FAN02_206' 'ch2k_WE09ARR01_208' 'ch2k_WE09ARR01_210'
 'ch2k_CO03PAL05_212' 'ch2k_XU15BVI01_214' 'ch2k_HE18COC02_216'
 'ch2k_HE18COC02_218' 'ch2k_MU18NPI01_222' 'ch2k_MO06PED01_226'
 'ch2k_KR20SAR01_228' 'ch2k_KR20SAR01_230' 'ch2k_SA18GBR01_234'
 'ch2k_OS14UCP01_236' 'ch2k_AB20MEN08_238' 'ch2k_HE13MIS02_240'
 'ch2k_HE13MIS02_242' 'ch2k_HE10GUA01_244' 'ch2k_HE10GUA01_246'
 'ch2k_HE10GUA01_248' 'ch2k_HE10GUA01_250' 'ch2k_DE12ANC01_258'
 'ch2k_WA17BAN01_260' 'ch2k_WA17BAN01_262' 'ch2k_DR99ABR01_264'
 'ch2k_DR99ABR01_266' 'ch2k_LI06RAR02_270' 'ch2k_MU18RED03_272'
 'ch2k_SW99LIG01_274' 'ch2k_SA16CLA01_276' 'ch2k_ZI15TAN01_278'
 'ch2k_ZI15TAN01_280' 'ch2k_RE19GBR03_282' 'ch2k_RE19GBR03_284'
 'ch2k_DR00KSB01_286' 'ch2k_BO14HTI02_288' 'ch2k_BO14HTI02_290'
 'ch2k_MU17DOA01_292' 'ch2k_TA18TAS01_294' 'ch2k_XU15BVI03_296'
 'ch2k_AS05GUA01_302' 'ch2k_FE09OGA01_304' 'ch2k_FE09OGA01_306'
 'ch2k_FE09OGA01_308' 'ch2k_FE09OGA01_310' 'ch2k_GU99NAU01_314'
 'ch2k_SA20FAN01_316' 'ch2k_AL16PUR02_320' 'ch2k_CO03PAL10_324'
 'ch2k_RE19GBR05_326' 'ch2k_ZI15IMP01_328' 'ch2k_ZI15IMP01_330'
 'ch2k_KR20SAR02_332' 'ch2k_KR20SAR02_334' 'ch2k_RO19YUC01_338'
 'ch2k_RO19YUC01_340' 'ch2k_ST13MAL01_344' 'ch2k_ST13MAL01_346'
 'ch2k_DR00NBB01_348' 'ch2k_PF19LAR01_350' 'ch2k_PF19LAR01_352'
 'ch2k_AL16YUC01_354' 'ch2k_CO03PAL09_358' 'ch2k_ZI16ROD02_360'
 'ch2k_AB20MEN05_362' 'ch2k_KI04MCV01_366' 'ch2k_KI04MCV01_368'
 'ch2k_CH18YOA02_374' 'ch2k_DE16RED01_380' 'ch2k_BA04FIJ02_382'
 'ch2k_CO03PAL06_386' 'ch2k_CH18YOA01_390' 'ch2k_RE19GBR04_392'
 'ch2k_DO18DAV01_394' 'ch2k_GO12SBV01_396' 'ch2k_GO12SBV01_398'
 'ch2k_CA07FLI01_400' 'ch2k_CA07FLI01_402' 'ch2k_SW99LIG02_404'
 'ch2k_CO93TAR01_408' 'ch2k_RO19PAR01_410' 'ch2k_CO00MAL01_412'
 'ch2k_MO20WOA01_414' 'ch2k_MO20WOA01_416' 'ch2k_AB20MEN01_420'
 'ch2k_QU96ESV01_422' 'ch2k_DE13HAI01_424' 'ch2k_DE13HAI01_426'
 'ch2k_DE13HAI01_430' 'ch2k_DE13HAI01_432' 'ch2k_LI94SEC01_436'
 'ch2k_ZI15CLE01_438' 'ch2k_ZI15CLE01_440' 'ch2k_MU18RED02_442'
 'ch2k_ZI08MAY01_446' 'ch2k_TU01DEP01_450' 'ch2k_CO03PAL04_452'
 'ch2k_RA19PAI01_456' 'ch2k_AB15BHB01_458' 'ch2k_FL18DTO01_460'
 'ch2k_MO20KOI01_462' 'ch2k_MO20KOI01_464' 'ch2k_DU94URV01_468'
 'ch2k_DU94URV01_470' 'ch2k_CO03PAL08_472' 'ch2k_WU14CLI01_476'
 'ch2k_ZI14TUR01_480' 'ch2k_ZI14TUR01_482' 'ch2k_LI99CLI01_486'
 'ch2k_ZI15BUN01_488' 'ch2k_ZI15BUN01_490' 'ch2k_FE18RUS01_492'
 'ch2k_FE18RUS01_494' 'ch2k_FE18RUS01_496' 'ch2k_FE18RUS01_498'
 'ch2k_WU13TON01_504' 'ch2k_WU13TON01_506' 'ch2k_KI14PAR01_510'
 'ch2k_KI14PAR01_512' 'ch2k_KI14PAR01_516' 'ch2k_KI14PAR01_518'
 'ch2k_ZI14IFR02_522' 'ch2k_ZI14IFR02_524' 'ch2k_XU15BVI02_526'
 'ch2k_NU09KIR01_540' 'ch2k_NU09KIR01_542' 'ch2k_RI10PBL01_546'
 'ch2k_CA14BUT01_548' 'ch2k_CA14BUT01_550' 'ch2k_FL18DTO02_554'
 'ch2k_BA04FIJ01_558' 'ch2k_GO08BER01_572' 'ch2k_GO08BER01_574'
 'ch2k_LI06FIJ01_582' 'ch2k_HE18COC01_584' 'ch2k_HE18COC01_586'
 'ch2k_FL17DTO01_590' 'ch2k_FL17DTO01_592' 'ch2k_BO99MOO01_594'
 'ch2k_CH03LOM01_596' 'ch2k_SA19PAL01_598' 'ch2k_SA19PAL01_600'
 'ch2k_CH97BVB01_602' 'ch2k_RA20TAI01_606']
["<class 'str'>"]
datasetId starts with:  ['ch2k']
No. of unique values: 221/221

originalDataURL (URL/DOI of original published record where available)¶

In [26]:

Copied!





# originalDataURL
key = 'originalDataURL'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([kk for kk in df[key] if 'this' in kk]))
print(np.unique([str(type(dd)) for dd in df[key]]))
# 'this study' should point to the correct URL (PAGES2k)
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# originalDataURL
key = 'originalDataURL'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([kk for kk in df[key] if 'this' in kk]))
print(np.unique([str(type(dd)) for dd in df[key]]))
# 'this study' should point to the correct URL (PAGES2k)
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

originalDataURL: 
['https://doi.org/10.1594/PANGAEA.874078'
 'https://doi.pangaea.de/10.1594/PANGAEA.743953'
 'https://doi.pangaea.de/10.1594/PANGAEA.830601'
 'https://doi.pangaea.de/10.1594/PANGAEA.88199'
 'https://doi.pangaea.de/10.1594/PANGAEA.88200'
 'https://doi.pangaea.de/10.1594/PANGAEA.887712'
 'https://doi.pangaea.de/10.1594/PANGAEA.891094'
 'https://www.ncdc.noaa.gov/paleo/study/1003972'
 'https://www.ncdc.noaa.gov/paleo/study/1003973'
 'https://www.ncdc.noaa.gov/paleo/study/10373'
 'https://www.ncdc.noaa.gov/paleo/study/10425'
 'https://www.ncdc.noaa.gov/paleo/study/10808'
 'https://www.ncdc.noaa.gov/paleo/study/11935'
 'https://www.ncdc.noaa.gov/paleo/study/12278'
 'https://www.ncdc.noaa.gov/paleo/study/12891'
 'https://www.ncdc.noaa.gov/paleo/study/12893'
 'https://www.ncdc.noaa.gov/paleo/study/12994'
 'https://www.ncdc.noaa.gov/paleo/study/13035'
 'https://www.ncdc.noaa.gov/paleo/study/13439'
 'https://www.ncdc.noaa.gov/paleo/study/15238'
 'https://www.ncdc.noaa.gov/paleo/study/15794'
 'https://www.ncdc.noaa.gov/paleo/study/16217'
 'https://www.ncdc.noaa.gov/paleo/study/16338'
 'https://www.ncdc.noaa.gov/paleo/study/16339'
 'https://www.ncdc.noaa.gov/paleo/study/16438'
 'https://www.ncdc.noaa.gov/paleo/study/17035'
 'https://www.ncdc.noaa.gov/paleo/study/17289'
 'https://www.ncdc.noaa.gov/paleo/study/17378'
 'https://www.ncdc.noaa.gov/paleo/study/1839'
 'https://www.ncdc.noaa.gov/paleo/study/1842'
 'https://www.ncdc.noaa.gov/paleo/study/1844'
 'https://www.ncdc.noaa.gov/paleo/study/1845'
 'https://www.ncdc.noaa.gov/paleo/study/1846'
 'https://www.ncdc.noaa.gov/paleo/study/1847'
 'https://www.ncdc.noaa.gov/paleo/study/1850'
 'https://www.ncdc.noaa.gov/paleo/study/1853'
 'https://www.ncdc.noaa.gov/paleo/study/1855'
 'https://www.ncdc.noaa.gov/paleo/study/1856'
 'https://www.ncdc.noaa.gov/paleo/study/1857'
 'https://www.ncdc.noaa.gov/paleo/study/1859'
 'https://www.ncdc.noaa.gov/paleo/study/1866'
 'https://www.ncdc.noaa.gov/paleo/study/1867'
 'https://www.ncdc.noaa.gov/paleo/study/1875'
 'https://www.ncdc.noaa.gov/paleo/study/1876'
 'https://www.ncdc.noaa.gov/paleo/study/1881'
 'https://www.ncdc.noaa.gov/paleo/study/18895'
 'https://www.ncdc.noaa.gov/paleo/study/1891'
 'https://www.ncdc.noaa.gov/paleo/study/1897'
 'https://www.ncdc.noaa.gov/paleo/study/1901'
 'https://www.ncdc.noaa.gov/paleo/study/1903'
 'https://www.ncdc.noaa.gov/paleo/study/1911'
 'https://www.ncdc.noaa.gov/paleo/study/1913'
 'https://www.ncdc.noaa.gov/paleo/study/1914'
 'https://www.ncdc.noaa.gov/paleo/study/1915'
 'https://www.ncdc.noaa.gov/paleo/study/19179'
 'https://www.ncdc.noaa.gov/paleo/study/19239'
 'https://www.ncdc.noaa.gov/paleo/study/1925'
 'https://www.ncdc.noaa.gov/paleo/study/21011'
 'https://www.ncdc.noaa.gov/paleo/study/21310'
 'https://www.ncdc.noaa.gov/paleo/study/21710'
 'https://www.ncdc.noaa.gov/paleo/study/22056'
 'https://www.ncdc.noaa.gov/paleo/study/22252'
 'https://www.ncdc.noaa.gov/paleo/study/22991'
 'https://www.ncdc.noaa.gov/paleo/study/23390'
 'https://www.ncdc.noaa.gov/paleo/study/23850'
 'https://www.ncdc.noaa.gov/paleo/study/24477'
 'https://www.ncdc.noaa.gov/paleo/study/24630'
 'https://www.ncdc.noaa.gov/paleo/study/25270'
 'https://www.ncdc.noaa.gov/paleo/study/25290'
 'https://www.ncdc.noaa.gov/paleo/study/26531'
 'https://www.ncdc.noaa.gov/paleo/study/27271'
 'https://www.ncdc.noaa.gov/paleo/study/27450'
 'https://www.ncdc.noaa.gov/paleo/study/28130'
 'https://www.ncdc.noaa.gov/paleo/study/28451'
 'https://www.ncdc.noaa.gov/paleo/study/29312'
 'https://www.ncdc.noaa.gov/paleo/study/29412'
 'https://www.ncdc.noaa.gov/paleo/study/30493'
 'https://www.ncdc.noaa.gov/paleo/study/31552'
 'https://www.ncdc.noaa.gov/paleo/study/33732'
 'https://www.ncdc.noaa.gov/paleo/study/34372'
 'https://www.ncdc.noaa.gov/paleo/study/34373'
 'https://www.ncdc.noaa.gov/paleo/study/34392'
 'https://www.ncdc.noaa.gov/paleo/study/34393'
 'https://www.ncdc.noaa.gov/paleo/study/34394'
 'https://www.ncdc.noaa.gov/paleo/study/34412'
 'https://www.ncdc.noaa.gov/paleo/study/34413'
 'https://www.ncdc.noaa.gov/paleo/study/34452'
 'https://www.ncdc.noaa.gov/paleo/study/34472'
 'https://www.ncdc.noaa.gov/paleo/study/34512'
 'https://www.ncdc.noaa.gov/paleo/study/34552'
 'https://www.ncdc.noaa.gov/paleo/study/34553'
 'https://www.ncdc.noaa.gov/paleo/study/34612'
 'https://www.ncdc.noaa.gov/paleo/study/34692'
 'https://www.ncdc.noaa.gov/paleo/study/34953'
 'https://www.ncdc.noaa.gov/paleo/study/6087'
 'https://www.ncdc.noaa.gov/paleo/study/6089'
 'https://www.ncdc.noaa.gov/paleo/study/6116'
 'https://www.ncdc.noaa.gov/paleo/study/6184'
 'https://www.ncdc.noaa.gov/paleo/study/8424'
 'https://www.ncdc.noaa.gov/paleo/study/8609'
 'https://www.ncdc.noaa.gov/paleo/study/9639']
[]
["<class 'str'>"]
No. of unique values: 101/221

originalDatabase (original database used as input for dataframe)¶

In [27]:

Copied!





# # originalDataSet
key = 'originalDatabase'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
# Note: the last two records have missing URLs
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# # originalDataSet
key = 'originalDatabase'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
# Note: the last two records have missing URLs
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

originalDatabase: 
['CoralHydro2k v1.0.1']
["<class 'str'>"]
No. of unique values: 1/221

geographical metadata: elevation, latitude, longitude, site name¶

geo_meanElev (mean elevation in m)¶

In [28]:

Copied!





# check Elevation
key = 'geo_meanElev'
print('%s: '%key)
print(df[key])
print(np.unique(['%d'%kk for kk in df[key] if np.isfinite(kk)]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# check Elevation
key = 'geo_meanElev'
print('%s: '%key)
print(df[key])
print(np.unique(['%d'%kk for kk in df[key] if np.isfinite(kk)]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

geo_meanElev: 
0      -3.0
1     -17.0
2     -17.0
3       NaN
4       NaN
       ... 
216    -5.0
217   -10.0
218   -10.0
219    -7.0
220    -6.0
Name: geo_meanElev, Length: 221, dtype: float32
['-1' '-10' '-11' '-12' '-14' '-16' '-17' '-18' '-2' '-25' '-3' '-4' '-5'
 '-6' '-7' '-8' '-9' '0']
["<class 'float'>"]
No. of unique values: 44/221

geo_meanLat (mean latitude in degrees N)¶

In [29]:

Copied!





# # Latitude
key = 'geo_meanLat'
print('%s: '%key)
print(np.unique(['%d'%kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# # Latitude
key = 'geo_meanLat'
print('%s: '%key)
print(np.unique(['%d'%kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

geo_meanLat: 
['-10' '-11' '-12' '-13' '-14' '-15' '-16' '-17' '-18' '-19' '-21' '-22'
 '-23' '-28' '-3' '-4' '-5' '-6' '-8' '0' '1' '10' '11' '12' '13' '15'
 '16' '17' '18' '19' '2' '20' '21' '22' '23' '24' '25' '27' '28' '3' '32'
 '4' '5' '7']
["<class 'float'>"]
No. of unique values: 128/221

geo_meanLon (mean longitude)¶

In [30]:

Copied!





# # Longitude 
key = 'geo_meanLon'
print('%s: '%key)
print(np.unique(['%d'%kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# # Longitude 
key = 'geo_meanLon'
print('%s: '%key)
print(np.unique(['%d'%kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

geo_meanLon: 
['-109' '-114' '-149' '-157' '-159' '-162' '-169' '-174' '-22' '-33' '-61'
 '-64' '-66' '-67' '-80' '-82' '-86' '-88' '-91' '100' '105' '109' '110'
 '111' '113' '114' '115' '117' '118' '119' '120' '122' '123' '124' '130'
 '134' '142' '143' '144' '145' '146' '147' '148' '150' '151' '152' '153'
 '163' '166' '167' '172' '173' '179' '34' '36' '37' '38' '39' '40' '43'
 '45' '49' '55' '58' '63' '7' '70' '71' '72' '92' '96']
["<class 'float'>"]
No. of unique values: 130/221

geo_siteName (name of collection site)¶

In [31]:

Copied!





# Site Name 
key = 'geo_siteName'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# Site Name 
key = 'geo_siteName'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

geo_siteName: 
['Bunaken Island, Indonesia' 'Rowley Shoals, Australia'
 'Rowley Shoals, Australia'
 'Palmyra Island, United States Minor Outlying Islands'
 'Palmyra Island, United States Minor Outlying Islands'
 'Rarotonga, Cook Islands'
 'Palmyra Island, United States Minor Outlying Islands'
 'Dry Tortugas, Florida, USA' 'Maiana, Republic of Kiribati'
 'Madang Lagoon, Papua New Guinea' 'Ifaty Reef, Madagascar'
 'Little Cayman, Cayman Islands' 'Little Cayman, Cayman Islands'
 'Little Cayman, Cayman Islands' 'Little Cayman, Cayman Islands'
 'Houtman Abrolhos Islands, Australia' 'Ngeralang, Palau'
 'Kiritimati (Christmas) Island, Republic of Kiribati'
 'Rarotonga, Cook Islands' 'Rarotonga, Cook Islands'
 'Palmyra Island, United States Minor Outlying Islands'
 'Palmyra Island, United States Minor Outlying Islands'
 'Dry Tortugas, Florida, USA' 'Timor, Indonesia' 'Timor, Indonesia'
 'Kikai Island, Japan'
 'Kiritimati (Christmas) Island, Republic of Kiribati'
 'Mentawai Islands, Indonesia'
 'Cayo Sal, Los Roques Archipelago, Venezuela'
 'Fungu Mrima Reef, Tanzania' 'Malindi Marine Park, Kenya'
 'Ponta Banana, Principe Island' 'Gili Selang, Bali, Indonesia'
 'Gili Selang, Bali, Indonesia' 'Dry Tortugas, Florida, USA'
 'Fungu Mrima Reef, Tanzania'
 'Palmyra Island, United States Minor Outlying Islands'
 'Palmyra Island, United States Minor Outlying Islands'
 'Palmyra Island, United States Minor Outlying Islands'
 'Rodrigues, Republic of Mauritius' 'Ngaragabel, Palau'
 'Pirotan Island, Gujarat, India' 'Portland Roads, Australia'
 'Portland Roads, Australia' 'Coral Gardens, Red Sea'
 'Nosy Boraha, Madagascar (formerly Ile Sainte-Marie)'
 'Fengjiawan, Wenchang, China' 'Fengjiawan, Wenchang, China'
 'Fengjiawan, Wenchang, China' 'Fengjiawan, Wenchang, China'
 'Dry Tortugas, Florida, USA' 'Dur-Ghella Island, Eritrea'
 'Rabaul, East New Britain, Papua New Guinea'
 'Rabaul, East New Britain, Papua New Guinea' 'Dry Tortugas, Florida, USA'
 'Ningaloo Reef, Australia' 'Sialum, Huon Peninsula, Papua New Guinea'
 'Eel Reef, Australia' 'Eel Reef, Australia'
 'Nosy Boraha, Madagascar (formerly Ile Sainte-Marie)'
 'Mentawai Islands, Indonesia' 'Canyon, Red Sea'
 'Tabuaeran (Fanning Island), Republic of Kiribati'
 'Tabuaeran (Fanning Island), Republic of Kiribati' 'Semicolon, Red Sea'
 'Rock Islands, Palau' 'Dry Tortugas, Florida, USA' 'Vanua Levu, Fiji'
 'Vanua Levu, Fiji' 'Rocas Atoll, Rio Grande do Norte, Brazil'
 'Rocas Atoll, Rio Grande do Norte, Brazil' 'Sapodilla Cayes, Belize'
 'Laing Island, Papua New Guinea' 'Misima Island, Papua New Guinea'
 'Misima Island, Papua New Guinea' 'Rowley Shoals, Australia'
 'Rowley Shoals, Australia' 'Peros Banhos Atoll, Chagos Archipelago'
 'Tabuaeran (Fanning Island), Republic of Kiribati'
 'Arlington Reef, Australia' 'Arlington Reef, Australia'
 'Palmyra Island, United States Minor Outlying Islands'
 'Anegada, British Virgin Islands' 'Cocos (Keeling) Islands, Australia'
 'Cocos (Keeling) Islands, Australia' 'Nusa Penida, Indonesia'
 'Pedra de Lume, Sal Island' 'Sarawak, Malaysia' 'Sarawak, Malaysia'
 'Great Keppel Island, Australia' 'Ulong Channel, Palau'
 'Mentawai Islands, Indonesia' 'Misima Island, Papua New Guinea'
 'Misima Island, Papua New Guinea' 'Isle de Gosier, Guadeloupe'
 'Isle de Gosier, Guadeloupe' 'Isle de Gosier, Guadeloupe'
 'Isle de Gosier, Guadeloupe' 'Amedee Island, New Caledonia'
 'Bandar Khayran, Oman' 'Bandar Khayran, Oman' 'Abraham Reef, Australia'
 'Abraham Reef, Australia' 'Rarotonga, Cook Islands' 'Abu Galawa, Red Sea'
 'Lignumvitae Basin, Florida, USA' 'Clarion Island, Mexico'
 'Ningaloo Reef, Australia' 'Ningaloo Reef, Australia'
 'Reef 13-050, Australia' 'Reef 13-050, Australia'
 'Kitchen Shoals, Bermuda' 'Hon Tre Island, Vietnam'
 'Hon Tre Island, Vietnam' 'Doangdoangan Besar, Indonesia'
 "Ta'u, American Samoa" 'Anegada, British Virgin Islands'
 'Double Reef, Guam' 'Ogasawara Islands, Japan' 'Ogasawara Islands, Japan'
 'Ogasawara Islands, Japan' 'Ogasawara Islands, Japan'
 'Nauru Island, Republic of Nauru'
 'Tabuaeran (Fanning Island), Republic of Kiribati'
 'Pinacles Reef, Puerto Rico'
 'Palmyra Island, United States Minor Outlying Islands'
 'Clerke Reef, Australia' 'Rowley Shoals, Australia'
 'Rowley Shoals, Australia' 'Sarawak, Malaysia' 'Sarawak, Malaysia'
 'Puerto Morelos, Mexico' 'Puerto Morelos, Mexico'
 'Rasdhoo Atoll, Maldives' 'Rasdhoo Atoll, Maldives'
 'Northeast Breakers, Bermuda' 'St. Gilles Reef, La Reunion'
 'St. Gilles Reef, La Reunion' 'Puerto Morelos, Mexico'
 'Palmyra Island, United States Minor Outlying Islands'
 'Rodrigues, Republic of Mauritius' 'Mentawai Islands, Indonesia'
 'Espiritu Santo Island, Vanuatu' 'Espiritu Santo Island, Vanuatu'
 'Lingyang Reef, Yongle Atoll' 'Red Sea' 'Savusavu Bay, Vanua Levu, Fiji'
 'Palmyra Island, United States Minor Outlying Islands'
 'Lingyang Reef, Yongle Atoll' 'Nomad Reef, Australia'
 'Davies Reef, Australia' 'Sabine Bank, Vanuatu' 'Sabine Bank, Vanuatu'
 'Flinders Reef, Australia' 'Flinders Reef, Australia'
 'Lignumvitae Basin, Florida Bay' 'Tarawa Atoll, Republic of Kiribati'
 'Parguera, Puerto Rico' 'Malindi Marine Park, Kenya'
 'Wolei Atoll, Fed. States of Micronesia'
 'Wolei Atoll, Fed. States of Micronesia' 'Mentawai Islands, Indonesia'
 'Espiritu Santo Island, Vanuatu' 'Longwan, Qionghai, China'
 'Longwan, Qionghai, China' 'Longwan, Qionghai, China'
 'Longwan, Qionghai, China' 'Secas Island, Panama'
 'Rowley Shoals, Australia' 'Rowley Shoals, Australia'
 'Popponesset, Red Sea' 'Mayotte' 'Madang Lagoon, Papua New Guinea'
 'Palmyra Island, United States Minor Outlying Islands'
 'Palaui Island, Philippines' 'Batu Hitam Beach, Indonesia'
 'Dry Tortugas, Florida, USA' 'Kosrae Island, Fed. States of Micronesia'
 'Kosrae Island, Fed. States of Micronesia'
 'Urvina Bay, Isabela Island, Ecuador'
 'Urvina Bay, Isabela Island, Ecuador'
 'Palmyra Island, United States Minor Outlying Islands'
 'Clipperton Island' 'Tulear Reef, Madagascar' 'Tulear Reef, Madagascar'
 'Clipperton Island' 'Ningaloo Reef, Australia' 'Ningaloo Reef, Australia'
 'Ras Umm Sidd, Egypt' 'Ras Umm Sidd, Egypt' 'Ras Umm Sidd, Egypt'
 'Ras Umm Sidd, Egypt' "Ha'afera, Tonga" "Ha'afera, Tonga"
 'La Parguera, Puerto Rico' 'La Parguera, Puerto Rico'
 'La Parguera, Puerto Rico' 'La Parguera, Puerto Rico'
 'Ifaty Reef, Madagascar' 'Ifaty Reef, Madagascar'
 'Anegada, British Virgin Islands'
 'Kiritimati (Christmas) Island, Republic of Kiribati'
 'Kiritimati (Christmas) Island, Republic of Kiribati'
 'Port Blair, Andaman Islands, India'
 'Butaritari Atoll, Republic of Kiribati'
 'Butaritari Atoll, Republic of Kiribati' 'Dry Tortugas, Florida, USA'
 'Savusavu Bay, Vanua Levu, Fiji' 'Bermuda' 'Bermuda'
 'Savusavu Bay, Vanua Levu, Fiji' 'Cocos (Keeling) Islands, Australia'
 'Cocos (Keeling) Islands, Australia' 'Dry Tortugas, Florida, USA'
 'Dry Tortugas, Florida, USA' 'Moorea, French Polynesia'
 'Padang Bai, Bali, Indonesia'
 'Palmyra Island, United States Minor Outlying Islands'
 'Palmyra Island, United States Minor Outlying Islands'
 'Mahe Island, Republic of the Seychelles' 'Houbihu, Taiwan']
["<class 'str'>"]
No. of unique values: 103/221

proxy metadata: archive type, proxy type, interpretation¶

archiveType (archive type)¶

In [32]:

Copied!





# archiveType
key = 'archiveType'
print('%s: '%key)
print(np.unique(df[key]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# archiveType
key = 'archiveType'
print('%s: '%key)
print(np.unique(df[key]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

archiveType: 
['Coral']
["<class 'str'>"]
No. of unique values: 1/221

paleoData_proxy (proxy type)¶

In [33]:

Copied!





# paleoData_proxy
key = 'paleoData_proxy'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# paleoData_proxy
key = 'paleoData_proxy'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

paleoData_proxy: 
['Sr/Ca' 'd18O']
["<class 'str'>"]
No. of unique values: 2/221

paleoData_sensorSpecies (further information on proxy type: species)¶

In [34]:

Copied!





# climate_interpretation
key = 'paleoData_sensorSpecies'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# climate_interpretation
key = 'paleoData_sensorSpecies'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

paleoData_sensorSpecies: 
['Diploastrea heliopora' 'Diploria labyrinthiformis' 'Favia speciosa'
 'Orbicella faveolata' 'Pavona clavus' 'Platygyra lamellina'
 'Porites australiensis' 'Porites lobata' 'Porites lutea' 'Porites solida'
 'Porites sp.' 'Pseudodiploria strigosa' 'Siderastrea radians'
 'Siderastrea siderea' 'Siderastrea sp.' 'Siderastrea stellata'
 'Solenastrea bournoni']
["<class 'str'>"]
No. of unique values: 17/221

paleoData_notes (notes)¶

In [35]:

Copied!





# # paleoData_notes
key = 'paleoData_notes'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# # paleoData_notes
key = 'paleoData_notes'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

paleoData_notes: 
['This paper did not calibrate the d18O proxy or reconstruct temperature. It instead analyzed variability through time by directly using the d18O proxy.'
 'Sr/Ca-SST recconstructed with composite plus scale method to ERSSTv3b, no regression applied'
 'Sr/Ca-SST recconstructed with composite plus scale method to ERSSTv3b, no regression applied'
 'nan' 'nan'
 'Individual coral records that are part of the Rarotonga composite' 'nan'
 'nan' 'nan' 'monthly correlations with SST not reported'
 'Other calibration slopes are available in Zinke et al. 2004; 1920-1995 samples monthly; 1919-1658 sampled bimonthly'
 'nan' 'nan' 'nan' 'nan'
 '1953-1993 and 1961-1993 calibration periods, first with 0.13 slope, latter -0.17 slope'
 'nan' 'nan'
 'Sr/Ca-SST calibrations listed were found in Linsley et al. 2004. The calibration from Linsley et al. 2000 is as follows: slope = -0.082; intercept = 11.568; rsq = 0.75'
 'Sr/Ca-SST calibrations listed were found in Linsley et al. 2004. The calibration from Linsley et al. 2000 is as follows: slope = -0.082; intercept = 11.568; rsq = 0.75'
 'nan' 'nan' 'nan' 'nan' 'nan'
 'Core data is a composite of overlapping individual pieces and any replicate analyses. Data is seasonal min-max (Feb - Aug)'
 'nan' 'Fossil Coral' 'nan'
 'One core, this record: younger part of core MAF00-01 at monthly resolution (1896-1998);  annual correlation based on 1897-1997 data'
 'monthly correlations not reported, only assesed with IOD; used Cole et al. (2000) slope of -0.24'
 'nan'
 'Sr/Ca-SST calibrations not published because of weak relationship for both GS and NP cores'
 'Sr/Ca-SST calibrations not published because of weak relationship for both GS and NP cores'
 'nan'
 'One core, this record: older part of core MAF00-01 at bimonthly resolution (1622-1722)'
 'nan' 'nan' 'nan'
 'Totor Sr/Ca core top performed well in many sections, others less good, especially core top since 1988 to 2006. Cabri was much better. There is a table with likely best section in Totor in paper'
 'Multiple linear regression analyses using monthly (non-detrended) anomalies for coral and instrumental SST and SSS data for the period 1970 to 2008. For NGB core, regression equation is: d18O_anom = 0.15 (0.03)*SST + 0.36 (0.07)*SSS.n'
 'slope is not calculated, reported slope from Weber and Woodhead is used of -0.24'
 'nan' 'nan'
 'This record is a combination of high-resolution data from Murty et al. 2018 and annual data from Bryan et al. 2019. Studies are independent but were performed on the same coral core (CG). Calibration information, SST range, and analytical error provided are for high-resolution data from Murty et al. 2018.'
 'nan' 'nan' 'nan' 'nan' 'nan'
 'Same colony as 08PS-A2, but different core and core top age' 'nan' 'nan'
 'nan' 'Same colony as 08PS-A1, but different core and core top age'
 'used Gagan 1994 d18O slope' 'monthly correlations not reported' 'nan'
 'nan'
 'STM4 Sr/Ca core top was too cold vs STM2, would be careful. STM2 is really good, compares well to Rodrigures core Cabri'
 'Fossil Coral' 'nan'
 'Two cores were spliced at 1984 to avoid secondary aragonite'
 'Two cores were spliced at 1984 to avoid secondary aragonite' 'nan'
 'Multiple linear regression analyses using monthly (non-detrended) anomalies for coral and instrumental SST and SSS data for the period 1970 to 2008. For RI core, regression equation is: d18O_anom = 0.03 (0.03)*SST + 0.36 (0.05)*SSS.nn                                        n                                n                        n                n'
 'nan' 'nan' 'nan' 'nan' 'nan' 'nan' 'monthly correlations not reported'
 'Composite of two fossil cores. Data was sub-monthly and was linearly resampled to monthly.'
 'Composite of two fossil cores. Data was sub-monthly and was linearly resampled to monthly.'
 'Sr/Ca-SST recconstructed with composite plus scale method to ERSSTv3b, no regression applied'
 'Sr/Ca-SST recconstructed with composite plus scale method to ERSSTv3b, no regression applied'
 'd18O driven by rainfall; little SST correlation'
 'Publication notes that core F4 has higher variance than contemporary cores and instrumental SST products and that this is likely due to lagoonal mixing.'
 'coral was primariliy analysed for Boron and paper does not discuss much about Sr/Ca and d18O even though they were analysed'
 'coral was primariliy analysed for Boron and paper does not discuss much about Sr/Ca and d18O even though they were analysed'
 'nan' 'nan'
 'Sr/Ca regression slope error is estimated to be (+/-) 0.21 degrees C; Used the Zinke (2015) method - normalised and scaled to the s.d. SST box of the period 1961-1990; equation based on composite of 2 cores'
 'Sr/Ca regression slope error is estimated to be (+/-) 0.21 degrees C; Used the Zinke (2015) method - normalised and scaled to the s.d. SST box of the period 1961-1990; equation based on composite of 2 cores'
 'NP1 and NP2 cores spliced together to get full NP record due to bioerosion in the NP1 core. Sr/Ca-SST calibrations not published because of weak relationship for both GS and NP cores.'
 'nan' 'nan' 'nan'
 'Study focuses on Ba/Ca and Y/Ca. Sr/Ca is primarily used for chronology'
 'Multiple linear regression analyses using monthly (not detrended) anomalies for coral and instrumental SST and SSS data for the period 1970 to 2008. For UC core, regression equation is: d18O_anom = 0.09 (0.03)*SST + 0.33 (0.05)*SSS'
 'Fossil Coral'
 'Raw data was sub-monthly and was linearly resampled to monthly.'
 'Raw data was sub-monthly and was linearly resampled to monthly.'
 'This calibration data is taken from the top 40 years of the core from Hetzinger et al 2006'
 'This calibration data is taken from the top 40 years of the core from Hetzinger et al 2006'
 'This calibration data is taken from the top 40 years of the core from Hetzinger et al 2006'
 'This calibration data is taken from the top 40 years of the core from Hetzinger et al 2006'
 'Sr/Ca are average values of three colonies and replicate paths' 'nan'
 'nan'
 'This is a refinement of the record available previously (Druffel and Griffin, JGR 1993) which showed biennial d18O for the period 1635-1957.'
 'This is a refinement of the record available previously (Druffel and Griffin, JGR 1993) which showed biennial d18O for the period 1635-1957.'
 'Individual coral records that are part of the Rarotonga composite'
 'Mean SST ranges given in paper for northern reef 7.7C; for southern reefs 5.8C'
 'nan' 'nan'
 'Sr/Ca-SST recconstructed with composite plus scale method to ERSSTv3b, no regression applied'
 'Sr/Ca-SST recconstructed with composite plus scale method to ERSSTv3b, no regression applied'
 'nan' 'nan'
 'regressions in paper use air temps instead of SST, relevant growth information about coral found in PhD thesis (http://nbn-resolving.de/urn:nbn:de:gbv:46-ep000102521)'
 'A composite of cores TN (CoralHydro2k ID BO14HTI01) and BB was used from 2010-1977 for the published reconstructions in Bolton et al. 2014 and Goodkin et al. 2021.'
 'A composite of cores TN (CoralHydro2k ID BO14HTI01) and BB was used from 2010-1977 for the published reconstructions in Bolton et al. 2014 and Goodkin et al. 2021.'
 'Calibrations to SST data were performed on a shorter, higher-resolution set of samples. See Murty et al. 2017 for more information.'
 'The calibration equation incorporated both SST and salinity, so the d18O-SST slope is not included here to avoid misrepresentation.'
 'nan' 'nan'
 'Annual regression slopes (-0.213 / C, -0.140 mmol/mol / C) imply an apparent amplification of inferred SST variations on interannual and longer timescales.'
 'Annual regression slopes (-0.213 / C, -0.140 mmol/mol / C) imply an apparent amplification of inferred SST variations on interannual and longer timescales.'
 'Annual regression slopes (-0.213 / C, -0.140 mmol/mol / C) imply an apparent amplification of inferred SST variations on interannual and longer timescales.'
 'Annual regression slopes (-0.213 / C, -0.140 mmol/mol / C) imply an apparent amplification of inferred SST variations on interannual and longer timescales.'
 'Oxygen isotope data has NOT been corrected for the acid fractionation difference (acid-alpha) between standards (calcite) and coral samples (aragonite). Prior to 1895/1896 the data exhibits a kinetic overprint.'
 'nan' 'Study focuses on use of Sr/U rather than Sr/Ca' 'nan' 'nan'
 'Sr/Ca-SST recconstructed with composite plus scale method to ERSSTv3b, no regression applied'
 'Sr/Ca-SST recconstructed with composite plus scale method to ERSSTv3b, no regression applied'
 'nan' 'nan' 'focused more on Sr/U calibrations'
 'focused more on Sr/U calibrations' 'nan' 'nan'
 'regressions in paper use air temps instead of SST, relevant growth information about coral found in PhD thesis (http://nbn-resolving.de/urn:nbn:de:gbv:46-ep000102521)'
 'See Pfeiffer et al. 2004 for more monthly/bimonthly d18O calibrations and Pfeiffer et al. 2019 for annual d18O and Sr/Ca calibrations.'
 'See Pfeiffer et al. 2004 for more monthly/bimonthly d18O calibrations and Pfeiffer et al. 2019 for annual d18O and Sr/Ca calibrations.'
 'Study focuses on use of Sr/U rather than Sr/Ca' 'nan' 'nan'
 'Fossil Coral' 'regression information based on Kilbourne MS Thesis'
 'regression information based on Kilbourne MS Thesis'
 'Microatoll; coral rubble samples; data reported in Supplements of paper'
 'Study focuses on use of Sr/U rather than Sr/Ca; multiple locations Atlantic and Pacific; Sr/Ca uncertainty 1 deg C; Sr-U uncertainty 0.5 deg C; Sr/Ca-SST slope not indicated'
 'nan' 'nan'
 'Microatoll; coral rubble samples; data reported in Supplements of paper'
 'nan'
 'Study uses multiple cores from multiple locations in the Great Barrier Reef; spans 15-18S latitude; multiple Sr/Ca regression equations in the paper. Supplemnet has all data.'
 'd18O is a composite of cores 06SB-A1 and 07SB-A2'
 'd18O is a composite of cores 06SB-A1 and 07SB-A2' 'nan' 'nan' 'nan'
 'monthly correlations not reported' 'nan'
 'Extra information supplied that is not included in publication such as higher precision slope value. Noted as exposed to open ocean; seasonally influenced by river discharge'
 'uncertainty on Sr/Ca intercept is 0.0018'
 'uncertainty on Sr/Ca intercept is 0.0018'
 'Note: Length of coral record increased for Abram et al., 2020 publication relative to Abram et al., 2015'
 'nan' 'nan' 'nan' 'nan' 'nan' 'nan'
 'Sr/Ca-SST recconstructed with composite plus scale method to ERSSTv3b, no regression applied'
 'Sr/Ca-SST recconstructed with composite plus scale method to ERSSTv3b, no regression applied'
 'Mean SST ranges given in paper for northern reef 7.7C; for southern reefs 5.8C'
 'in situ d18o; for annual (sr/Ca, slope- -0.0583, intercept - 10.378)'
 'monthly correlations not reimported' 'nan'
 'Sr/Ca calibration equation originally found in Ramos et al. 2017; Monthly Sr/Ca data available from 1880-2012; Monthly and seasonal (DJFM vs JJAS, data input on Jan and July) d18O data available from 1894-2012 and 1880-1893, respectively.'
 'monthly correlations not reported, instead used IOD season' 'nan' 'nan'
 'nan'
 'Core was collected in a subhorizontal, not vertical, orientation from coral colony; indistinct growth banding in top 50 years of core'
 'Core was collected in a subhorizontal, not vertical, orientation from coral colony; indistinct growth banding in top 50 years of core'
 'nan'
 'Composite of cores C2B (13.1m depth), C4B (8.2m), C6A (11.3m), and CF1B (found on beach); reconstructed d18Osw data are anomalies'
 'Published slopes based on composite coral results. Used -0.20.02 permil per 1 deg C regressions'
 'Published slopes based on composite coral results. Used -0.20.02 permil per 1 deg C regressions'
 'authors describe density banding as poor; some fish grazing scars'
 'Sr/Ca-SST recconstructed with composite plus scale method to ERSST3b, no regression applied'
 'Sr/Ca-SST recconstructed with composite plus scale method to ERSST3b, no regression applied'
 'Annual regression slopes (-0.29 / C, -0.115 mmol/mol / C) imply an apparent amplification of inferred SST variations on interannual and longer timescales.'
 'Annual regression slopes (-0.29 / C, -0.115 mmol/mol / C) imply an apparent amplification of inferred SST variations on interannual and longer timescales.'
 'Annual regression slopes (-0.29 / C, -0.115 mmol/mol / C) imply an apparent amplification of inferred SST variations on interannual and longer timescales.'
 'Annual regression slopes (-0.29 / C, -0.115 mmol/mol / C) imply an apparent amplification of inferred SST variations on interannual and longer timescales.'
 'Microatoll with core taken from top and side and spliced together; As with all coral timeseries, exact months are not known, so annual averages sometimes represent more or less than 12 months.'
 'Microatoll with core taken from top and side and spliced together; As with all coral timeseries, exact months are not known, so annual averages sometimes represent more or less than 12 months.'
 'This is the bottom part of a core that included an unconformity. Base of the coral is u-series dated and age model is from bands counted up from there.'
 'This is the bottom part of a core that included an unconformity. Base of the coral is u-series dated and age model is from bands counted up from there.'
 'This is the bottom part of a core that included an unconformity. Base of the coral is u-series dated and age model is from bands counted up from there.'
 'This is the bottom part of a core that included an unconformity. Base of the coral is u-series dated and age model is from bands counted up from there.'
 'Published slopes based on composite coral results. Used -0.20.02 permil per 1 deg C regressions'
 'Published slopes based on composite coral results. Used -0.20.02 permil per 1 deg C regressions'
 'nan' 'nan' 'nan'
 'slope and y-intercept information only available for mean annual calibrations'
 'used published values (i.e., not locally derived) for Sr/Ca-SST and d18O-SST slopes'
 'used published values (i.e., not locally derived) for Sr/Ca-SST and d18O-SST slopes'
 'nan' 'nan' 'See Goodkin et al 2005 for interannual Sr/Ca calibration'
 'See Goodkin et al 2005 for interannual Sr/Ca calibration'
 'mm-scale drilling but available data is at annual resolution'
 'Sr/Ca regression slope error is estimated to be (+/-) 0.21 degrees C; Used the Zinke (2015) method - normalised and scaled to the s.d. SST box of the period 1961-1990; equation based on composite of 2 cores'
 'Sr/Ca regression slope error is estimated to be (+/-) 0.21 degrees C; Used the Zinke (2015) method - normalised and scaled to the s.d. SST box of the period 1961-1990; equation based on composite of 2 cores'
 'nan' 'nan'
 '7 calibration equations are available in Boiseau et al. 1998 (this database contains Equation (1) information)'
 'This paper did not calibrate the d18O proxy or reconstruct temperature. It instead analyzed variability through time by directly using the d18O proxy.'
 'nan' 'nan' 'nan'
 'Monthly Sr/Ca data available from 1788-2013; Monthly and seasonal (DJFM vs JJAS, data input on Jan and July) d18O data available from 1906-2013 and 1788-1905 respectively.']
["<class 'str'>"]
No. of unique values: 73/221

paleoData_variableName¶

In [36]:

Copied!





# paleoData_variableName
key = 'paleoData_variableName'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
# paleoData_variableName
key = 'paleoData_variableName'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))

paleoData_variableName: 
['Sr/Ca' 'd18O']
["<class 'str'>"]

climate metadata: interpretation variable, direction, seasonality¶

interpretation_direction¶

In [37]:

Copied!





# climate_interpretation
key = 'interpretation_direction'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# climate_interpretation
key = 'interpretation_direction'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

interpretation_direction: 
['N/A']
No. of unique values: 1/221

interpretation_seasonality¶

In [38]:

Copied!





# climate_interpretation
key = 'interpretation_seasonality'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# climate_interpretation
key = 'interpretation_seasonality'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

interpretation_seasonality: 
['N/A']
No. of unique values: 1/221

interpretation_variable¶

In [39]:

Copied!





# climate_interpretation
key = 'interpretation_variable'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# climate_interpretation
key = 'interpretation_variable'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

interpretation_variable: 
['temperature' 'temperature+moisture']
No. of unique values: 2/221

interpretation_variableDetail¶

In [40]:

Copied!





# climate_interpretation
key = 'interpretation_variableDetail'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# climate_interpretation
key = 'interpretation_variableDetail'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

interpretation_variableDetail: 
['temperature - manually assigned by DoD2k authors for paleoData_proxy = Sr/Ca'
 'temperature+moisture - manually assigned by DoD2k authors for paleoData_proxy = d18O']
No. of unique values: 2/221

data¶

paleoData_values¶

In [41]:

Copied!





# # paleoData_values
key = 'paleoData_values'

print('%s: '%key)
for ii, vv in enumerate(df[key][:20]):
    try: 
        print('%-30s: %s -- %s'%(df['dataSetName'].iloc[ii][:30], str(np.nanmin(vv)), str(np.nanmax(vv))))
        print(type(vv))
    except: print(df['dataSetName'].iloc[ii], 'NaNs detected.')
print(np.unique([str(type(dd)) for dd in df[key]]))
# # paleoData_values
key = 'paleoData_values'

print('%s: '%key)
for ii, vv in enumerate(df[key][:20]):
    try: 
        print('%-30s: %s -- %s'%(df['dataSetName'].iloc[ii][:30], str(np.nanmin(vv)), str(np.nanmax(vv))))
        print(type(vv))
    except: print(df['dataSetName'].iloc[ii], 'NaNs detected.')
print(np.unique([str(type(dd)) for dd in df[key]]))

paleoData_values: 
CH03BUN01                     : -5.758 -- -4.6518
<class 'numpy.ndarray'>
ZI15MER01                     : 8.80159 -- 9.006902
<class 'numpy.ndarray'>
ZI15MER01                     : 8.80159 -- 9.006902
<class 'numpy.ndarray'>
CO03PAL03                     : -5.38 -- -4.11
<class 'numpy.ndarray'>
CO03PAL02                     : -5.295 -- -4.338
<class 'numpy.ndarray'>
LI06RAR01                     : -5.13 -- -3.82
<class 'numpy.ndarray'>
CO03PAL07                     : -5.51 -- -4.44
<class 'numpy.ndarray'>
FL18DTO03                     : 8.891 -- 9.476
<class 'numpy.ndarray'>
UR00MAI01                     : -5.304433 -- -3.752342
<class 'numpy.ndarray'>
TU95MAD01                     : -5.895 -- -4.578
<class 'numpy.ndarray'>
ZI04IFR01                     : -5.43 -- -3.41
<class 'numpy.ndarray'>
RE18CAY01                     : -4.812 -- -3.629
<class 'numpy.ndarray'>
RE18CAY01                     : 8.807 -- 9.1
<class 'numpy.ndarray'>
RE18CAY01                     : 8.863 -- 9.043
<class 'numpy.ndarray'>
RE18CAY01                     : -4.577 -- -3.915
<class 'numpy.ndarray'>
KU99HOU01                     : -4.7 -- -3.04
<class 'numpy.ndarray'>
OS13NLP01                     : -6.1125712 -- -5.112277
<class 'numpy.ndarray'>
EV98KIR01                     : -5.233 -- -3.748
<class 'numpy.ndarray'>
LI00RAR01                     : -4.9993 -- -3.5122
<class 'numpy.ndarray'>
LI00RAR01                     : 9.1651 -- 9.75
<class 'numpy.ndarray'>
["<class 'numpy.ndarray'>"]

paleoData_units¶

In [42]:

Copied!





# paleoData_units
key = 'paleoData_units'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# paleoData_units
key = 'paleoData_units'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

paleoData_units: 
['mmol/mol' 'permil']
["<class 'str'>"]
No. of unique values: 2/221

year¶

In [43]:

Copied!





# # year
key = 'year'
print('%s: '%key)
for ii, vv in enumerate(df[key][:20]):
    try: print('%-30s: %s -- %s'%(df['dataSetName'].iloc[ii][:30], str(np.nanmin(vv)), str(np.nanmax(vv))))
    except: print('NaNs detected.', vv)
print(np.unique([str(type(dd)) for dd in df[key]]))
# # year
key = 'year'
print('%s: '%key)
for ii, vv in enumerate(df[key][:20]):
    try: print('%-30s: %s -- %s'%(df['dataSetName'].iloc[ii][:30], str(np.nanmin(vv)), str(np.nanmax(vv))))
    except: print('NaNs detected.', vv)
print(np.unique([str(type(dd)) for dd in df[key]]))

year: 
CH03BUN01                     : 1860.0 -- 1990.58
ZI15MER01                     : 1891.0 -- 2009.0
ZI15MER01                     : 1891.0 -- 2009.0
CO03PAL03                     : 1317.17 -- 1406.49
CO03PAL02                     : 1149.08 -- 1220.205
LI06RAR01                     : 1906.88 -- 1999.75
CO03PAL07                     : 1635.02 -- 1666.48
FL18DTO03                     : 1997.646 -- 2012.208
UR00MAI01                     : 1840.0 -- 1994.5
TU95MAD01                     : 1922.542 -- 1991.292
ZI04IFR01                     : 1659.625 -- 1995.625
RE18CAY01                     : 1887.04 -- 2012.54
RE18CAY01                     : 1887.04 -- 2012.54
RE18CAY01                     : 1887.0 -- 2011.0
RE18CAY01                     : 1887.0 -- 2011.0
KU99HOU01                     : 1794.71 -- 1994.38
OS13NLP01                     : 1990.17 -- 2008.17
EV98KIR01                     : 1938.292 -- 1993.625
LI00RAR01                     : 1726.753 -- 1996.8641
LI00RAR01                     : 1726.753 -- 1996.8641
["<class 'numpy.ndarray'>"]

yearUnits¶

In [44]:

Copied!





# yearUnits
key = 'yearUnits'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
# yearUnits
key = 'yearUnits'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')

yearUnits: 
['CE']
["<class 'str'>"]
No. of unique values: 1/221

In [ ]: