Load FE23¶
Load TRW (tree-ring width) data from FE23 (https://doi.org/10.25921/8hpf-a451)
Dataset downloaded from NCEI: https://www.ncei.noaa.gov/access/paleo-search/study/36773
Created 25/10/2024 by Lucie Luecke (LL)
Updated 24/10/2025 by LL: tidied up and streamlined for documentation and publication
Updated 21/11/2024 by LL: added csv saving of compact dataframe, removed redundant output.
Here we extract a dataframe with the following columns:
archiveType, dataSetName, datasetId, geo_meanElev, geo_meanLat, geo_meanLon, geo_siteName, interpretation_direction (new in v2.0), interpretation_variable, interpretation_variableDetail, interpretation_seasonality (new in v2.0), originalDataURL, originalDatabase, paleoData_notes, paleoData_proxy, paleoData_sensorSpecies, paleoData_units, paleoData_values, paleoData_variableName, year, yearUnits
We save a standardised compact dataframe for concatenation to DoD2k
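Because every source database is reduced to the same standardised columns, the compact dataframes can later be stacked row-wise. A minimal, hypothetical sketch of that concatenation step (the column subset and record values here are made up for illustration):

```python
import pandas as pd

# Hypothetical sketch: compact dataframes from different source databases
# share the same standardised columns, so they can be stacked row-wise.
df_fe23_c = pd.DataFrame({'datasetId': ['FE23_africa_keny001'], 'archiveType': ['Wood']})
df_other = pd.DataFrame({'datasetId': ['OTHER_record001'], 'archiveType': ['Coral']})

# concatenate and re-index; the database token in datasetId keeps IDs unique
dod2k = pd.concat([df_fe23_c, df_other], ignore_index=True)
print(len(dod2k))  # -> 2
```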
Set up working environment¶
Make sure the repo_root is set correctly; it should be `your_root_dir/dod2k`. This should be the working directory throughout this notebook (and all other notebooks).
%load_ext autoreload
%autoreload 2
import sys
import os
from pathlib import Path
# Add parent directory to path (works from any notebook in notebooks/)
# the repo_root should be the parent directory of the notebooks folder
init_dir = Path().resolve()
print(init_dir)
# Determine repo root
if init_dir.name == 'dod2k':
    repo_root = init_dir
elif init_dir.parent.name == 'dod2k':
    repo_root = init_dir.parent
elif init_dir.parent.parent.name == 'dod2k':
    repo_root = init_dir.parent.parent
else:
    raise Exception('Please review the repo root structure (see first cell).')
# Update cwd and path only if needed
if os.getcwd() != str(repo_root):
os.chdir(repo_root)
if str(repo_root) not in sys.path:
sys.path.insert(0, str(repo_root))
print(f"Repo root: {repo_root}")
if str(os.getcwd())==str(repo_root):
print(f"Working directory matches repo root. ")
/home/jupyter-lluecke/dod2k_v2.0/dod2k/notebooks Repo root: /home/jupyter-lluecke/dod2k_v2.0/dod2k Working directory matches repo root.
import xarray as xr
import pandas as pd
import numpy as np
from dod2k_utilities import ut_functions as utf # contains utility functions
from dod2k_utilities import ut_plot as uplt # contains plotting functions
load the source data¶
Specify the data and metadata which we are looking to extract from FE23 for the standardised 'compact dataframe':
vars = ['chronos', 'lonlat', 'investigator', 'trwsSm', 'chronology', 'country', 'species',
'elevation', 'sitename', 'treetime']
To obtain the source data, run the cell below (warning: it downloads the full 25 GB netCDF from NCEI) and extract a slice based on the relevant variables (~60 MB).
Alternatively, skip that cell and directly use the slice provided in this directory (see the cell after next).
# # download FE23
# !wget -O data/fe23/franke2022-fe23.nc https://www.ncei.noaa.gov/pub/data/paleo/contributions_by_author/franke2022/franke2022-fe23.nc
# fe23_full = xr.open_dataset('data/fe23/franke2022-fe23.nc')
# # save slice of FE23 with only the relevant variables as netCDF (fe23_full is 25GB)
# fe23_slice = fe23_full[vars]
# fe23_slice.to_netcdf('data/fe23/franke2022-fe23_slice.nc')
fe23_slice = xr.open_dataset('data/fe23/franke2022-fe23_slice.nc')
print(fe23_slice)
<xarray.Dataset> Size: 58MB
Dimensions: (ttime: 1159, nseries: 278, nregion: 22, lonlat: 2,
nchars_cinv: 42, nchars_chr: 32, nchars_ctry: 22,
nchars_csp: 6, nchars_cn: 51)
Coordinates:
lonlat (nseries, nregion, lonlat) float64 98kB ...
Dimensions without coordinates: ttime, nseries, nregion, nchars_cinv,
nchars_chr, nchars_ctry, nchars_csp, nchars_cn
Data variables:
chronos (ttime, nseries, nregion) float64 57MB ...
investigator (nchars_cinv, nseries, nregion) |S1 257kB ...
trwsSm (nseries, nregion) float64 49kB ...
chronology (nchars_chr, nseries, nregion) |S1 196kB ...
country (nchars_ctry, nseries, nregion) |S1 135kB ...
species (nchars_csp, nseries, nregion) |S1 37kB ...
elevation (nseries, nregion) float64 49kB ...
sitename (nchars_cn, nseries, nregion) |S1 312kB ...
treetime (ttime) float64 9kB ...
Attributes:
reference: Franke, J; Evans, MN; Schurer, AP; Hegerl, GC, 2022, Clim...
doi: https://doi.org/10.25921/8hpf-a451
creation_time: 27-Oct-2024 11:45:29
df_fe23 = {}
for var in vars:
    print(var)
    df_fe23[var] = []
    fe23_slice[var] = np.squeeze(fe23_slice[var])
    for ii in fe23_slice.nregion:  # loop through the regions
        for jj in fe23_slice.nseries:  # loop through the records in any one region
            if var in ['chronos']:
                data = fe23_slice[var][:, jj, ii].data
            elif var in ['trwsSm', 'elevation']:
                data = float(fe23_slice[var][jj, ii].data)
            elif var in ['lonlat']:
                data = fe23_slice[var][jj, ii, :].data
            elif var in ['investigator', 'chronology', 'country', 'species', 'sitename']:
                # strings are stored as |S1 char arrays: join the bytes and decode
                data = b''.join(fe23_slice[var][:, jj, ii].data).decode("latin-1").replace(' ', '')
            # keep only series that contain any data
            if not np.all(np.isnan(fe23_slice['chronos'][:, jj, ii].data)):
                df_fe23[var].append(data)
chronos lonlat investigator trwsSm chronology country species elevation sitename treetime
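The string-valued variables (investigator, chronology, country, species, sitename) are stored in the netCDF as arrays of single bytes (dtype `|S1`), padded with spaces. A minimal sketch of the join-and-decode step used in the loop above (the character values here are hypothetical):

```python
import numpy as np

# A |S1 char array as stored in the netCDF, padded with trailing spaces
chars = np.array([b'k', b'e', b'n', b'y', b'0', b'0', b'1', b' ', b' '], dtype='S1')

# join the single-byte elements, decode, and strip the space padding
name = b''.join(chars).decode('latin-1').replace(' ', '')
print(name)  # -> keny001
```

Note that `.replace(' ', '')` also removes interior spaces, which is why site names like 'RagatiForestStationNyeriDistrict' appear concatenated.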
create compact dataframe¶
Create an empty dataframe and populate it with the data from the netCDF.
df_compact = pd.DataFrame(columns=['archiveType', 'interpretation_variable', 'dataSetName', 'datasetId',
'geo_meanElev', 'geo_meanLat', 'geo_meanLon', 'geo_siteName',
'originalDatabase', 'originalDataURL', 'paleoData_notes', 'paleoData_proxy',
'paleoData_units', 'paleoData_values', 'year', 'yearUnits'])
df_compact['paleoData_values'] = df_fe23['chronos']
df_compact['year'] = [fe23_slice.treetime.data for ii in range(len(df_compact))]
The netCDF has a homogeneous time coordinate but may contain missing values. We keep only non-NaN data:
for ii in df_compact.index:
dd=utf.convert_to_nparray(df_compact.at[ii, 'paleoData_values'])
df_compact.at[ii, 'paleoData_values']=dd.data[~dd.mask]
df_compact.at[ii, 'year']=np.array(df_compact.at[ii, 'year'])[~dd.mask]
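The cell above relies on the values being masked arrays: the mask flags invalid entries, so indexing with `~mask` keeps only valid data and the matching years. A minimal sketch with hypothetical values:

```python
import numpy as np

# homogeneous time axis and a series with missing values (hypothetical)
years = np.array([1900., 1901., 1902., 1903.])
vals = np.ma.masked_invalid(np.array([0.5, np.nan, 1.2, np.nan]))

# ~mask selects only the valid (non-NaN) entries, in both data and years
print(vals.data[~vals.mask])  # -> [0.5 1.2]
print(years[~vals.mask])      # -> [1900. 1902.]
```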
df_compact[['geo_meanLon', 'geo_meanLat']] = df_fe23['lonlat']
df_compact['geo_meanElev'] = df_fe23['elevation']
df_compact['datasetId'] = df_fe23['chronology']
df_compact['datasetId'] = df_compact['datasetId'].apply(lambda x: x.replace('.rwl',''))
df_compact['dataSetName'] = df_compact['datasetId']
df_compact['datasetId'] = df_compact['datasetId'].apply(lambda x: 'FE23_'+x)
Keep populating the metadata columns from the netCDF metadata.
The original data URL can be reconstructed from NCEI using the dataSetName.
url = 'https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/'
df_compact['geo_siteName'] = df_fe23['sitename']
df_compact['paleoData_sensorSpecies'] = df_fe23['species']
df_compact['paleoData_notes'] = df_fe23['investigator']
df_compact['paleoData_notes'] = df_compact['paleoData_notes'].apply(lambda x: 'Investigator: '+x)
df_compact['originalDataURL'] = df_compact['dataSetName'].apply(lambda x: url+x.replace('_','/')+'-noaa.rwl')
df_compact['archiveType'] = 'Wood' # fills column 'archiveType'
df_compact['paleoData_proxy'] = 'ring width' # fills column 'paleoData_proxy'
df_compact['paleoData_units'] = 'standardized_anomalies' # fills column 'paleoData_units'
df_compact['originalDatabase'] = 'FE23 (Breitenmoser et al. (2014))' # fills column 'originalDatabase'
df_compact['yearUnits'] = 'CE' # fills column 'yearUnits'
df_compact['paleoData_variableName'] = 'ring width' # fills column 'paleoData_variableName'
The climate interpretation variable in the netCDF is given as an integer (1: temperature sensitive, 2: moisture sensitive, 3: temperature and moisture sensitive, 4: neither temperature nor moisture sensitive).
TM = {1.:'temperature', 2.:'moisture', 3.:'temperature+moisture', 4.: 'NOT temperature NOT moisture', 0:'nan'}
df_compact['interpretation_variable'] = df_fe23['trwsSm']
df_compact['interpretation_variable'] = df_compact['interpretation_variable'].apply(lambda x: TM[x] if ~np.isnan(x) else 'N/A')
df_compact['interpretation_variableDetail'] = 'N/A'
df_compact['interpretation_seasonality'] = 'N/A'
df_compact['interpretation_direction'] = 'N/A'
Drop rows with no data, as well as all-zero, all-NaN, and constant rows.
drop_inds = []
for ii in range(df_compact.shape[0]):
if len(df_compact.iloc[ii]['year'])==0:
print('empty', ii, df_compact.iloc[ii]['year'], df_compact.iloc[ii]['originalDatabase'])
print(df_compact.iloc[ii]['paleoData_values'])
drop_inds += [df_compact.index[ii]]
for ii, row in enumerate(df_compact.paleoData_values):
if np.std(row)==0:
print(ii, 'std=0')
elif np.sum(np.diff(row)**2)==0:
print(ii, 'diff=0')
elif np.isnan(np.std(row)):
print(ii, 'std nan')
else:
continue
if df_compact.index[ii] not in drop_inds:
drop_inds += [df_compact.index[ii]]
print(drop_inds)
df_compact = df_compact.drop(index=drop_inds)
[]
Check that the datasetId is unique and that each record has an ID
# check that the datasetId is unique
assert len(df_compact.datasetId.unique())==len(df_compact)
save compact dataframe¶
save pickle¶
# save to a pickle file (security: is it better to save to csv?)
df_compact = df_compact[sorted(df_compact.columns)]
df_compact.to_pickle('data/fe23/fe23_compact.pkl')
save csv¶
# save to a list of csv files (metadata, data, year)
df_compact.name='fe23'
utf.write_compact_dataframe_to_csv(df_compact)
METADATA: datasetId, archiveType, dataSetName, geo_meanElev, geo_meanLat, geo_meanLon, geo_siteName, interpretation_direction, interpretation_seasonality, interpretation_variable, interpretation_variableDetail, originalDataURL, originalDatabase, paleoData_notes, paleoData_proxy, paleoData_sensorSpecies, paleoData_units, paleoData_variableName, yearUnits Saved to /home/jupyter-lluecke/dod2k_v2.0/dod2k/data/fe23/fe23_compact_%s.csv
# load dataframe to check that it loads correctly
df = utf.load_compact_dataframe_from_csv('fe23')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 2754 entries, 0 to 2753 Data columns (total 21 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 archiveType 2754 non-null object 1 dataSetName 2754 non-null object 2 datasetId 2754 non-null object 3 geo_meanElev 2710 non-null float32 4 geo_meanLat 2754 non-null float32 5 geo_meanLon 2754 non-null float32 6 geo_siteName 2754 non-null object 7 interpretation_direction 2754 non-null object 8 interpretation_seasonality 2754 non-null object 9 interpretation_variable 2754 non-null object 10 interpretation_variableDetail 2754 non-null object 11 originalDataURL 2754 non-null object 12 originalDatabase 2754 non-null object 13 paleoData_notes 2754 non-null object 14 paleoData_proxy 2754 non-null object 15 paleoData_sensorSpecies 2754 non-null object 16 paleoData_units 2754 non-null object 17 paleoData_values 2754 non-null object 18 paleoData_variableName 2754 non-null object 19 year 2754 non-null object 20 yearUnits 2754 non-null object dtypes: float32(3), object(18) memory usage: 419.7+ KB None
Visualise dataframe¶
Show spatial distribution of records, show archive and proxy types
# count archive types
archive_count = {}
for ii, at in enumerate(set(df['archiveType'])):
archive_count[at] = df.loc[df['archiveType']==at, 'archiveType'].count()
sort = np.argsort([cc for cc in archive_count.values()])
archives_sorted = np.array([cc for cc in archive_count.keys()])[sort][::-1]
# Specify colour for each archive (smaller archives get grouped into the same colour)
archive_colour, major_archives, other_archives = uplt.get_archive_colours(archives_sorted, archive_count)
fig = uplt.plot_geo_archive_proxy(df, archive_colour)
utf.save_fig(fig, f'geo_{df.name}', dir=df.name)
0 Wood 2754 saved figure in /home/jupyter-lluecke/dod2k_v2.0/dod2k/figs/fe23/geo_fe23.pdf
Now plot the coverage over the Common Era
fig = uplt.plot_coverage(df, archives_sorted, major_archives, other_archives, archive_colour)
utf.save_fig(fig, f'time_{df.name}', dir=df.name)
saved figure in /home/jupyter-lluecke/dod2k_v2.0/dod2k/figs/fe23/time_fe23.pdf
Display dataframe¶
Display identification metadata: dataSetName, datasetId, originalDataURL, originalDatabase¶
index¶
# # check index
print(df.index)
RangeIndex(start=0, stop=2754, step=1)
dataSetName (associated with each record, may not be unique)¶
# # check dataSetName
key = 'dataSetName'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
dataSetName: ['africa_keny001' 'africa_keny002' 'africa_morc001' ... 'northamerica_usa_wy034' 'northamerica_usa_wy035' 'northamerica_usa_wy036'] ["<class 'str'>"]
datasetId (unique identifier, as given by original authors, includes original database token)¶
# # check datasetId
print(len(df.datasetId.unique()))
print(len(df))
key = 'datasetId'
print('%s (starts with): '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
print('datasetId starts with: ', np.unique([str(dd.split('_')[0]) for dd in df[key]]))
2754 2754 datasetId (starts with): ['FE23_africa_keny001' 'FE23_africa_keny002' 'FE23_africa_morc001' ... 'FE23_northamerica_usa_wy034' 'FE23_northamerica_usa_wy035' 'FE23_northamerica_usa_wy036'] ["<class 'str'>"] datasetId starts with: ['FE23']
originalDataURL (URL/DOI of original published record where available)¶
# originalDataURL
key = 'originalDataURL'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([kk for kk in df[key] if 'this' in kk]))
print(np.unique([str(type(dd)) for dd in df[key]]))
# 'this study' should point to the correct URL (PAGES2k)
originalDataURL: ['https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/africa/keny001-noaa.rwl' 'https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/africa/keny002-noaa.rwl' 'https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/africa/morc001-noaa.rwl' ... 'https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/southamerica/chil016-noaa.rwl' 'https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/southamerica/chil017-noaa.rwl' 'https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/southamerica/chil018-noaa.rwl'] [] ["<class 'str'>"]
originalDatabase (original database used as input for dataframe)¶
# # originalDataSet
key = 'originalDatabase'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
# Note: the last two records have missing URLs
originalDatabase: ['FE23 (Breitenmoser et al. (2014))'] ["<class 'str'>"]
geographical metadata: elevation, latitude, longitude, site name¶
geo_meanElev (mean elevation in m)¶
# check Elevation
key = 'geo_meanElev'
print('%s: '%key)
print(df[key])
print(np.unique(['%d'%kk for kk in df[key] if np.isfinite(kk)]))
print(np.unique([str(type(dd)) for dd in df[key]]))
geo_meanElev:
0 2010.0
1 2010.0
2 2200.0
3 1700.0
4 2200.0
...
2749 2500.0
2750 2542.0
2751 1319.0
2752 2400.0
2753 2378.0
Name: geo_meanElev, Length: 2754, dtype: float32
['0' '1' '10' '100' '1000' '1002' '1005' '1006' '101' '1010' '1020' '1030'
'1036' '1040' '1047' '105' '1050' '1051' '1052' '1055' '1060' '1065'
'1067' '107' '1070' '1071' '1075' '108' '1080' '1085' '109' '1090' '1095'
'1097' '110' '1100' '111' '1110' '1120' '1128' '1130' '1132' '1140'
'1146' '115' '1150' '1155' '1156' '1158' '116' '1160' '1167' '1169'
'1170' '1175' '1180' '1194' '12' '120' '1200' '1201' '1206' '1208' '1219'
'1220' '1224' '1225' '1230' '1231' '1234' '1235' '1237' '1240' '1250'
'1253' '126' '1260' '1270' '1275' '1280' '1285' '13' '130' '1300' '1302'
'131' '1310' '1311' '1315' '1317' '1319' '1320' '1325' '1330' '1340'
'135' '1350' '1354' '136' '1360' '1366' '1367' '1370' '1372' '1375'
'1377' '138' '1380' '1385' '1390' '1391' '1392' '1395' '14' '140' '1400'
'1402' '1405' '1410' '1415' '1417' '1418' '1420' '1425' '143' '1432'
'1433' '1436' '1440' '1448' '145' '1450' '1460' '1463' '1464' '1465'
'1468' '1469' '1470' '1474' '1475' '1480' '149' '1490' '1493' '1494'
'1495' '15' '150' '1500' '1510' '152' '1520' '1524' '1525' '153' '1530'
'1531' '1540' '1545' '1550' '1555' '1560' '1565' '1570' '1580' '1585'
'1586' '1595' '1596' '1598' '16' '160' '1600' '1601' '1620' '1625' '1630'
'1633' '164' '1640' '1644' '1645' '165' '1650' '1656' '1658' '1660'
'1670' '1675' '1676' '168' '1680' '1682' '1690' '1694' '17' '170' '1700'
'1701' '1706' '1707' '1710' '1720' '1722' '1723' '1725' '1731' '1735'
'1737' '1740' '175' '1750' '1755' '1760' '1767' '1768' '1770' '1772'
'1775' '1780' '1785' '1790' '1793' '1798' '18' '180' '1800' '1803' '1804'
'1811' '1817' '182' '1820' '1825' '1828' '1829' '183' '1830' '1840'
'1841' '1848' '185' '1850' '1852' '1853' '1859' '1860' '1862' '1870'
'1875' '188' '1889' '189' '1890' '19' '190' '1900' '1905' '191' '1910'
'192' '1920' '1921' '1922' '1925' '1938' '194' '1940' '1942' '1945' '195'
'1950' '1951' '1958' '1960' '1965' '1966' '1969' '197' '1970' '1975'
'1980' '1981' '199' '1996' '2' '20' '200' '2000' '2002' '2004' '201'
'2010' '2011' '2012' '2013' '2020' '2024' '2027' '2030' '2042' '205'
'2050' '2057' '2060' '2065' '207' '2070' '2072' '2073' '2075' '208'
'2080' '2084' '2085' '209' '2090' '2097' '2098' '210' '2100' '2103' '211'
'2115' '2118' '2121' '213' '2130' '2133' '2134' '214' '2140' '2142' '215'
'2150' '2160' '2164' '2165' '217' '2170' '2179' '218' '2180' '2185'
'2187' '2194' '2195' '2196' '220' '2200' '2210' '2215' '2225' '2229'
'223' '2242' '225' '2250' '2255' '2256' '2265' '2268' '2270' '2271'
'2272' '228' '2280' '2284' '2286' '2289' '229' '2290' '230' '2300' '2301'
'2310' '2316' '232' '2320' '2323' '233' '2332' '2333' '2346' '2347' '235'
'2350' '2362' '2370' '2375' '2377' '2378' '2380' '2385' '239' '2392'
'2393' '2394' '24' '240' '2400' '2407' '2408' '2417' '2420' '2423' '243'
'2438' '244' '2441' '245' '246' '2460' '2465' '2469' '2475' '2484' '2498'
'2499' '25' '250' '2500' '251' '2514' '2515' '2530' '2535' '2542' '2550'
'2560' '257' '258' '2580' '259' '2590' '2591' '2592' '26' '260' '2600'
'2605' '2615' '262' '2621' '2626' '2630' '2636' '2637' '2641' '2645'
'2650' '2651' '2652' '2658' '267' '2670' '2682' '2688' '2690' '2696'
'2697' '27' '270' '2700' '2713' '2727' '2730' '2731' '274' '2740' '2741'
'2743' '2745' '2746' '275' '2750' '2755' '2760' '2774' '2790' '280'
'2800' '2804' '2805' '2816' '282' '2820' '2828' '2835' '285' '2850'
'2865' '2877' '2880' '2890' '2894' '2895' '2896' '290' '2900' '291'
'2925' '2926' '2930' '2940' '295' '2950' '2956' '2960' '297' '2970'
'2987' '2990' '3' '30' '300' '3000' '3017' '3020' '3025' '3033' '3048'
'305' '3050' '3065' '307' '308' '3095' '310' '3100' '3110' '3113' '3115'
'3120' '3125' '314' '3140' '315' '3150' '3154' '317' '3170' '3190' '320'
'3200' '3208' '321' '3218' '3220' '3221' '3230' '3235' '325' '3250'
'3261' '3276' '329' '3290' '3291' '330' '3300' '3320' '3330' '335' '3352'
'3353' '3370' '3378' '339' '340' '3400' '3413' '3415' '342' '3420' '3425'
'345' '3450' '3470' '3475' '3480' '35' '350' '3500' '3505' '3519' '3535'
'3536' '354' '355' '3570' '360' '3600' '362' '3630' '366' '3660' '3688'
'370' '3700' '3719' '3720' '3740' '375' '376' '378' '38' '380' '3800'
'381' '384' '385' '387' '390' '392' '395' '396' '40' '400' '401' '402'
'405' '408' '410' '411' '413' '420' '421' '424' '425' '426' '427' '43'
'430' '438' '44' '440' '442' '443' '445' '45' '450' '455' '457' '459'
'46' '460' '465' '468' '469' '47' '470' '475' '480' '482' '490' '493'
'494' '5' '50' '500' '501' '503' '510' '512' '518' '520' '523' '525' '53'
'530' '535' '540' '55' '550' '555' '558' '56' '560' '564' '570' '575'
'576' '579' '580' '582' '590' '597' '6' '60' '600' '607' '61' '610' '611'
'612' '620' '625' '63' '630' '631' '64' '640' '645' '646' '65' '650'
'658' '660' '67' '670' '672' '675' '68' '680' '690' '7' '70' '700' '701'
'705' '710' '715' '716' '720' '725' '730' '731' '738' '74' '740' '745'
'747' '75' '750' '755' '76' '762' '765' '77' '770' '775' '78' '780' '785'
'790' '792' '798' '8' '80' '800' '803' '805' '808' '810' '820' '822'
'823' '825' '830' '838' '840' '85' '850' '853' '854' '860' '867' '87'
'870' '872' '875' '880' '884' '89' '890' '9' '90' '900' '910' '914' '915'
'918' '920' '923' '925' '929' '930' '940' '945' '95' '950' '952' '960'
'967' '970' '975' '976' '980' '988' '99' '990' '991' '994' '995']
["<class 'float'>"]
geo_meanLat (mean latitude in degrees N)¶
# # Latitude
key = 'geo_meanLat'
print('%s: '%key)
print(np.unique(['%d'%kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
geo_meanLat: ['-18' '-22' '-23' '-24' '-25' '-26' '-27' '-31' '-32' '-33' '-34' '-35' '-36' '-37' '-38' '-39' '-40' '-41' '-42' '-43' '-44' '-45' '-46' '-50' '-53' '-54' '-7' '0' '16' '17' '19' '20' '21' '23' '24' '25' '26' '27' '28' '29' '30' '31' '32' '33' '34' '35' '36' '37' '38' '39' '40' '41' '42' '43' '44' '45' '46' '47' '48' '49' '50' '51' '52' '53' '54' '55' '56' '57' '58' '59' '60' '61' '62' '63' '64' '65' '66' '67' '68' '69' '70' '71' '72'] ["<class 'float'>"]
geo_meanLon (mean longitude)¶
# # Longitude
key = 'geo_meanLon'
print('%s: '%key)
print(np.unique(['%d'%kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
geo_meanLon: ['-1' '-100' '-101' '-102' '-103' '-104' '-105' '-106' '-107' '-108' '-109' '-110' '-111' '-112' '-113' '-114' '-115' '-116' '-117' '-118' '-119' '-120' '-121' '-122' '-123' '-124' '-125' '-126' '-127' '-128' '-129' '-130' '-133' '-134' '-135' '-136' '-137' '-138' '-139' '-140' '-141' '-142' '-143' '-144' '-145' '-146' '-147' '-148' '-149' '-150' '-151' '-152' '-153' '-154' '-159' '-162' '-163' '-2' '-3' '-4' '-5' '-58' '-6' '-61' '-62' '-63' '-64' '-65' '-66' '-67' '-68' '-69' '-7' '-70' '-71' '-72' '-73' '-74' '-75' '-76' '-77' '-78' '-79' '-8' '-80' '-81' '-82' '-83' '-84' '-85' '-86' '-87' '-88' '-89' '-9' '-90' '-91' '-92' '-93' '-94' '-95' '-96' '-97' '-98' '-99' '0' '1' '10' '100' '101' '103' '104' '105' '106' '107' '109' '11' '110' '111' '112' '114' '115' '117' '118' '119' '12' '122' '125' '127' '128' '129' '13' '130' '132' '133' '136' '137' '138' '14' '141' '142' '143' '145' '146' '147' '148' '149' '15' '150' '151' '153' '154' '155' '158' '159' '16' '160' '163' '165' '167' '168' '169' '17' '170' '171' '172' '173' '174' '175' '176' '177' '18' '19' '2' '20' '21' '22' '23' '24' '25' '26' '27' '28' '29' '30' '31' '32' '33' '34' '35' '36' '37' '4' '41' '42' '43' '44' '45' '5' '50' '51' '53' '56' '57' '58' '59' '6' '60' '64' '65' '69' '7' '71' '72' '74' '75' '76' '77' '78' '79' '8' '80' '81' '82' '83' '84' '85' '86' '87' '88' '89' '9' '90' '91' '93' '94' '95' '97' '98' '99'] ["<class 'float'>"]
geo_siteName (name of collection site)¶
# Site Name
key = 'geo_siteName'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
geo_siteName: ['RagatiForestStationNyeriDistrict' 'RagatiForestStationNyeriDistrict' 'Tounfite' ... 'DevilsTowerNationalMonument' 'CookingHillside' 'KretecVale'] ["<class 'str'>"]
proxy metadata: archive type, proxy type, interpretation¶
archiveType (archive type)¶
# archiveType
key = 'archiveType'
print('%s: '%key)
print(np.unique(df[key]))
print(np.unique([str(type(dd)) for dd in df[key]]))
archiveType: ['Wood'] ["<class 'str'>"]
paleoData_proxy (proxy type)¶
# paleoData_proxy
key = 'paleoData_proxy'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
paleoData_proxy: ['ring width'] ["<class 'str'>"]
paleoData_sensorSpecies (further information on proxy type: species)¶
# climate_interpretation
key = 'paleoData_sensorSpecies'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
paleoData_sensorSpecies: ['ABAL' 'ABAM' 'ABBA' 'ABBO' 'ABCE' 'ABCI' 'ABCO' 'ABLA' 'ABMA' 'ABPI' 'ABPN' 'ABPR' 'ABSB' 'ABSP' 'ACRU' 'ACSH' 'ADHO' 'ADUS' 'AGAU' 'ARAR' 'ATCU' 'ATSE' 'AUCH' 'BEPU' 'CABU' 'CADE' 'CADN' 'CARO' 'CDAT' 'CDBR' 'CDDE' 'CDLI' 'CEAN' 'CESP' 'CHLA' 'CHNO' 'DABI' 'DACO' 'FAGR' 'FASY' 'FICU' 'FRNI' 'HABI' 'JGAU' 'JUEX' 'JUFO' 'JUOC' 'JUPH' 'JUPR' 'JURE' 'JUSC' 'JUSP' 'JUVI' 'LADE' 'LAGM' 'LALA' 'LALY' 'LAOC' 'LASI' 'LGFR' 'LIBI' 'LITU' 'NOBE' 'NOGU' 'NOME' 'NOPU' 'NOSO' 'PCAB' 'PCEN' 'PCGL' 'PCGN' 'PCMA' 'PCOB' 'PCOM' 'PCPU' 'PCRU' 'PCSH' 'PCSI' 'PCSM' 'PCSP' 'PHAL' 'PHAS' 'PHGL' 'PHTR' 'PIAL' 'PIAM' 'PIAR' 'PIBA' 'PIBN' 'PIBR' 'PICE' 'PICL' 'PICO' 'PIEC' 'PIED' 'PIFL' 'PIHA' 'PIHR' 'PIJE' 'PIKO' 'PILA' 'PILE' 'PILO' 'PIMO' 'PIMU' 'PIMZ' 'PINI' 'PIPA' 'PIPE' 'PIPI' 'PIPN' 'PIPO' 'PIPU' 'PIRE' 'PIRI' 'PIRO' 'PISF' 'PISI' 'PISP' 'PIST' 'PISY' 'PITA' 'PITO' 'PIUN' 'PIVI' 'PIWA' 'PLRA' 'PLUV' 'PPDE' 'PPSP' 'PRMA' 'PSMA' 'PSME' 'PTAN' 'QUAL' 'QUDG' 'QUFR' 'QUHA' 'QUKE' 'QULO' 'QULY' 'QUMA' 'QUMC' 'QUPE' 'QUPR' 'QURO' 'QURU' 'QUSP' 'QUST' 'QUVE' 'TABA' 'TADI' 'TAMU' 'TEGR' 'THOC' 'THPL' 'TSCA' 'TSCR' 'TSDU' 'TSHE' 'TSME' 'ULSP' 'VIKE' 'WICE'] ["<class 'str'>"]
paleoData_notes (notes)¶
# # paleoData_notes
key = 'paleoData_notes'
print('%s: '%key)
print(df[key].values)
print(np.unique([str(type(dd)) for dd in df[key]]))
paleoData_notes: ['Investigator: Stahle' 'Investigator: Stahle' 'Investigator: Stockton' ... 'Investigator: Stambaugh' 'Investigator: King' 'Investigator: King'] ["<class 'str'>"]
paleoData_variableName¶
# paleoData_variableName
key = 'paleoData_variableName'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
paleoData_variableName: ['ring width'] ["<class 'str'>"]
climate metadata: interpretation variable, direction, seasonality¶
interpretation_direction¶
# climate_interpretation
key = 'interpretation_direction'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
interpretation_direction: ['N/A'] No. of unique values: 1/2754
interpretation_seasonality¶
# climate_interpretation
key = 'interpretation_seasonality'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
interpretation_seasonality: ['N/A'] No. of unique values: 1/2754
interpretation_variable¶
# climate_interpretation
key = 'interpretation_variable'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
interpretation_variable: ['N/A' 'NOT temperature NOT moisture' 'moisture' 'temperature' 'temperature+moisture'] No. of unique values: 5/2754
interpretation_variableDetail¶
# climate_interpretation
key = 'interpretation_variableDetail'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(f'No. of unique values: {len(np.unique(df[key]))}/{len(df)}')
interpretation_variableDetail: ['N/A'] No. of unique values: 1/2754
data¶
paleoData_values¶
# # paleoData_values
key = 'paleoData_values'
print('%s: '%key)
for ii, vv in enumerate(df[key][:20]):
try:
print('%-30s: %s -- %s'%(df['dataSetName'].iloc[ii][:30], str(np.nanmin(vv)), str(np.nanmax(vv))))
print(type(vv))
except: print(df['dataSetName'].iloc[ii], 'NaNs detected.')
print(np.unique([str(type(dd)) for dd in df[key]]))
paleoData_values: africa_keny001 : 0.4 -- 1.423 <class 'numpy.ndarray'> africa_keny002 : 0.499 -- 1.631 <class 'numpy.ndarray'> africa_morc001 : -0.014 -- 2.226 <class 'numpy.ndarray'> africa_morc002 : 0.323 -- 1.587 <class 'numpy.ndarray'> africa_morc003 : 0.004 -- 1.617 <class 'numpy.ndarray'> africa_morc011 : 0.005 -- 2.094 <class 'numpy.ndarray'> africa_morc012 : 0.435 -- 1.866 <class 'numpy.ndarray'> africa_morc013 : 0.166 -- 1.389 <class 'numpy.ndarray'> africa_morc014 : -0.025 -- 2.012 <class 'numpy.ndarray'> africa_safr001 : 0.485 -- 2.129 <class 'numpy.ndarray'> africa_zimb001 : 0.15 -- 2.415 <class 'numpy.ndarray'> africa_zimb002 : 0.178 -- 2.044 <class 'numpy.ndarray'> africa_zimb003 : 0.24 -- 2.701 <class 'numpy.ndarray'> southamerica_arge : 0.161 -- 1.867 <class 'numpy.ndarray'> southamerica_arge001 : 0.336 -- 2.362 <class 'numpy.ndarray'> southamerica_arge002 : 0.478 -- 1.815 <class 'numpy.ndarray'> southamerica_arge004 : 0.508 -- 1.714 <class 'numpy.ndarray'> southamerica_arge005 : 0.313 -- 1.563 <class 'numpy.ndarray'> southamerica_arge006 : 0.203 -- 1.791 <class 'numpy.ndarray'> southamerica_arge007 : 0.368 -- 1.652 <class 'numpy.ndarray'> ["<class 'numpy.ndarray'>"]
paleoData_units¶
# paleoData_units
key = 'paleoData_units'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
paleoData_units: ['standardized_anomalies'] ["<class 'str'>"]
year¶
# # year
key = 'year'
print('%s: '%key)
for ii, vv in enumerate(df[key][:20]):
try: print('%-30s: %s -- %s'%(df['dataSetName'].iloc[ii][:30], str(np.nanmin(vv)), str(np.nanmax(vv))))
except: print('NaNs detected.', vv)
print(np.unique([str(type(dd)) for dd in df[key]]))
year: africa_keny001 : 1944.0 -- 1993.0 africa_keny002 : 1950.0 -- 1994.0 africa_morc001 : 1360.0 -- 1983.0 africa_morc002 : 1686.0 -- 1984.0 africa_morc003 : 1755.0 -- 1984.0 africa_morc011 : 1598.0 -- 1984.0 africa_morc012 : 1813.0 -- 1984.0 africa_morc013 : 1854.0 -- 1984.0 africa_morc014 : 1200.0 -- 1984.0 africa_safr001 : 1665.0 -- 1976.0 africa_zimb001 : 1925.0 -- 1994.0 africa_zimb002 : 1877.0 -- 1997.0 africa_zimb003 : 1880.0 -- 1996.0 southamerica_arge : 1900.0 -- 1974.0 southamerica_arge001 : 1605.0 -- 1974.0 southamerica_arge002 : 1800.0 -- 1974.0 southamerica_arge004 : 1532.0 -- 1974.0 southamerica_arge005 : 1641.0 -- 1974.0 southamerica_arge006 : 1449.0 -- 1974.0 southamerica_arge007 : 1579.0 -- 1974.0 ["<class 'numpy.ndarray'>"]
yearUnits¶
# yearUnits
key = 'yearUnits'
print('%s: '%key)
print(np.unique([kk for kk in df[key]]))
print(np.unique([str(type(dd)) for dd in df[key]]))
yearUnits: ['CE'] ["<class 'str'>"]