The Basics of TOAD

Have fun! 🎉

# Prerequisites
import matplotlib.pyplot as plt
import xarray as xr

# import TOAD
from toad import TOAD
import toad

plt.rcParams["figure.dpi"] = 300
plt.rcParams["figure.figsize"] = (12, 6)

Test data

Quick look at PISM simulation data of ice-sheet melt.

xr.load_dataset("test_data/garbe_2020_antarctica.nc").thk.sel(
    GMST=[0, 2, 4, 6, 8, 10], method="nearest"
).plot(col="GMST", cmap="RdBu_r")
<xarray.plot.facetgrid.FacetGrid at 0x3186e95b0>
../_images/fb42c4394d579fd8ca510135e56cc36ec0dea84acbf3d3f7fc8becc23ea76ce1.png

Init the TOAD object from an .nc file

TOAD works on spatio-temporal data. If your time dimension is not called “time”, you need to tell TOAD, as done below:

td = TOAD("test_data/garbe_2020_antarctica.nc", time_dim="GMST")
td.data = td.data.drop_vars(td.shift_vars)  # drop any previously computed shift variables

Step 1: Compute shifts

from toad.shifts import ASDETECT

# define variable and method
td.compute_shifts(var="thk", method=ASDETECT(timescale=(0.5, 3.5)))

# ~12 seconds
INFO: New shifts variable thk_dts: min/mean/max=-1.000/-0.227/1.000 using 13901 grid cells. Skipped 61.5% grid cells: 0 NaN, 22199 constant.

Quick illustration of ASDETECT output

fig, axs = plt.subplots(2, sharex=True)
idx, idy = 100, 90
td.data.thk.isel(x=idx, y=idy).plot(ax=axs[0])
td.data.thk_dts.isel(x=idx, y=idy).plot(ax=axs[1])
axs[0].set_ylabel("Variable (ice-thickness)")
axs[1].set_ylabel("Detection Time Series")
axs[1].set_ylim(-1.25, 1.25)
axs[0].set_title("")
axs[1].set_title("")
axs[0].set_xlabel("")
fig.tight_layout()
../_images/553307baf6479f4293f72f35a406873b4a77c090e1377472a751cfa7d726be54.png
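ASDETECT scans the series for abrupt gradient changes and returns a detection time series in [-1, 1], with peaks marking candidate shifts, as the lower panel above shows. A rough conceptual sketch of gradient-based detection (NOT the actual ASDETECT algorithm, which scans a range of window lengths; here a single fixed window and a simple max-abs normalization are assumed):

```python
# Conceptual sketch of gradient-based shift detection (NOT the real ASDETECT,
# which scans multiple window lengths; single window + max-abs normalization
# assumed here for illustration only).
def detection_series(ts, window=2):
    n = len(ts)
    grads = [0.0] * n
    for i in range(n):
        lo, hi = max(0, i - window), min(n - 1, i + window)
        grads[i] = (ts[hi] - ts[lo]) / (hi - lo)  # local gradient
    peak = max(abs(g) for g in grads) or 1.0
    return [g / peak for g in grads]  # normalized to [-1, 1]

# A flat series with one abrupt jump: the detection series peaks at the jump.
series = [0.0] * 10 + [5.0] * 10
dts = detection_series(series)
```

The normalized output is flat away from the jump and saturates at 1.0 around it, which is the qualitative shape visible in the detection panel above.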

Step 2: Compute Clusters

  • Now we need a clustering method: TOAD accepts any clustering method from the scikit-learn library.

  • We recommend using HDBSCAN which has intuitive hyperparameters and supports clusters of different densities. Tip: For large datasets fast_hdbscan can be significantly faster.

  • You can also write your own clustering algorithm, see clustering_methods.ipynb for details.

from sklearn.cluster import HDBSCAN

td.data = td.data.drop_vars(td.cluster_vars)  # drop any previously computed cluster variables

td.compute_clusters(
    var="thk",  # reference your base variable, TOAD will find computed shifts for this variable
    method=HDBSCAN(min_cluster_size=15),
    time_weight=2.0,
)
INFO: New cluster variable thk_dts_cluster: Identified 79 clusters in 16,636 pts; Left 30.0% as noise (4,987 pts).
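The time_weight parameter controls how strongly the time coordinate counts relative to the spatial coordinates during clustering. Conceptually (a hypothetical feature construction; TOAD's internal scaling may differ), each detected shift becomes a point like (x, y, t · time_weight), so a larger weight pushes events apart along the time axis and makes clusters separate more readily in time:

```python
import math

# Hypothetical illustration of time weighting (TOAD's internal feature
# scaling may differ): two shift events at the same location but different
# times grow farther apart in feature space as time_weight increases.
def feature_distance(p, q, time_weight=1.0):
    (x1, y1, t1), (x2, y2, t2) = p, q
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2
                     + (time_weight * (t1 - t2)) ** 2)

a, b = (0.0, 0.0, 1.0), (0.0, 0.0, 3.0)
d1 = feature_distance(a, b, time_weight=1.0)  # 2.0
d2 = feature_distance(a, b, time_weight=2.0)  # 4.0
```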

Calling the TOAD object gives you an overview of current computed variables:

For the “base variable” thk, we now have “shifts variable” thk_dts and “cluster variable” thk_dts_cluster.

td

TOAD Object

Variable Hierarchy:

base var thk (1 shifts, 1 clusterings)

Hint: to access the xr.dataset call td.data

<xarray.Dataset> Size: 152MB
Dimensions:          (GMST: 350, y: 190, x: 190)
Coordinates:
  * GMST             (GMST) float64 3kB 0.0301 0.0701 0.1101 ... 13.95 13.99
  * y                (y) float64 2kB -3.032e+06 -3e+06 ... 2.984e+06 3.016e+06
  * x                (x) float64 2kB -3.032e+06 -3e+06 ... 2.984e+06 3.016e+06
Data variables:
    thk              (GMST, y, x) float32 51MB 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
    thk_dts          (GMST, y, x) float32 51MB nan nan nan nan ... nan nan nan
    thk_dts_cluster  (GMST, y, x) int32 51MB -1 -1 -1 -1 -1 ... -1 -1 -1 -1 -1
Attributes: (12/13)
    CDI:            Climate Data Interface version 1.9.6 (http://mpimet.mpg.d...
    proj4:          +lon_0=0.0 +ellps=WGS84 +datum=WGS84 +lat_ts=-71.0 +proj=...
    CDO:            Climate Data Operators version 1.9.6 (http://mpimet.mpg.d...
    source:         PISM (development v1.0-535-gb3de48787 committed by Julius...
    institution:    PIK / Potsdam Institute for Climate Impact Research
    author:         Julius Garbe (julius.garbe@pik-potsdam.de)
    ...             ...
    title:          garbe_2020_antarctica
    Conventions:    CF-1.9
    projection:     Polar Stereographic South (71S,0E)
    ice_density:    910. kg m-3
    NCO:            netCDF Operators version 4.7.8 (Homepage = http://nco.sf....
    Modifications:  Modified by Jakob Harteg (jakob.harteg@pik-potsdam.de) Se...
# You can find all method parameters used for the computation in the attributes, e.g.:
td.data.thk_dts.attrs
{'standard_name': 'land_ice_thickness',
 'long_name': 'land ice thickness',
 'units': 'm',
 'pism_intent': 'model_state',
 'time_dim': 'GMST',
 'method_name': 'ASDETECT',
 'toad_version': '0.3',
 'base_variable': 'thk',
 'variable_type': 'shift',
 'method_ignore_nan_warnings': 'False',
 'method_lmax': '87',
 'method_lmin': '12',
 'method_segmentation': 'two_sided',
 'method_timescale': '(0.5, 3.5)'}
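Note that the attribute values come back as strings (a netCDF limitation), so a parameter like method_timescale is stored as '(0.5, 3.5)'. If you need the Python values again, the standard-library ast.literal_eval recovers tuples and numbers safely (a generic pattern, not a TOAD helper):

```python
import ast

# netCDF attributes are strings; recover Python values with ast.literal_eval.
# (Generic pattern, not a TOAD API; values copied from the attrs shown above.)
attrs = {"method_timescale": "(0.5, 3.5)", "method_lmin": "12", "method_lmax": "87"}
timescale = ast.literal_eval(attrs["method_timescale"])  # (0.5, 3.5)
lmin, lmax = int(attrs["method_lmin"]), int(attrs["method_lmax"])
```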

Let’s plot the result:

TOAD supplies three plotting functions:

  • td.plot.overview()

  • td.plot.cluster_map()

  • td.plot.timeseries()

td.plot.overview();
../_images/6a3279422bada6ab60bb66c35cdf07bc65d58623e2bfc421a65666f682698a92.png
td.plot.overview(mode="aggregated");
../_images/d845bda51f68f415b0d0d476ed9e9e09b7f849328f93820e4402699a10da9275.png
td.plot.timeseries(cluster_ids=[1, 5]);
../_images/3ad5835cfd83110d7972494632c88385ca6ba373ebf8aed6a88c21530714bd89.png

Customize the map with the map_style param.

fig, ax = td.plot.cluster_map(
    cluster_ids=[1, 5],
    include_all_clusters=False,
    map_style=toad.MapStyle(projection="south_pole", grid_lines=False),
)
ax.set_axis_off()
../_images/2ac17978fb69f5d7069e8aa8c97ea30709f1a2a65e134123f13d1406eeb78387.png

Save your results

Save the dataset with the new additions using the td.save() function. This automatically adds compression.

# Use original path name with suffix
# td.save(suffix="_my_computed_toad_data")

# Use a different path name
# td.save(path="my_computed_toad_data.nc")
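td.save() adds the compression for you. If you instead save manually with xarray, the equivalent is the standard netCDF encoding options (zlib compression, level 1-9); the helper below just builds the per-variable encoding dict:

```python
# Build a per-variable netCDF encoding dict enabling zlib compression
# (standard xarray/netCDF4 encoding options; td.save() handles this for you).
def compression_encoding(var_names, complevel=4):
    return {name: {"zlib": True, "complevel": complevel} for name in var_names}

encoding = compression_encoding(["thk", "thk_dts", "thk_dts_cluster"])
# Manual alternative to td.save():
# td.data.to_netcdf("my_computed_toad_data.nc", encoding=encoding)
```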

Export cluster masks

mask = td.get_cluster_mask_spatial("thk", cluster_id=1)
mask
<xarray.DataArray 'thk_dts_cluster' (y: 190, x: 190)> Size: 36kB
array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]], shape=(190, 190))
Coordinates:
  * y        (y) float64 2kB -3.032e+06 -3e+06 ... 2.984e+06 3.016e+06
  * x        (x) float64 2kB -3.032e+06 -3e+06 ... 2.984e+06 3.016e+06
mask.plot()  # type: ignore
<matplotlib.collections.QuadMesh at 0x17c2d7860>
../_images/75636a7961fd7a6a5ca506d0844683dcbb8a7653c4b191700355b70695d63ea6.png
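The mask is a regular boolean xarray DataArray, so it composes with standard xarray operations, e.g. td.data.thk.where(mask).mean(dim=("x", "y")) for a cluster-mean time series (.where() is standard xarray). The selection logic reduces to masked averaging; a tiny pure-Python illustration:

```python
# Masked averaging on a tiny 3x3 grid: only cells where the mask is True
# contribute (with xarray this is field.where(mask).mean()).
field = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
mask = [[False, True,  False],
        [True,  True,  False],
        [False, False, False]]
selected = [v for frow, mrow in zip(field, mask)
            for v, m in zip(frow, mrow) if m]
cluster_mean = sum(selected) / len(selected)  # (2 + 4 + 5) / 3
```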

Access Timeseries in Clusters

# Get time series from clusters
timeseries = td.get_cluster_timeseries("thk", cluster_id=2)

# get mean of timeseries
mean = td.get_cluster_timeseries("thk", cluster_id=2, aggregation="mean")

# E.g., to plot:
plt.figure(figsize=(10, 2))
for ts in timeseries:
    ts.plot(color="k", alpha=0.12, lw=1)
mean.plot(lw=3, color="r")  # type: ignore
plt.title("Cluster 2")
Text(0.5, 1.0, 'Cluster 2')
../_images/142f8d511489059c024d409d5d5c69dab92bbaa958b21cd96af24cd80155c715.png

Stats

Calling td.stats(var) exposes more statistics-related helper functions via its time and space submodules.

Time stats

# get start time for cluster 5
td.stats(var="thk").time.start(cluster_id=5)
2.1501
# get all stats in dictionary
td.stats("thk").time.all_stats(5)
{'duration': 0.3600000000000003,
 'duration_timesteps': 9,
 'end': 2.5101000000000004,
 'end_timestep': 62,
 'iqr_50': (2.2701000000000002, 2.3501000000000003),
 'iqr_68': (2.2301, 2.3501000000000003),
 'iqr_90': (2.1901, 2.3501000000000003),
 'mean': 2.314544444444445,
 'median': 2.3101000000000003,
 'membership_peak': 2.2701000000000002,
 'membership_peak_density': 0.005318559556786704,
 'start': 2.1501,
 'start_timestep': 53,
 'std': 0.11066577420434438,
 'steepest_gradient': 2.3101000000000003,
 'steepest_gradient_timestep': 57}
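The entries are internally consistent: duration is end - start, and duration_timesteps is end_timestep - start_timestep (values copied from the output above):

```python
# Sanity check on the time stats above: durations follow from start/end.
start, end = 2.1501, 2.5101000000000004
start_step, end_step = 53, 62
duration = end - start                   # ~0.36, matches 'duration'
duration_steps = end_step - start_step   # 9, matches 'duration_timesteps'
```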

Space stats

# Get mean x,y position of cluster 5
td.stats(var="thk").space.mean(cluster_id=5)
(-282661.12266112264, -1269139.293139293)
td.stats("thk").space.all_stats(5)
{'central_point_for_labeling': (-248000.0, -1336000.0),
 'footprint_cumulative_area': 481,
 'footprint_mean': (-282661.12266112264, -1269139.293139293),
 'footprint_median': (-280000.0, -1272000.0),
 'footprint_std': (229008.10144179303, 214851.6466720021),
 'mean': (-282661.12266112264, -1269139.293139293),
 'median': (-280000.0, -1272000.0),
 'std': (229008.101441793, 214851.64667200207)}

Cluster scoring

A few scores are already available that aim to quantify the quality of a cluster.

score_heaviside() evaluates how closely the spatially aggregated cluster time series resembles a perfect Heaviside function. A score of 1 indicates a perfect step function, while 0 indicates a linear trend.

score_consistency() measures internal coherence using hierarchical clustering. Higher scores indicate more internally consistent clusters, which is useful for assessing cluster quality.

score_spatial_autocorrelation() measures average pairwise similarity (R²) between time series within a cluster. Higher scores mean more similar behavior within the cluster. This provides a fast computation of spatial coherence.

score_nonlinearity() measures deviation from linearity using RMSE and can be normalized against unclustered data. Higher scores indicate more nonlinear behavior, which is good for detecting complex temporal patterns.

td.stats("thk").general.score_heaviside(5, aggregation="mean")
0.41722873312015085
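The score of ~0.42 above says the aggregated cluster time series is partly step-like. One simple way to see why a step scores high and a ramp scores low (a hypothetical proxy, NOT TOAD's actual score_heaviside implementation) is to measure the fraction of the series' total range covered by its single largest jump:

```python
# Hypothetical proxy for step-likeness (NOT TOAD's score_heaviside): the
# fraction of the total range covered by the single largest jump.
# A perfect step gives 1.0; a long linear ramp approaches 0.
def step_likeness(ts):
    rng = max(ts) - min(ts)
    if rng == 0:
        return 0.0
    largest_jump = max(abs(ts[i + 1] - ts[i]) for i in range(len(ts) - 1))
    return largest_jump / rng

step = [0.0] * 10 + [1.0] * 10        # perfect step
ramp = [i / 99 for i in range(100)]   # linear ramp
```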

Example 2

Let’s try some different data: mean summer near-surface air temperature.

td2 = TOAD("test_data/global_mean_summer_tas.nc")
td2.data = td2.data.drop_vars(td2.shift_vars)

# Compute shifts
td2.compute_shifts(
    "tas",
    method=ASDETECT(),
)
INFO: New shifts variable tas_dts: min/mean/max=-0.875/0.014/0.979 using 139968 grid cells. Skipped 0.0% grid cells: 0 NaN, 0 constant.
# Compute clusters
td2.compute_clusters(
    "tas",
    method=HDBSCAN(min_cluster_size=20),
)
INFO: New cluster variable tas_dts_cluster: Identified 18 clusters in 13,291 pts; Left 2.5% as noise (329 pts).
# Plot clusters
# Tip: map_style can also be set as a dict, instead of importing the MapStyle class
td2.plot.cluster_map("tas", cluster_ids=range(10), map_style={"projection": "global"});
../_images/612b153807cce8f3f62e6a155a1c5029690a1099e443676eaa23c5a770f5b072.png