The Basics of TOAD¶
Have fun! 🎉
# Prerequisites
import matplotlib.pyplot as plt
import xarray as xr
# import TOAD
from toad import TOAD
import toad
plt.rcParams["figure.dpi"] = 300
plt.rcParams["figure.figsize"] = (12, 6)
Test data¶
A quick look at PISM simulation data of Antarctic ice-sheet melt.
xr.load_dataset("test_data/garbe_2020_antarctica.nc").thk.sel(
GMST=[0, 2, 4, 6, 8, 10], method="nearest"
).plot(col="GMST", cmap="RdBu_r")
<xarray.plot.facetgrid.FacetGrid at 0x3186e95b0>
Init the TOAD object with an .nc file¶
TOAD works on spatio-temporal data. If your time dimension is not called “time”, you need to tell TOAD, as done below:
td = TOAD("test_data/garbe_2020_antarctica.nc", time_dim="GMST")
# drop any previously computed shift variables so this example starts clean
td.data = td.data.drop_vars(td.shift_vars)
Step 1: Compute shifts¶
First we need an abrupt-shift detection method: TOAD currently provides an implementation of
ASDETECT (Boulton & Lenton, 2019). See the asdetect.py docstring for a brief description.
You can also implement your own shift detection algorithm; see shift_detection_methods.ipynb for details.
from toad.shifts import ASDETECT
# define variable and method
td.compute_shifts(var="thk", method=ASDETECT(timescale=(0.5, 3.5)))
# ~12 seconds
INFO: New shifts variable thk_dts: min/mean/max=-1.000/-0.227/1.000 using 13901 grid cells. Skipped 61.5% grid cells: 0 NaN, 22199 constant.
Quick illustration of ASDETECT output¶
fig, axs = plt.subplots(2, sharex=True)
idx, idy = 100, 90
td.data.thk.isel(x=idx, y=idy).plot(ax=axs[0])
td.data.thk_dts.isel(x=idx, y=idy).plot(ax=axs[1])
axs[0].set_ylabel("Variable (ice-thickness)")
axs[1].set_ylabel("Detection Time Series")
axs[1].set_ylim(-1.25, 1.25)
axs[0].set_title("")
axs[1].set_title("")
axs[0].set_xlabel("")
fig.tight_layout()
Step 2: Compute Clusters¶
Now we need a clustering method: TOAD can take any clustering method from the scikit-learn library (a sketch of swapping in a different clusterer follows the example below).
We recommend HDBSCAN, which has intuitive hyperparameters and supports clusters of varying densities. Tip: for large datasets, fast_hdbscan can be significantly faster, as sketched next.
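A minimal sketch of that swap (assuming the fast_hdbscan package is installed; its HDBSCAN class mirrors the scikit-learn interface):
# pip install fast_hdbscan
# from fast_hdbscan import HDBSCAN  # drop-in replacement for sklearn.cluster.HDBSCAN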
You can also write your own clustering algorithm; see clustering_methods.ipynb for details.
from sklearn.cluster import HDBSCAN
# drop any previously computed cluster variables so this example starts clean
td.data = td.data.drop_vars(td.cluster_vars)
td.compute_clusters(
var="thk", # reference your base variable, TOAD will find computed shifts for this variable
method=HDBSCAN(min_cluster_size=15),
    time_weight=2.0,  # weighting of the time coordinate relative to space (assumption based on the parameter name)
)
INFO: New cluster variable thk_dts_cluster: Identified 79 clusters in 16,636 pts; Left 30.0% as noise (4,987 pts).
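As noted above, any scikit-learn clusterer can be swapped in; for example (a sketch, parameter values are illustrative and not tuned for this data):
# from sklearn.cluster import DBSCAN
# td.compute_clusters(var="thk", method=DBSCAN(eps=2.0, min_samples=10))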
Calling the TOAD object gives you an overview of the currently computed variables:
For the “base variable” thk, we now have “shifts variable” thk_dts and “cluster variable” thk_dts_cluster.
td
TOAD Object
Variable Hierarchy:
thk → thk_dts → thk_dts_cluster
Hint: to access the xr.dataset call td.data
td.data
<xarray.Dataset> Size: 152MB
Dimensions: (GMST: 350, y: 190, x: 190)
Coordinates:
* GMST (GMST) float64 3kB 0.0301 0.0701 0.1101 ... 13.95 13.99
* y (y) float64 2kB -3.032e+06 -3e+06 ... 2.984e+06 3.016e+06
* x (x) float64 2kB -3.032e+06 -3e+06 ... 2.984e+06 3.016e+06
Data variables:
thk (GMST, y, x) float32 51MB 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
thk_dts (GMST, y, x) float32 51MB nan nan nan nan ... nan nan nan
thk_dts_cluster (GMST, y, x) int32 51MB -1 -1 -1 -1 -1 ... -1 -1 -1 -1 -1
Attributes: (12/13)
CDI: Climate Data Interface version 1.9.6 (http://mpimet.mpg.d...
proj4: +lon_0=0.0 +ellps=WGS84 +datum=WGS84 +lat_ts=-71.0 +proj=...
CDO: Climate Data Operators version 1.9.6 (http://mpimet.mpg.d...
source: PISM (development v1.0-535-gb3de48787 committed by Julius...
institution: PIK / Potsdam Institute for Climate Impact Research
author: Julius Garbe (julius.garbe@pik-potsdam.de)
... ...
title: garbe_2020_antarctica
Conventions: CF-1.9
projection: Polar Stereographic South (71S,0E)
ice_density: 910. kg m-3
NCO: netCDF Operators version 4.7.8 (Homepage = http://nco.sf....
Modifications: Modified by Jakob Harteg (jakob.harteg@pik-potsdam.de) Se...
# You can find all method parameters used for the computation in the attributes, e.g.:
td.data.thk_dts.attrs
{'standard_name': 'land_ice_thickness',
'long_name': 'land ice thickness',
'units': 'm',
'pism_intent': 'model_state',
'time_dim': 'GMST',
'method_name': 'ASDETECT',
'toad_version': '0.3',
'base_variable': 'thk',
'variable_type': 'shift',
'method_ignore_nan_warnings': 'False',
'method_lmax': '87',
'method_lmin': '12',
'method_segmentation': 'two_sided',
'method_timescale': '(0.5, 3.5)'}
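Individual parameters can be read back like any other xarray attribute, e.g.:
td.data.thk_dts.attrs["method_timescale"]  # -> '(0.5, 3.5)'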
Let’s plot the result:¶
TOAD supplies three plotting functions:
- td.plot.overview()
- td.plot.cluster_map()
- td.plot.timeseries()
td.plot.overview();
td.plot.overview(mode="aggregated");
td.plot.timeseries(cluster_ids=[1, 5]);
Customize the map with the map_style param.
fig, ax = td.plot.cluster_map(
cluster_ids=[1, 5],
include_all_clusters=False,
map_style=toad.MapStyle(projection="south_pole", grid_lines=False),
)
ax.set_axis_off()
Save your results¶
Save the dataset with the new additions using the td.save() function. This automatically adds compression.
# Use original path name with suffix
# td.save(suffix="_my_computed_toad_data")
# Use a different path name
# td.save(path="my_computed_toad_data.nc")
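A saved file can later be loaded back into a fresh TOAD object with the same constructor used above (a sketch, reusing the example path from the cell above):
# td_reloaded = TOAD("my_computed_toad_data.nc", time_dim="GMST")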
Export cluster masks¶
mask = td.get_cluster_mask_spatial("thk", cluster_id=1)
mask
<xarray.DataArray 'thk_dts_cluster' (y: 190, x: 190)> Size: 36kB
array([[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]], shape=(190, 190))
Coordinates:
* y (y) float64 2kB -3.032e+06 -3e+06 ... 2.984e+06 3.016e+06
* x        (x) float64 2kB -3.032e+06 -3e+06 ... 2.984e+06 3.016e+06
mask.plot()  # type: ignore
<matplotlib.collections.QuadMesh at 0x17c2d7860>
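The boolean mask composes with standard xarray operations; for example, to select the ice thickness inside the cluster:
# keep thk values inside cluster 1; everything else becomes NaN (plain xarray, broadcast over time)
thk_in_cluster = td.data.thk.where(mask)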
Access Timeseries in Clusters¶
# Get time series from clusters
timeseries = td.get_cluster_timeseries("thk", cluster_id=2)
# get mean of timeseries
mean = td.get_cluster_timeseries("thk", cluster_id=2, aggregation="mean")
# E.g., to plot:
plt.figure(figsize=(10, 2))
for ts in timeseries:
ts.plot(color="k", alpha=0.12, lw=1)
mean.plot(lw=3, color="r") # type: ignore
plt.title("Cluster 2")
Text(0.5, 1.0, 'Cluster 2')
Stats¶
Calling td.stats(var) exposes more statistics-related helper functions through its time and space submodules.
Time stats¶
# get start time for cluster 5
td.stats(var="thk").time.start(cluster_id=5)
2.1501
# get all stats in dictionary
td.stats("thk").time.all_stats(5)
{'duration': 0.3600000000000003,
'duration_timesteps': 9,
'end': 2.5101000000000004,
'end_timestep': 62,
'iqr_50': (2.2701000000000002, 2.3501000000000003),
'iqr_68': (2.2301, 2.3501000000000003),
'iqr_90': (2.1901, 2.3501000000000003),
'mean': 2.314544444444445,
'median': 2.3101000000000003,
'membership_peak': 2.2701000000000002,
'membership_peak_density': 0.005318559556786704,
'start': 2.1501,
'start_timestep': 53,
'std': 0.11066577420434438,
'steepest_gradient': 2.3101000000000003,
'steepest_gradient_timestep': 57}
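To tabulate a statistic across all clusters, the cluster ids can be enumerated straight from the cluster variable (a sketch using plain numpy; the label -1 marks noise):
import numpy as np

# collect all non-noise cluster ids from the cluster variable
ids = np.unique(td.data.thk_dts_cluster.values)
ids = ids[ids >= 0]  # drop the noise label (-1)
# start time of each cluster, via the stats helper shown above
starts = {int(i): td.stats("thk").time.start(cluster_id=int(i)) for i in ids}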
Space stats¶
# Get mean x,y position of cluster 5
td.stats(var="thk").space.mean(cluster_id=5)
(-282661.12266112264, -1269139.293139293)
td.stats("thk").space.all_stats(5)
{'central_point_for_labeling': (-248000.0, -1336000.0),
'footprint_cumulative_area': 481,
'footprint_mean': (-282661.12266112264, -1269139.293139293),
'footprint_median': (-280000.0, -1272000.0),
'footprint_std': (229008.10144179303, 214851.6466720021),
'mean': (-282661.12266112264, -1269139.293139293),
'median': (-280000.0, -1272000.0),
'std': (229008.101441793, 214851.64667200207)}
Cluster scoring¶
A few scores are already available that aim to quantify the quality of a cluster.
score_heaviside() evaluates how closely the spatially aggregated cluster time series resembles a perfect Heaviside function. A score of 1 indicates a perfect step function, while 0 indicates a linear trend.
score_consistency() measures internal coherence using hierarchical clustering. Higher scores indicate more internally consistent clusters, which is useful for assessing cluster quality.
score_spatial_autocorrelation() measures average pairwise similarity (R²) between time series within a cluster. Higher scores mean more similar behavior within the cluster. This provides a fast computation of spatial coherence.
score_nonlinearity() measures deviation from linearity using RMSE and can be normalized against unclustered data. Higher scores indicate more nonlinear behavior, which is good for detecting complex temporal patterns.
td.stats("thk").general.score_heaviside(5, aggregation="mean")
0.41722873312015085
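The other scores presumably follow the same calling pattern (a sketch; exact signatures may differ, check the docstrings):
# assuming the same pattern as score_heaviside:
# td.stats("thk").general.score_consistency(5)
# td.stats("thk").general.score_spatial_autocorrelation(5)
# td.stats("thk").general.score_nonlinearity(5)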
Example 2¶
Let’s try some different data: mean summer near-surface air temperature.
td2 = TOAD("test_data/global_mean_summer_tas.nc")
td2.data = td2.data.drop_vars(td2.shift_vars)
# Compute shifts
td2.compute_shifts(
"tas",
method=ASDETECT(),
)
INFO: New shifts variable tas_dts: min/mean/max=-0.875/0.014/0.979 using 139968 grid cells. Skipped 0.0% grid cells: 0 NaN, 0 constant.
# Compute clusters
td2.compute_clusters(
"tas",
method=HDBSCAN(min_cluster_size=20),
)
INFO: New cluster variable tas_dts_cluster: Identified 18 clusters in 13,291 pts; Left 2.5% as noise (329 pts).
# Plot clusters
# Tip: map style can also be set as a dict, instead of importing the MapStyle class
td2.plot.cluster_map("tas", cluster_ids=range(10), map_style={"projection": "global"});