toad.compute_clusters

toad.compute_clusters(td, var, method, shift_threshold=0.5, shift_direction='both', shift_selection='local', scaler=None, time_weight=1, regridder=None, disable_regridder=False, output_label_suffix='', output_label=None, overwrite=False, sort_by_size=True, optimize=False, optimize_params={'min_cluster_size': (10, 25), 'time_weight': (0.5, 1.5)}, optimize_objective='combined_spatial_nonlinearity', optimize_n_trials=50, optimize_direction='maximize', optimize_log_level=30, optimize_progress_bar=True)

Apply clustering to a dataset’s temporal shifts using a sklearn-compatible clustering algorithm.

Parameters:
  • td (TOAD) – TOAD object containing the data to cluster

  • var (str) – Name of the base variable or shifts variable to compute clusters for. If the base variable has more than one shifts variable, a ValueError is raised; in that case, pass the name of the shifts variable instead.

  • method (ClusterMixin | type) – The clustering method to use. Choose methods from sklearn.cluster or create your own by inheriting from sklearn.base.ClusterMixin.

  • shift_threshold (float) – The minimum magnitude a shift must reach to be included in clustering. Raising the threshold filters out weaker shifts and focuses clustering on the most pronounced events. Lowering it includes more subtle, and potentially noisier, shifts. Defaults to 0.5, which excludes most noise for shifts computed with ASDETECT.

  • shift_direction (Literal['both', 'positive', 'negative'] | str) – Which shifts to include by sign: “positive” keeps only positive shifts, “negative” keeps only negative shifts, and “both” keeps both. Defaults to “both”.

  • shift_selection (Literal['local', 'global', 'all'] | str) – How shift values are selected for clustering. All options respect shift_threshold and shift_direction:
      - “local”: Finds peaks within individual shift episodes, i.e. clusters only the local maxima within each contiguous segment where abs(shift) > shift_threshold.
      - “global”: Finds the overall strongest shift per grid cell, i.e. clusters only the single maximum shift value per grid cell where abs(shift) > shift_threshold.
      - “all”: Clusters every shift value that meets the threshold and direction criteria, not just peaks.
    Defaults to “local”.

  • scaler (StandardScaler | MinMaxScaler | RobustScaler | MaxAbsScaler | None) – The scaling method to apply to the data before clustering. StandardScaler(), MinMaxScaler(), RobustScaler() and MaxAbsScaler() from sklearn.preprocessing are supported. Defaults to None. This option is deprecated and will be removed in a future release; leave scaler=None to use only the recommended temporal scaling (see time_weight).

  • time_weight (float) – Controls the relative influence of time in clustering. By default, time values are automatically scaled to match the standard deviation of the spatial coordinates. Increasing time_weight gives more emphasis to the temporal dimension, resulting in clusters that are tighter in time (shorter delays between abrupt events). Decreasing it emphasizes the spatial dimensions, allowing clusters to span a wider range of shift times. Defaults to 1.

  • regridder (BaseRegridder | None) – The regridding method to use from toad.clustering.regridding. Defaults to None. If None and coordinates are lat/lon, a HealPixRegridder will be created automatically.

  • disable_regridder (bool) – Whether to disable regridding, including the HealPixRegridder that is otherwise created automatically for lat/lon coordinates. Defaults to False.

  • output_label_suffix (str) – A suffix to add to the output label. Defaults to “”.

  • overwrite (bool) – If True, overwrite an existing variable with the same name. If False, a number is appended to the name to keep it unique. Defaults to False.

  • sort_by_size (bool) – Whether to reorder clusters by size. Defaults to True.

  • optimize (bool) – Whether to optimize the clustering parameters. Defaults to False.

  • optimize_params (dict) – Search ranges for the optimization, mapping each parameter name to a (low, high) range. Defaults to {'min_cluster_size': (10, 25), 'time_weight': (0.5, 1.5)}.

  • optimize_objective (Callable | Literal['median_heaviside', 'mean_heaviside', 'mean_consistency', 'mean_spatial_autocorrelation', 'mean_nonlinearity', 'combined_spatial_nonlinearity'] | str) – The objective function to optimize. Defaults to “combined_spatial_nonlinearity”. Can be one of:
      - callable: Custom objective function taking (td, output_label) as arguments
      - “median_heaviside”: Median heaviside score across clusters
      - “mean_heaviside”: Mean heaviside score across clusters
      - “mean_consistency”: Mean consistency score across clusters
      - “mean_spatial_autocorrelation”: Mean spatial autocorrelation score
      - “mean_nonlinearity”: Mean nonlinearity score across clusters
      - “combined_spatial_nonlinearity”: The default objective

  • optimize_n_trials (int) – Number of trials to run for optimization. Defaults to 50.

  • optimize_direction (str) – Whether to “maximize” or “minimize” the objective. Defaults to “maximize”.

  • optimize_log_level (int) – The log level for the optimization. Defaults to optuna.logging.WARNING.

  • optimize_progress_bar (bool) – Whether to show the progress bar for the optimization. Defaults to True.

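  • output_label (str | None) – Name of the output cluster variable. If None, the name is derived automatically. Defaults to None.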
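The interplay of shift_threshold, shift_direction and shift_selection can be sketched for the shift time series of a single grid cell as below. This is a plain-Python illustration, not TOAD's implementation: select_shift_indices is a hypothetical helper, contiguity is assumed to be evaluated after both the threshold and direction filters, and each contiguous segment is assumed to contribute exactly one peak (its largest abs(shift)).

```python
from typing import List, Literal


def select_shift_indices(
    shifts: List[float],
    threshold: float = 0.5,
    direction: Literal["both", "positive", "negative"] = "both",
    selection: Literal["local", "global", "all"] = "local",
) -> List[int]:
    """Return the time indices of one grid cell's shifts that would be clustered.

    Hypothetical sketch; assumes segments are built after threshold AND direction
    filtering, and that each segment yields one peak.
    """

    def passes(value: float) -> bool:
        # Magnitude criterion: abs(shift) must exceed the threshold.
        if abs(value) <= threshold:
            return False
        # Direction criterion: keep only the requested sign.
        if direction == "positive":
            return value > 0
        if direction == "negative":
            return value < 0
        return True

    candidates = [i for i, v in enumerate(shifts) if passes(v)]
    if selection == "all":
        # Every value meeting both criteria, not just peaks.
        return candidates
    if selection == "global":
        # Single strongest shift in the whole series (if any).
        if not candidates:
            return []
        return [max(candidates, key=lambda i: abs(shifts[i]))]

    # "local": split candidates into contiguous runs and keep each run's peak.
    peaks: List[int] = []
    run: List[int] = []
    for i in candidates:
        if run and i != run[-1] + 1:
            peaks.append(max(run, key=lambda j: abs(shifts[j])))
            run = []
        run.append(i)
    if run:
        peaks.append(max(run, key=lambda j: abs(shifts[j])))
    return peaks
```

For example, with the series [0.1, 0.6, 0.9, 0.7, 0.2, -0.8, -0.6, 0.3, 0.55] and a threshold of 0.5, “all” keeps indices 1, 2, 3, 5, 6 and 8, “local” keeps the segment peaks 2, 5 and 8, and “global” keeps only index 2.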
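The time_weight description can be read as a rescaling of the time coordinate before it is stacked with the spatial coordinates into feature rows. The sketch below assumes that "the standard deviation of the spatial coordinates" means the mean of the per-axis standard deviations of y and x, and that time is centred before scaling; build_feature_rows is a hypothetical helper, not part of TOAD.

```python
import statistics
from typing import List, Sequence, Tuple


def build_feature_rows(
    times: Sequence[float],
    ys: Sequence[float],
    xs: Sequence[float],
    time_weight: float = 1.0,
) -> List[Tuple[float, float, float]]:
    """Stack (scaled_time, y, x) rows for a clustering algorithm (hypothetical helper)."""
    # Assumption: the spatial reference spread is the mean of the per-axis
    # population standard deviations of the y and x coordinates.
    spatial_std = (statistics.pstdev(ys) + statistics.pstdev(xs)) / 2
    time_std = statistics.pstdev(times)
    time_mean = statistics.fmean(times)
    # Rescale centred time so its spread matches the spatial spread, then apply
    # time_weight; a constant time coordinate (zero spread) collapses to 0.
    factor = spatial_std / time_std if time_std > 0 else 0.0
    scaled_times = [(t - time_mean) * factor * time_weight for t in times]
    return list(zip(scaled_times, ys, xs))
```

Doubling time_weight doubles the spread of the time column, so a given delay between two events contributes twice as much to their Euclidean distance. A distance-based method such as DBSCAN then separates events that are far apart in time more readily, which is why larger values yield clusters that are tighter in time.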
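The sort_by_size description does not state which ordering is used. A common convention, assumed here and not confirmed by the documentation, is that the largest cluster receives label 0, the next largest 1, and so on, while the noise label -1 used by sklearn's DBSCAN and HDBSCAN is left unchanged. relabel_by_size is a hypothetical helper that sketches this relabelling.

```python
from collections import Counter
from typing import Dict, List


def relabel_by_size(labels: List[int], noise_label: int = -1) -> List[int]:
    """Renumber cluster labels so the largest cluster becomes 0, the next 1, ...

    Assumptions (not confirmed by the documentation): noise_label is kept as-is,
    and clusters of equal size are ordered by their original label.
    """
    # Count members per cluster, ignoring noise points.
    sizes = Counter(label for label in labels if label != noise_label)
    # Largest first; ties broken by the original label value.
    order = sorted(sizes, key=lambda label: (-sizes[label], label))
    mapping: Dict[int, int] = {old: new for new, old in enumerate(order)}
    # Noise points are not in the mapping and fall back to noise_label.
    return [mapping.get(label, noise_label) for label in labels]
```

For example, relabel_by_size([3, 3, -1, 1, 1, 1, 0]) turns the three-member cluster 1 into 0, the two-member cluster 3 into 1 and the singleton 0 into 2, leaving the noise point at -1.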
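The optimize_* parameters describe a search over the ranges in optimize_params for optimize_n_trials trials, keeping the best score according to optimize_direction. Because the log-level default refers to optuna.logging.WARNING, the actual search is presumably run by Optuna, whose default sampler (TPE) is smarter than the plain random search swapped in below; the sketch only shows how ranges, trial count and direction interact. random_search and the toy objective are hypothetical, a real TOAD objective receives (td, output_label) rather than a parameter dict, and sampling integer bounds as integers is an assumption.

```python
import random
from typing import Callable, Dict, List, Tuple, Union

Number = Union[int, float]


def random_search(
    objective: Callable[[Dict[str, Number]], float],
    params: Dict[str, Tuple[Number, Number]],
    n_trials: int = 50,
    direction: str = "maximize",
    seed: int = 0,
) -> Tuple[Dict[str, Number], float, List[float]]:
    """Plain random search stand-in (hypothetical); not Optuna's TPE sampler.

    Integer (low, high) bounds are sampled as integers, float bounds as floats.
    Returns the best parameters, the best score and all trial scores.
    """
    if direction not in ("maximize", "minimize"):
        raise ValueError("direction must be 'maximize' or 'minimize'")
    rng = random.Random(seed)
    best_params: Dict[str, Number] = {}
    best_score = float("-inf") if direction == "maximize" else float("inf")
    scores: List[float] = []
    for _ in range(n_trials):
        # Draw one value per parameter from its (low, high) range.
        trial = {
            name: (
                rng.randint(low, high)
                if isinstance(low, int) and isinstance(high, int)
                else rng.uniform(low, high)
            )
            for name, (low, high) in params.items()
        }
        score = objective(trial)
        scores.append(score)
        # Keep the trial if it improves in the requested direction.
        better = score > best_score if direction == "maximize" else score < best_score
        if better:
            best_params, best_score = trial, score
    return best_params, best_score, scores
```

Switching direction to “minimize” simply flips the comparison used to keep the best trial; the sampled trials are identical for a fixed seed.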

Returns:

An xarray.Dataset containing the original data and the clustering results.

Return type:

xarray.Dataset

Notes

For global datasets, use toad.clustering.regridding.HealPixRegridder (created automatically when regridder=None and the coordinates are lat/lon) to obtain approximately equal spacing between data points and prevent biased clustering at high latitudes.
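The high-latitude bias comes from the geometry of a regular lat/lon grid: the east-west distance between neighbouring grid points shrinks with the cosine of latitude, so points crowd together towards the poles and distance-based clustering sees artificially dense regions there. The sketch below measures that spacing with the haversine formula on a spherical Earth (mean radius 6371 km, an approximation). It illustrates the problem only and is not how the HEALPix regridder works.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def zonal_spacing_km(lat: float, dlon: float = 1.0) -> float:
    """East-west distance between two grid points dlon degrees apart at latitude lat."""
    return haversine_km(lat, 0.0, lat, dlon)


def meridional_spacing_km(lat: float, dlat: float = 1.0) -> float:
    """North-south distance between two grid points dlat degrees apart, starting at lat."""
    return haversine_km(lat, 0.0, lat + dlat, 0.0)


for lat in (0, 30, 60, 80):
    print(f"lat {lat:2d}: east-west {zonal_spacing_km(lat):6.1f} km, "
          f"north-south {meridional_spacing_km(lat):6.1f} km")
```

With 1° spacing, neighbouring points are about 111 km apart east-west at the equator, about 56 km at 60° and about 19 km at 80°, while the north-south spacing stays at about 111 km everywhere.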