toad.postprocessing.stats

Classes

Stats(toad, var)

Interface to access specialized statistics calculators for clusters: time, space, and general metrics.

class toad.postprocessing.stats.GeneralStats(toad, var)

Bases: object

General cluster statistics, such as cluster score.

Parameters:

var (str)

aggregate_cluster_scores(cluster_ids, score_method, aggregation='mean', weights=None, **kwargs)

Compute a score for multiple clusters and aggregate the results.

Parameters:
  • cluster_ids (list[int]) – List of cluster IDs

  • score_method (str) – Name of the scoring method (e.g., “score_nonlinearity”)

  • aggregation (str | Callable) – “mean”, “median”, “weighted”, or custom function

  • weights (ndarray | None) – Weights for each cluster (if aggregation=“weighted”)

  • **kwargs – Arguments passed to the scoring method

Returns:

Aggregated score across all clusters

Return type:

float
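The aggregation options listed above can be illustrated with a small standalone sketch. It is not the library's implementation: the helper name aggregate_scores and the example scores are hypothetical, and it assumes that “weighted” means a weighted average normalized by the sum of the weights.

```python
from statistics import mean, median


def aggregate_scores(scores, aggregation="mean", weights=None):
    """Combine per-cluster scores into one number (illustrative only)."""
    scores = [float(s) for s in scores]
    if callable(aggregation):
        # A custom function receives the list of per-cluster scores.
        return float(aggregation(scores))
    if aggregation == "mean":
        return mean(scores)
    if aggregation == "median":
        return median(scores)
    if aggregation == "weighted":
        if weights is None or len(weights) != len(scores):
            raise ValueError("'weighted' needs one weight per cluster")
        # Assumption: weighted average, normalized by the total weight.
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    raise ValueError(f"unknown aggregation: {aggregation!r}")


per_cluster = [0.2, 0.4, 0.9]  # hypothetical scores for three clusters
mean_score = aggregate_scores(per_cluster)
median_score = aggregate_scores(per_cluster, "median")
weighted_score = aggregate_scores(per_cluster, "weighted", weights=[1, 1, 2])
best_score = aggregate_scores(per_cluster, aggregation=max)
```

Passing a callable keeps the interface open for reductions such as max or min without adding more string options.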

score_consistency(cluster_id)

Measures how internally consistent a cluster is by analyzing the similarity between its time series.

Uses hierarchical clustering to group similar time series and computes an inconsistency score. The final score is inverted so higher values indicate more consistency.

The method works by:
  1. Computing pairwise R² correlations between all time series in the cluster

  2. Converting correlations to distances (1 - R²)

  3. Performing hierarchical clustering using Ward linkage

  4. Calculating inconsistency coefficients at the highest level

  5. Converting to a consistency score by taking the inverse

Parameters:

cluster_id (int) – ID of the cluster to evaluate.

Returns:

Consistency score between 0-1, where:

  • 1.0: Perfect consistency (all time series are identical)

  • ~0.5: Moderate consistency

  • 0.0: No consistency (single point or completely inconsistent)

Return type:

float

References

Kobe De Maeyer Master Thesis (2025)
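The five steps described for score_consistency can be sketched with NumPy and SciPy. This is a rough illustration under stated assumptions, not the library's code: the function name consistency_sketch is hypothetical, the input is assumed to be an array of shape (n_series, n_times), and the final inversion is assumed to be 1 / (1 + coefficient), since the exact form is not given here.

```python
import numpy as np
from scipy.cluster.hierarchy import inconsistent, linkage
from scipy.spatial.distance import squareform


def consistency_sketch(series):
    """Illustrative consistency score for an (n_series, n_times) array."""
    series = np.asarray(series, dtype=float)
    if series.shape[0] < 2:
        return 0.0  # a single series carries no pairwise information

    r2 = np.corrcoef(series) ** 2            # step 1: pairwise R^2
    dist = 1.0 - r2                          # step 2: distances
    dist = np.clip((dist + dist.T) / 2.0, 0.0, None)
    np.fill_diagonal(dist, 0.0)

    # Step 3: Ward linkage on the condensed distance vector.
    tree = linkage(squareform(dist, checks=False), method="ward")

    # Step 4: inconsistency coefficient of the top-level (last) merge.
    top_coefficient = inconsistent(tree)[-1, 3]

    # Step 5: assumed inversion so that higher means more consistent.
    return float(1.0 / (1.0 + top_coefficient))


rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 6, 50))
noisy = base + 0.1 * rng.standard_normal((5, 50))  # synthetic cluster members
score = consistency_sketch(noisy)
single = consistency_sketch(base[None, :])
```

Note that SciPy's Ward linkage is only strictly defined for Euclidean distances, so applying it to 1 - R² is a heuristic in this sketch as well.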

score_heaviside(cluster_id, return_score_fit=False, aggregation='mean', percentile=None, normalize=None)

Evaluates how closely the spatially aggregated cluster time series resembles a perfect Heaviside function.

A score of 1 indicates a perfect step function, while 0 indicates a linear trend.

Parameters:
  • cluster_id – ID of the cluster to score.

  • return_score_fit – If True, returns linear regression fit along with score.

  • aggregation (Literal['mean', 'sum', 'std', 'median', 'percentile', 'max', 'min'] | str) – How to aggregate spatial data. Options are:
      - “mean”: Average across space
      - “median”: Median across space
      - “sum”: Sum across space
      - “std”: Standard deviation across space
      - “percentile”: Percentile across space (requires percentile arg)
      - “max”: Maximum across space
      - “min”: Minimum across space

  • percentile – Percentile value between 0-1 when using percentile aggregation.

  • normalize (Literal['max', 'max_each'] | None | str) – How to normalize the data. Options are:
      - “max”: Normalize by maximum value
      - “max_each”: Normalize each trajectory by its own maximum value
      - None: Do not normalize

Returns:

If return_score_fit is False, the cluster score as a float between 0-1. If return_score_fit is True, a tuple (score, linear_fit), where score is a float between 0-1 and linear_fit contains the fitted values of the linear regression.

Return type:

float

References

Kobe De Maeyer Master Thesis (2025)

score_nonlinearity(cluster_id, aggregation='mean', percentile=None, normalise_against_unclustered=False)

Computes nonlinearity of a cluster’s aggregated time series using RMSE from a linear fit.

The score measures how much the time series deviates from a linear trend.

When normalise_against_unclustered=True:
  • Score > 1: Cluster is more nonlinear than typical unclustered behavior

  • Score ≈ 1: Cluster has similar nonlinearity to unclustered data

  • Score < 1: Cluster is more linear than unclustered data

When normalise_against_unclustered=False:
  • Returns raw RMSE (0 = perfectly linear, higher = more nonlinear)

  • Useful for comparing clusters to each other

Parameters:
  • cluster_id (int) – Cluster ID to evaluate.

  • aggregation (Literal['mean', 'sum', 'std', 'median', 'percentile']) – How to aggregate spatial data:
      - “mean”: Average across space
      - “median”: Median across space
      - “sum”: Sum across space
      - “std”: Standard deviation across space
      - “percentile”: Percentile across space (requires percentile arg)

  • percentile (float | None) – Percentile value between 0–1 (only used if aggregation=“percentile”)

  • normalise_against_unclustered (bool) – If True, normalize score by average RMSE of unclustered points. This helps identify clusters that stand out from background behavior.

Returns:

Nonlinearity score. Higher means more nonlinear behavior.

Interpretation depends on the normalise_against_unclustered parameter.

Return type:

float

References

Kobe De Maeyer Master Thesis (2025)
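The RMSE-from-linear-fit idea behind score_nonlinearity can be sketched in a few lines of NumPy. The function names below are hypothetical, the inputs are assumed to be arrays of shape (n_cells, n_times), only the “mean” aggregation is shown, and the normalization is assumed to divide by the mean RMSE of the individual unclustered series.

```python
import numpy as np


def rmse_from_linear_fit(y):
    """RMSE between a 1-D series and its least-squares straight line."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size)
    slope, intercept = np.polyfit(t, y, deg=1)
    residuals = y - (slope * t + intercept)
    return float(np.sqrt(np.mean(residuals ** 2)))


def nonlinearity_sketch(cluster_cells, unclustered_cells=None):
    """Illustrative nonlinearity score for (n_cells, n_times) arrays."""
    # aggregation="mean": average across space before fitting.
    raw = rmse_from_linear_fit(np.mean(cluster_cells, axis=0))
    if unclustered_cells is None:
        return raw  # raw RMSE, 0 means perfectly linear
    # Assumed normalization: average RMSE over the unclustered series.
    background = np.mean([rmse_from_linear_fit(ts) for ts in unclustered_cells])
    return float(raw / background)


t = np.arange(20)
ramp = np.tile(0.1 * t, (3, 1))                   # perfectly linear cells
step = np.tile((t >= 10).astype(float), (3, 1))   # abrupt shift at t = 10

linear_score = nonlinearity_sketch(ramp)
step_score = nonlinearity_sketch(step)
relative_score = nonlinearity_sketch(step, unclustered_cells=0.5 * step)
```

In this toy setup the background series are half-amplitude copies of the step, so the normalized score comes out as a ratio above 1, matching the "more nonlinear than unclustered" reading.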

score_overview(exclude_noise=True, shift_threshold=0.0, **kwargs)

Compute all available scores for every cluster and return as a pandas DataFrame.

This function computes all scoring methods defined in score_dictionary for each cluster and returns the results in a structured DataFrame format, similar to the consensus summary. Includes cluster size, spatial means, shift time statistics, and an aggregate score.

Parameters:
  • exclude_noise (bool) – Whether to exclude noise points (cluster ID -1). Defaults to True.

  • shift_threshold (float) – Minimum shift threshold for computing transition times. Defaults to 0.0.

  • **kwargs – Additional keyword arguments passed to scoring methods. These will be applied to all scoring methods that accept them. Common parameters include:
      - aggregation: Aggregation method for methods that support it (default: “mean”)
      - percentile: Percentile value for percentile aggregation
      - normalize: Normalization method for score_heaviside
      - normalise_against_unclustered: Boolean for score_nonlinearity (default: False)

Returns:

DataFrame with one row per cluster containing:
  • cluster_id: Cluster identifier

  • All score columns from score_dictionary

  • size: Number of space-time grid cells in the cluster

  • mean_{spatial_dim0}: Average spatial coordinate for first dimension

  • mean_{spatial_dim1}: Average spatial coordinate for second dimension

  • mean_shift_time: Mean transition time for the cluster

  • std_shift_time: Standard deviation of transition times within the cluster

  • aggregate_score: Product of all score values

Return type:

pd.DataFrame

Example

>>> stats = td.stats(var="temperature")
>>> overview = stats.score_overview()
>>> print(overview)

score_spatial_autocorrelation(cluster_id)

Computes average pairwise similarity (R²) between all time series in a cluster.

This measures how spatially coherent the cluster behavior is.

The score is calculated by:
  1. Getting all time series for cells in the cluster

  2. Computing pairwise R² correlations between all time series

  3. Taking the mean of the upper triangle of the correlation matrix

Parameters:

cluster_id (int) – ID of the cluster to evaluate.

Returns:

Similarity score between 0-1, where:

  • 1.0: Perfect similarity (all time series identical)

  • ~0.5: Moderate spatial coherence

  • 0.0: No similarity (completely uncorrelated)

Return type:

float

References

Kobe De Maeyer Master Thesis (2025)
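The three steps of score_spatial_autocorrelation map directly onto NumPy. The sketch below is illustrative only: the function name mean_pairwise_r2 is hypothetical and the input is assumed to be an array of shape (n_series, n_times) with at least two non-constant series.

```python
import numpy as np


def mean_pairwise_r2(series):
    """Mean R^2 over all distinct pairs of rows (illustrative only)."""
    series = np.asarray(series, dtype=float)
    r2 = np.corrcoef(series) ** 2                      # pairwise R^2 matrix
    upper = np.triu_indices(series.shape[0], k=1)      # each pair once, no diagonal
    return float(np.mean(r2[upper]))


t = np.arange(5.0)
coherent = np.array([t, 2 * t + 1, -t])                # linearly related series
unrelated = np.array([[1.0, 2.0, 3.0, 4.0],
                      [1.0, -1.0, -1.0, 1.0]])         # zero correlation

coherent_score = mean_pairwise_r2(coherent)
unrelated_score = mean_pairwise_r2(unrelated)
```

Because R² is the squared correlation, a perfectly anti-correlated pair also scores 1 in this sketch.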

class toad.postprocessing.stats.SpaceStats(toad, var)

Bases: object

Class containing functions for calculating space-related statistics for clusters, such as mean, median, std, etc.

all_stats(cluster_id)

Return all cluster stats

Return type:

dict

central_point_for_labeling(cluster_id)

Calculates a central point within the cluster’s spatial footprint suitable for labeling.

This method uses the Euclidean Distance Transform to find the point within the cluster footprint that is furthest from any edge (the “pole of inaccessibility”). This ensures the point is robustly inside the cluster shape, even for complex geometries like rings or C-shapes.

Parameters:

cluster_id – The ID of the cluster to analyze.

Returns:

A tuple containing the (y, x) coordinates of the calculated central point. Returns (np.nan, np.nan) if the footprint is empty.

Return type:

tuple[float, float]
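The "pole of inaccessibility" approach can be sketched with SciPy's Euclidean Distance Transform. This is a sketch under assumptions rather than the library's code: central_point_sketch is a hypothetical name, the footprint is assumed to be a 2-D boolean mask, and the result is given as array indices, whereas the method itself returns (y, x) coordinates.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def central_point_sketch(footprint):
    """Return (y, x) indices of the footprint cell furthest from any edge."""
    footprint = np.asarray(footprint, dtype=bool)
    if not footprint.any():
        return (np.nan, np.nan)
    # Pad with background so the array border also counts as an edge.
    padded = np.pad(footprint, 1, constant_values=False)
    # Distance from every footprint cell to the nearest background cell.
    distance = distance_transform_edt(padded)[1:-1, 1:-1]
    y, x = np.unravel_index(np.argmax(distance), distance.shape)
    return (float(y), float(x))


square = np.zeros((7, 7), dtype=bool)
square[1:6, 1:6] = True

ring = np.ones((7, 7), dtype=bool)
ring[2:5, 2:5] = False          # hole in the middle

square_point = central_point_sketch(square)
ring_point = central_point_sketch(ring)
empty_point = central_point_sketch(np.zeros((3, 3), dtype=bool))
```

For the ring, a plain centroid would fall into the hole, while the distance-transform maximum always lies on a footprint cell.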

footprint_cumulative_area(cluster_id)

Returns the total number of spatial cells that were ever touched by the cluster.

Return type:

int
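The difference between the footprint and the full space-time membership can be shown with a toy label array. The layout (time, y, x) and the use of -1 for unclustered cells are assumptions made for this sketch, and footprint_area is a hypothetical helper name.

```python
import numpy as np

# Toy cluster labels with shape (time, y, x); -1 marks unclustered cells.
labels = np.full((3, 2, 3), -1)
labels[0, 0, 0] = 7
labels[1, 0, 0] = 7   # same spatial cell as above, later time step
labels[1, 0, 1] = 7
labels[2, 1, 2] = 7


def footprint_area(labels, cluster_id):
    """Count spatial cells that belong to the cluster at any time step."""
    member = labels == cluster_id
    footprint = member.any(axis=0)   # collapse the time axis
    return int(footprint.sum())


area = footprint_area(labels, 7)
space_time_cells = int((labels == 7).sum())
```

Here the cluster occupies four space-time cells but only three distinct spatial cells, which is what the cumulative footprint area counts.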

footprint_mean(cluster_id)

Returns the mean of the spatial coordinates of the cluster footprint.

footprint_median(cluster_id)

Returns the median of the spatial coordinates of the cluster footprint.

footprint_std(cluster_id)

Returns the standard deviation of the spatial coordinates of the cluster footprint.

mean(cluster_id)

Returns the mean of the spatial coordinates across space and time.

median(cluster_id)

Returns the median of the spatial coordinates across space and time.

std(cluster_id)

Returns the standard deviation of the spatial coordinates across space and time.

class toad.postprocessing.stats.Stats(toad, var)

Bases: object

Interface to access specialized statistics calculators for clusters: time, space, and general metrics.

property general

Access general statistics for clusters.

property space

Access space-related statistics for clusters.

property time

Access time-related statistics for clusters.

class toad.postprocessing.stats.TimeStats(toad, var)

Bases: object

Class containing functions for calculating time-related statistics for clusters, such as start time, peak time, etc.

all_stats(cluster_id)

Return all cluster stats

Return type:

dict

compute_transition_time(cluster_ids=None, shift_threshold=0.25)

Computes the transition time for each grid cell.

This method identifies the time point of maximum rate of change (peak shift) for each spatial location in the data. It uses the absolute value of shifts to detect both positive and negative transitions.

Parameters:
  • cluster_ids (int | list[int] | None) – Optional integer or list of integers specifying which cluster IDs to analyze. If None, analyzes all clusters. If specified, only analyzes grid cells belonging to the given cluster(s).

  • shift_threshold – Optional float specifying the minimum absolute shift value that counts as a valid transition. Defaults to 0.25. Grid cells whose maximum absolute shift is below this threshold are marked as having no transition (NaN).

Returns:

xarray DataArray containing the transition time for each grid cell. Grid cells with no detected transition will contain NaN values. The output has the same spatial dimensions as the input shifts data.

Return type:

DataArray

Note

The transition time is determined by finding the time index where the absolute value of the shifts reaches its maximum for each grid cell. This corresponds to the point of most rapid change in the underlying data.

For grid cells where the maximum absolute shift value is below shift_threshold, or where no clear transition is detected, NaN values will be returned.
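The rule described in the note can be sketched with plain NumPy arrays. The real method works on xarray data and returns a DataArray; the function name transition_time_sketch, the (time, y, x) layout, and the numeric time axis are assumptions made for this illustration, and the shifts are assumed to contain no NaN values.

```python
import numpy as np


def transition_time_sketch(shifts, times, shift_threshold=0.25):
    """Time of the largest absolute shift per grid cell, NaN if too small."""
    magnitude = np.abs(np.asarray(shifts, dtype=float))
    peak_index = magnitude.argmax(axis=0)   # time index of the largest |shift|
    peak_value = magnitude.max(axis=0)
    peak_time = np.asarray(times, dtype=float)[peak_index]
    # Cells whose peak stays below the threshold get no transition time.
    return np.where(peak_value >= shift_threshold, peak_time, np.nan)


times = [2000, 2010, 2020, 2030]
shifts = np.array([
    [[0.10, 0.05]],
    [[-0.90, 0.10]],   # strong negative shift in the first cell
    [[0.20, -0.20]],
    [[0.00, 0.10]],
])                      # shape (time=4, y=1, x=2)

transition = transition_time_sketch(shifts, times)
```

The first cell peaks at the second time step despite the shift being negative, and the second cell never reaches the threshold, so it is returned as NaN.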

duration(cluster_id)

Return duration of the cluster in time.

Parameters:

cluster_id – ID of the cluster to calculate duration for.

Returns:

Duration of the cluster. If the original dataset uses cftime format, the duration is returned in seconds.

Return type:

float

duration_timesteps(cluster_id)

Return duration of the cluster in timesteps.

Return type:

int

end(cluster_id)

Return the end time of the cluster.

Return type:

float | datetime

end_timestep(cluster_id)

Return the end index of the cluster

Return type:

int

iqr(cluster_id, lower_quantile, upper_quantile)

Get start and end time of the specified interquantile range of the cluster temporal density.

Parameters:
  • cluster_id – ID of the cluster

  • lower_quantile (float) – Lower bound of the interquantile range (0-1)

  • upper_quantile (float) – Upper bound of the interquantile range (0-1)

Returns:

Start time and end time of the interquantile range in original time format

Return type:

tuple

iqr_50(cluster_id)

Get start and end time of the 50% interquantile range of the cluster temporal density

Return type:

tuple[float, float]

iqr_68(cluster_id)

Get start and end time of the 68% interquantile range of the cluster temporal density

Return type:

tuple[float, float]

iqr_90(cluster_id)

Get start and end time of the 90% interquantile range of the cluster temporal density

Return type:

tuple[float, float]
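One plausible way to read the interquantile methods is via the cumulative distribution of the cluster's temporal density. The sketch below is an assumption-laden illustration, not the library's algorithm: interquantile_times is a hypothetical name, the density is taken to be the number of member cells per time step, and the 50% range is assumed to correspond to the quantiles 0.25 and 0.75.

```python
import numpy as np


def interquantile_times(times, density, lower_quantile, upper_quantile):
    """Times at which the cumulative density first reaches each quantile."""
    times = np.asarray(times)
    cumulative = np.cumsum(np.asarray(density, dtype=float))
    cdf = cumulative / cumulative[-1]     # last value is exactly 1.0
    start = times[np.searchsorted(cdf, lower_quantile)]
    end = times[np.searchsorted(cdf, upper_quantile)]
    return start, end


times = np.array([0, 1, 2, 3, 4])
density = np.array([1, 2, 4, 2, 1])       # hypothetical member counts per step

iqr_50_like = interquantile_times(times, density, 0.25, 0.75)
```

With this density the cumulative fractions are 0.1, 0.3, 0.7, 0.9 and 1.0, so the central 50% of the membership falls between time steps 1 and 3.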

mean(cluster_id)

Return mean time value of the cluster.

Return type:

float | datetime

median(cluster_id)

Return median time of the cluster.

Return type:

float | datetime

membership_peak(cluster_id)

Return the time of the largest cluster temporal density.

If there’s a plateau at the maximum value, returns the center of the plateau.

Return type:

float | datetime
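The plateau handling can be sketched as follows. The name peak_time_sketch is hypothetical, times are assumed to be numeric, and the "center of the plateau" is assumed to be the midpoint between the first and last time step at the maximum, with the maximum forming one contiguous run.

```python
import numpy as np


def peak_time_sketch(times, density):
    """Time of the maximum density, or the plateau midpoint if it is flat."""
    times = np.asarray(times, dtype=float)
    density = np.asarray(density, dtype=float)
    at_max = np.flatnonzero(density == density.max())
    # Assumption: the maximum forms a single contiguous plateau.
    return float((times[at_max[0]] + times[at_max[-1]]) / 2.0)


plateau_peak = peak_time_sketch([0, 1, 2, 3, 4], [1, 5, 5, 5, 2])
single_peak = peak_time_sketch([10, 20, 30, 40], [1, 2, 9, 2])
```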

membership_peak_density(cluster_id)

Return the largest cluster temporal density

Return type:

float

start(cluster_id)

Return the start time of the cluster.

Return type:

float | datetime

start_timestep(cluster_id)

Return the start index of the cluster

Return type:

float

std(cluster_id)

Return standard deviation of the time of the cluster.

Return type:

float

steepest_gradient(cluster_id)

Return the time of the steepest gradient of the median cluster timeseries

Return type:

float | datetime

steepest_gradient_timestep(cluster_id)

Return the index of the steepest gradient of the mean cluster timeseries inside the cluster time bounds

Return type:

float
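Finding the steepest gradient inside the cluster time bounds can be sketched with np.gradient. This is only an illustration: steepest_gradient_index is a hypothetical name, the input is assumed to be an already aggregated cluster time series, the bounds are assumed to be inclusive indices, and the steepest point is assumed to be the largest absolute gradient.

```python
import numpy as np


def steepest_gradient_index(series, start, end):
    """Index of the largest absolute gradient within [start, end]."""
    window = np.asarray(series, dtype=float)[start:end + 1]
    gradient = np.gradient(window)        # central differences, one-sided at the ends
    return start + int(np.argmax(np.abs(gradient)))


series = [0, 0, 0, 1, 3, 3, 3]            # synthetic aggregated time series
full_range = steepest_gradient_index(series, 0, 6)
bounded = steepest_gradient_index(series, 1, 5)
```

Adding start back to the argmax converts the position inside the window into an index on the full time axis.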

Modules

general

space

time