toad.postprocessing.stats¶

Classes

Stats(toad, var)

Interface to access specialized statistics calculators for clusters: time, space, and general metrics.

class toad.postprocessing.stats.GeneralStats(toad, var)¶

Bases: object

General cluster statistics, such as cluster score.

Parameters:: var (str)

aggregate_cluster_scores(cluster_ids, score_method, aggregation='mean', weights=None, **kwargs)¶

Compute a score for multiple clusters and aggregate the results.

Parameters:

cluster_ids (list[int]) – List of cluster IDs
score_method (str) – Name of the scoring method (e.g., “score_nonlinearity”)
aggregation (str | Callable) – “mean”, “median”, “weighted”, or custom function
weights (ndarray | None) – Weights for each cluster (if aggregation="weighted")
**kwargs – Arguments passed to the scoring method

Returns:

Aggregated score across all clusters

Return type:

float
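
The aggregation step can be sketched as follows. This is a minimal illustration that assumes the per-cluster scores have already been computed; `aggregate_scores` is a hypothetical helper, not part of TOAD, and only mirrors the documented options ("mean", "median", "weighted", or a callable).

```python
import numpy as np

def aggregate_scores(scores, aggregation="mean", weights=None):
    """Hypothetical helper: combine per-cluster scores into a single float."""
    scores = np.asarray(scores, dtype=float)
    if callable(aggregation):
        # Custom function: receives the array of per-cluster scores.
        return float(aggregation(scores))
    if aggregation == "mean":
        return float(np.mean(scores))
    if aggregation == "median":
        return float(np.median(scores))
    if aggregation == "weighted":
        if weights is None:
            raise ValueError("weights are required when aggregation='weighted'")
        return float(np.average(scores, weights=np.asarray(weights, dtype=float)))
    raise ValueError(f"unknown aggregation: {aggregation!r}")

scores = [0.2, 0.4, 0.9]          # illustrative per-cluster scores
mean_score = aggregate_scores(scores)
weighted_score = aggregate_scores(scores, "weighted", weights=[1, 1, 2])
max_score = aggregate_scores(scores, aggregation=np.max)
```

Passing a callable keeps the interface open for reductions that have no string shortcut, such as a maximum or a trimmed mean.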

score_consistency(cluster_id)¶

Measures how internally consistent a cluster is by analyzing the similarity between its time series.

Uses hierarchical clustering to group similar time series and computes an inconsistency score. The final score is inverted so higher values indicate more consistency.

The method works by:

1. Computing pairwise R² correlations between all time series in the cluster
2. Converting correlations to distances (1 - R²)
3. Performing hierarchical clustering using Ward linkage
4. Calculating inconsistency coefficients at the highest level
5. Converting to a consistency score by taking the inverse

Parameters:

cluster_id (int) – ID of the cluster to evaluate.

Returns:

Consistency score between 0-1, where:

1.0: Perfect consistency (all time series are identical)
~0.5: Moderate consistency
0.0: No consistency (single point or completely inconsistent)

Return type:

float

References

Kobe De Maeyer Master Thesis (2025)

score_heaviside(cluster_id: int, return_score_fit: Literal[False] = False, aggregation: Literal['mean', 'sum', 'std', 'median', 'percentile', 'max', 'min'] | str = 'mean', percentile: float | None = None, normalize: Literal['max', 'max_each'] | None | str = None) → float¶

score_heaviside(cluster_id: int, return_score_fit: Literal[True], aggregation: Literal['mean', 'sum', 'std', 'median', 'percentile', 'max', 'min'] | str = 'mean', percentile: float | None = None, normalize: Literal['max', 'max_each'] | None | str = None) → Tuple[float, ndarray]

Evaluates how closely the spatially aggregated cluster time series resembles a perfect Heaviside function.

A score of 1 indicates a perfect step function, while 0 indicates a linear trend.

Parameters:

cluster_id – ID of the cluster to score.
return_score_fit – If True, returns linear regression fit along with score.
aggregation (Literal['mean', 'sum', 'std', 'median', 'percentile', 'max', 'min'] | str) – How to aggregate spatial data. Options are:
    - “mean” - Average across space
    - “median” - Median across space
    - “sum” - Sum across space
    - “std” - Standard deviation across space
    - “percentile” - Percentile across space (requires percentile arg)
    - “max” - Maximum across space
    - “min” - Minimum across space
percentile – Percentile value between 0-1 when using percentile aggregation.
normalize (Literal['max', 'max_each'] | None | str) – How to normalize the data. Options are:
    - “max” - Normalize by maximum value
    - “max_each” - Normalize each trajectory by its own maximum value
    - None - Do not normalize

Returns:

If return_score_fit is False, the cluster score between 0-1. If return_score_fit is True, a tuple (score, linear_fit), where score is a float between 0-1 and linear_fit holds the fitted values.

Return type:

float | Tuple[float, ndarray]

References

Kobe De Maeyer Master Thesis (2025)

score_nonlinearity(cluster_id, aggregation='mean', percentile=None, normalise_against_unclustered=False)¶

Computes nonlinearity of a cluster’s aggregated time series using RMSE from a linear fit.

The score measures how much the time series deviates from a linear trend.

When normalise_against_unclustered=True:

Score > 1: Cluster is more nonlinear than typical unclustered behavior
Score ≈ 1: Cluster has similar nonlinearity to unclustered data
Score < 1: Cluster is more linear than unclustered data

When normalise_against_unclustered=False:

Returns raw RMSE (0 = perfectly linear, higher = more nonlinear)
Useful for comparing clusters to each other

Parameters:

cluster_id (int) – Cluster ID to evaluate.
aggregation (Literal['mean', 'sum', 'std', 'median', 'percentile']) – How to aggregate spatial data:
    - “mean”: Average across space
    - “median”: Median across space
    - “sum”: Sum across space
    - “std”: Standard deviation across space
    - “percentile”: Percentile across space (requires percentile arg)
percentile (float | None) – Percentile value between 0–1 (only used if aggregation="percentile")
normalise_against_unclustered (bool) – If True, normalise score by average RMSE of unclustered points. This helps identify clusters that stand out from background behavior.

Returns:

Nonlinearity score. Higher means more nonlinear behavior. Interpretation depends on the normalise_against_unclustered parameter.

Return type:

float

References

Kobe De Maeyer Master Thesis (2025)
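
The RMSE-from-a-linear-fit idea can be sketched with NumPy alone. This is an illustration under stated assumptions, not TOAD's implementation: `rmse_from_linear_fit` and `nonlinearity` are hypothetical helpers, the input is a plain array of shape (n_cells, n_time), only "mean" aggregation is shown, and the normalisation divides by the mean RMSE of the individual unclustered series.

```python
import numpy as np

def rmse_from_linear_fit(series):
    """RMSE between a 1-D series and its least-squares straight line."""
    y = np.asarray(series, dtype=float)
    t = np.arange(y.size)
    slope, intercept = np.polyfit(t, y, deg=1)
    residuals = y - (slope * t + intercept)
    return float(np.sqrt(np.mean(residuals ** 2)))

def nonlinearity(cluster_cells, unclustered_cells=None):
    """Hypothetical helper. cluster_cells: array of shape (n_cells, n_time)."""
    aggregated = np.mean(cluster_cells, axis=0)  # corresponds to aggregation="mean"
    score = rmse_from_linear_fit(aggregated)
    if unclustered_cells is not None:
        # Normalise by the average RMSE of the unclustered time series.
        background = np.mean([rmse_from_linear_fit(s) for s in unclustered_cells])
        score = score / background
    return float(score)

t = np.arange(20)
linear_cells = np.vstack([0.5 * t, 0.5 * t + 1.0])            # straight lines
step_cells = np.vstack([(t >= 10).astype(float), 2.0 * (t >= 10)])  # abrupt steps

raw_linear = nonlinearity(linear_cells)   # close to 0: perfectly linear
raw_step = nonlinearity(step_cells)       # clearly above 0
relative = nonlinearity(step_cells, unclustered_cells=step_cells[:1])
```

In this toy setup the normalised score is a ratio, so a value above 1 means the cluster deviates more from a straight line than the chosen background series.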

score_overview(exclude_noise=True, shift_threshold=0.0, **kwargs)¶

Compute all available scores for every cluster and return as a pandas DataFrame.

This function computes all scoring methods defined in score_dictionary for each cluster and returns the results in a structured DataFrame format, similar to the consensus summary. Includes cluster size, spatial means, shift time statistics, and an aggregate score.

Parameters:

exclude_noise (bool) – Whether to exclude noise points (cluster ID -1). Defaults to True.
shift_threshold (float) – Minimum shift threshold for computing transition times. Defaults to 0.0.
**kwargs – Additional keyword arguments passed to scoring methods. These will be applied to all scoring methods that accept them. Common parameters include:
    - aggregation: Aggregation method for methods that support it (default: “mean”)
    - percentile: Percentile value for percentile aggregation
    - normalize: Normalization method for score_heaviside
    - normalise_against_unclustered: Boolean for score_nonlinearity (default: False)

Returns:

DataFrame with one row per cluster containing:

cluster_id: Cluster identifier
All score columns from score_dictionary
size: Number of space-time grid cells in the cluster
mean_{spatial_dim0}: Average spatial coordinate for first dimension
mean_{spatial_dim1}: Average spatial coordinate for second dimension
mean_shift_time: Mean transition time for the cluster
std_shift_time: Standard deviation of transition times within the cluster
aggregate_score: Product of all score values

Return type:

pd.DataFrame

Example

>>> stats = td.stats(var="temperature")
>>> overview = stats.score_overview()
>>> print(overview)

score_spatial_autocorrelation(cluster_id)¶

Computes average pairwise similarity (R²) between all time series in a cluster.

This measures how spatially coherent the cluster behavior is.

The score is calculated by:

1. Getting all time series for cells in the cluster
2. Computing pairwise R² correlations between all time series
3. Taking the mean of the upper triangle of the correlation matrix

Parameters:

cluster_id (int) – ID of the cluster to evaluate.

Returns:

Similarity score between 0-1, where:

1.0: Perfect similarity (all time series identical)
~0.5: Moderate spatial coherence
0.0: No similarity (completely uncorrelated)

Return type:

float

References

Kobe De Maeyer Master Thesis (2025)
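
The three steps of score_spatial_autocorrelation can be sketched as below. The sketch assumes that "R²" means the squared Pearson correlation and that the cluster's time series are available as an array of shape (n_cells, n_time); `mean_pairwise_r2` is a hypothetical helper, not a TOAD function.

```python
import numpy as np

def mean_pairwise_r2(cells):
    """Hypothetical helper. cells: array (n_cells, n_time) with n_cells >= 2."""
    cells = np.asarray(cells, dtype=float)
    r = np.corrcoef(cells)                  # Pearson r between every pair of rows
    r2 = r ** 2                             # squared correlation
    upper = np.triu_indices_from(r2, k=1)   # each pair once, diagonal excluded
    return float(np.mean(r2[upper]))

t = np.linspace(0.0, 1.0, 50)
coherent = np.vstack([t, 2.0 * t + 1.0, -3.0 * t])           # linearly related series
mixed = np.vstack([t, np.sin(40.0 * t), np.cos(25.0 * t)])   # weakly related series

coherent_score = mean_pairwise_r2(coherent)
mixed_score = mean_pairwise_r2(mixed)
```

Because the correlation is squared, series that move in opposite directions still count as coherent. A constant series has zero variance and would produce NaN here, so a real implementation would need to handle that case.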

class toad.postprocessing.stats.SpaceStats(toad, var)¶

Bases: object

Class containing functions for calculating space-related statistics for clusters, such as mean, median, std, etc.

all_stats(cluster_id)¶

Return all cluster stats

Return type:: dict

central_point_for_labeling(cluster_id)¶

Calculates a central point within the cluster’s spatial footprint suitable for labeling.

This method uses the Euclidean Distance Transform to find the point within the cluster footprint that is furthest from any edge (the “pole of inaccessibility”). This ensures the point is robustly inside the cluster shape, even for complex geometries like rings or C-shapes.

Parameters:: cluster_id – The ID of the cluster to analyze.
Returns:: A tuple containing the (y, x) coordinates of the calculated central point. Returns (np.nan, np.nan) if the footprint is empty.
Return type:: tuple[float, float]
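
The "pole of inaccessibility" idea can be illustrated with a brute-force distance transform in NumPy. This is a sketch for small grids only, and `central_point` is a hypothetical helper; an optimized Euclidean Distance Transform routine would normally replace the pairwise distance computation. The sketch also assumes that the grid border counts as an edge.

```python
import numpy as np

def central_point(footprint):
    """Hypothetical helper. footprint: 2-D boolean mask with dimensions (y, x)."""
    footprint = np.asarray(footprint, dtype=bool)
    if not footprint.any():
        return (np.nan, np.nan)
    # Pad with background so the grid border counts as an edge.
    padded = np.pad(footprint, 1, constant_values=False)
    inside = np.argwhere(padded)
    outside = np.argwhere(~padded)
    # Brute-force distance transform: distance from each inside cell
    # to its nearest outside cell.
    diffs = inside[:, None, :] - outside[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    y, x = inside[np.argmax(dist)] - 1   # undo the padding offset
    return (float(y), float(x))

square = np.ones((5, 5), dtype=bool)
ring = square.copy()
ring[1:4, 1:4] = False   # hollow centre: the centroid (2, 2) lies outside the ring

square_point = central_point(square)
ring_point = central_point(ring)
empty_point = central_point(np.zeros((3, 3), dtype=bool))
```

For the ring, a plain centroid would fall into the hole, whereas the point returned here is always a cell of the footprint, which is what makes the approach suitable for label placement.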

footprint_cumulative_area(cluster_id)¶

Returns the total number of spatial cells that were ever touched by the cluster.

Return type:: int

footprint_mean(cluster_id)¶: Returns the mean of the spatial coordinates of the cluster footprint.

footprint_median(cluster_id)¶: Returns the median of the spatial coordinates of the cluster footprint.

footprint_std(cluster_id)¶: Returns the standard deviation of the spatial coordinates of the cluster footprint.

mean(cluster_id)¶: Returns the mean of the spatial coordinates across space and time.

median(cluster_id)¶: Returns the median of the spatial coordinates across space and time.

std(cluster_id)¶: Returns the standard deviation of the spatial coordinates across space and time.
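
The difference between the footprint statistics and the space-time statistics can be shown on a toy membership mask. The array layout (time, y, x) and all variable names are assumptions made for illustration; this is not how TOAD stores clusters internally.

```python
import numpy as np

# Boolean membership mask with dimensions (time, y, x); hypothetical toy data
# for a cluster that grows along x over three timesteps.
member = np.zeros((3, 4, 4), dtype=bool)
member[0, 1, 1] = True
member[1, 1, 1:3] = True
member[2, 1, 1:4] = True

footprint = member.any(axis=0)        # spatial cells ever touched by the cluster
cumulative_area = int(footprint.sum())

_, ys, xs = np.nonzero(member)        # every (t, y, x) member cell
fy, fx = np.nonzero(footprint)        # each spatial cell counted once

spacetime_mean_x = float(xs.mean())   # weighted by how long each cell is a member
footprint_mean_x = float(fx.mean())   # unweighted over the footprint
```

Cells that belong to the cluster for more timesteps pull the space-time mean towards themselves, while the footprint mean treats every touched cell equally.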

class toad.postprocessing.stats.Stats(toad, var)¶

Bases: object

Interface to access specialized statistics calculators for clusters: time, space, and general metrics.

Used when calling td.stats(var) explicitly; _StatsAccessor in core.py delegates here for td.stats.time etc.

property general¶: Access general statistics for clusters.

property space¶: Access space-related statistics for clusters.

property time¶: Access time-related statistics for clusters.

class toad.postprocessing.stats.TimeStats(toad, var)¶

Bases: object

Class containing functions for calculating time-related statistics for clusters, such as start time, peak time, etc.

all_stats(cluster_id)¶

Return all cluster stats

Return type:: dict

compute_transition_time(cluster_ids=None, shift_threshold=0.5)¶

Computes the transition time for each grid cell.

This method identifies the time point of maximum rate of change (peak shift) for each spatial location in the data. It uses the absolute value of shifts to detect both positive and negative transitions.

Parameters:

cluster_ids (int | list[int] | range | None) – Optional integer or list of integers specifying which cluster IDs to analyze. If None, analyzes all clusters. If specified, only analyzes grid cells belonging to the given cluster(s).
shift_threshold (float) – Optional float specifying the minimum absolute shift value that should be considered a valid transition. Defaults to 0.5. Grid cells with maximum shift values below this threshold will be marked as having no transition (NaN).

Returns:

xarray DataArray containing the transition time for each grid cell. Grid cells with no detected transition will contain NaN values. The output has the same spatial dimensions as the input shifts data.

Return type:

DataArray

Note

The transition time is determined by finding the time index where the absolute value of the shifts reaches its maximum for each grid cell. This corresponds to the point of most rapid change in the underlying data.

For grid cells where the maximum absolute shift value is below shift_threshold, or where no clear transition is detected, NaN values will be returned.
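
The rule described in the note can be sketched on a plain NumPy array. `transition_time` is a hypothetical helper that assumes shifts of shape (n_time, n_y, n_x) without missing values and a numeric time axis; the actual method operates on xarray data and returns a DataArray.

```python
import numpy as np

def transition_time(shifts, time, shift_threshold=0.5):
    """Hypothetical helper. shifts: (n_time, n_y, n_x); time: (n_time,)."""
    magnitude = np.abs(np.asarray(shifts, dtype=float))
    peak_index = magnitude.argmax(axis=0)   # time index of the largest |shift|
    peak_value = magnitude.max(axis=0)
    result = np.asarray(time, dtype=float)[peak_index]
    result[peak_value < shift_threshold] = np.nan   # below threshold: no transition
    return result

time = np.array([2000.0, 2001.0, 2002.0, 2003.0])
shifts = np.zeros((4, 1, 3))
shifts[:, 0, 0] = [0.1, 0.9, 0.2, 0.0]     # positive transition in 2001
shifts[:, 0, 1] = [0.0, -0.1, -0.8, 0.3]   # negative transition in 2002
shifts[:, 0, 2] = [0.1, 0.2, 0.1, 0.0]     # never reaches the threshold

times = transition_time(shifts, time)
```

Taking the absolute value first is what lets a single argmax pick up both upward and downward transitions.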

duration(cluster_id)¶

Return duration of the cluster in time.

Parameters:

cluster_id – ID of the cluster to calculate duration for.

Returns:

Duration of the cluster. If the original dataset uses cftime format, the duration is returned in seconds.

Return type:

float

duration_timesteps(cluster_id)¶

Return duration of the cluster in timesteps.

Return type:: int

end(cluster_id)¶

Return the end time of the cluster.

Return type:: float | datetime | datetime64

end_timestep(cluster_id)¶

Return the end index of the cluster

Return type:: int

iqr(cluster_id, lower_quantile, upper_quantile)¶

Get start and end time of the specified interquantile range of the cluster temporal density.

Parameters:

cluster_id – ID of the cluster
lower_quantile (float) – Lower bound of the interquantile range (0-1)
upper_quantile (float) – Upper bound of the interquantile range (0-1)

Returns:

Start time and end time of the interquantile range in original time format

Return type:

tuple
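
One way to read an interquantile range off a temporal density is through its cumulative distribution. The sketch below assumes that the temporal density is the number of member cells per timestep and that each bound is the first timestep at which the cumulative share reaches the quantile; both are assumptions, and `interquantile_range` is a hypothetical helper rather than TOAD's implementation.

```python
import numpy as np

def interquantile_range(time, density, lower_quantile, upper_quantile):
    """Hypothetical helper: times at which the cumulative density crosses two quantiles."""
    density = np.asarray(density, dtype=float)
    cdf = np.cumsum(density) / density.sum()
    # First timestep at which the cumulative share reaches each quantile.
    i_lo = np.searchsorted(cdf, lower_quantile, side="left")
    i_hi = np.searchsorted(cdf, upper_quantile, side="left")
    return time[i_lo], time[i_hi]

time = np.arange(2000, 2010)
density = [0, 1, 2, 4, 6, 4, 2, 1, 0, 0]   # illustrative member-cell counts

# A 50% range is taken here to span the 0.25 and 0.75 quantiles (assumption).
start_50, end_50 = interquantile_range(time, density, 0.25, 0.75)
```

Under this reading, narrow ranges indicate that most of the cluster's membership is concentrated in a short period.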

iqr_50(cluster_id)¶

Get start and end time of the 50% interquantile range of the cluster temporal density

Return type:: tuple[float | datetime | datetime64, float | datetime | datetime64]

iqr_68(cluster_id)¶

Get start and end time of the 68% interquantile range of the cluster temporal density

Return type:: tuple[float | datetime | datetime64, float | datetime | datetime64]

iqr_90(cluster_id)¶

Get start and end time of the 90% interquantile range of the cluster temporal density

Return type:: tuple[float | datetime | datetime64, float | datetime | datetime64]

mean(cluster_id)¶

Return mean time value of the cluster.

Return type:: float | datetime | datetime64

mean_shift_magnitude(cluster_id)¶

Alias for value_change(aggregation="mean").

Return type:: float

median(cluster_id)¶

Return median time of the cluster.

Return type:: float | datetime | datetime64

membership_peak(cluster_id)¶

Return the time of the largest cluster temporal density.

If there’s a plateau at the maximum value, returns the center of the plateau.

Return type:: float | datetime | datetime64
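
The plateau rule can be sketched as follows. `peak_time` is a hypothetical helper; it assumes that the plateau is the first contiguous run of maximum values and that an even-length plateau resolves to the earlier of its two middle timesteps. Neither detail is confirmed by the documentation above.

```python
import numpy as np

def peak_time(time, density):
    """Hypothetical helper: time of maximum density, centred on a plateau."""
    density = np.asarray(density, dtype=float)
    at_max = np.flatnonzero(density == density.max())
    # Assumption: the plateau is the first contiguous run of maximum values.
    breaks = np.flatnonzero(np.diff(at_max) > 1)
    last = at_max[breaks[0]] if breaks.size else at_max[-1]
    centre = (at_max[0] + last) // 2
    return time[centre]

time = np.arange(2000, 2008)
plateau_peak = peak_time(time, [0, 1, 3, 3, 3, 1, 0, 0])   # plateau over 2002-2004
single_peak = peak_time(time, [0, 1, 5, 2, 1, 0, 0, 0])    # unique maximum
```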

membership_peak_density(cluster_id)¶

Return the largest cluster temporal density

Return type:: float

start(cluster_id)¶

Return the start time of the cluster.

Return type:: float | datetime | datetime64

start_timestep(cluster_id)¶

Return the start index of the cluster

Return type:: float

std(cluster_id)¶

Return standard deviation of the time of the cluster.

Return type:: float

steepest_gradient(cluster_id)¶

Return the time of the steepest gradient (largest rate of change, up or down) of the median cluster timeseries.

Return type:: float | datetime | datetime64

steepest_gradient_timestep(cluster_id)¶

Return the index of the steepest gradient (largest rate of change, up or down) of the median cluster timeseries inside the cluster time bounds.

Return type:: float
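
A sketch of the steepest-gradient search, assuming the cluster's time series are given as an array of shape (n_cells, n_time) and the cluster time bounds as inclusive indices. `steepest_gradient_index` is a hypothetical helper, and the use of central differences via `np.gradient` is an assumption about how the rate of change is estimated.

```python
import numpy as np

def steepest_gradient_index(cells, start, end):
    """Hypothetical helper. cells: (n_cells, n_time); start/end: inclusive time indices."""
    median_series = np.median(np.asarray(cells, dtype=float), axis=0)
    gradient = np.gradient(median_series)       # central differences
    window = np.abs(gradient[start:end + 1])    # restrict to the cluster time bounds
    return start + int(np.argmax(window))

time = np.arange(1990, 1998)
base = np.array([8.0, 8.0, 8.0, 7.0, 2.0, 0.0, 0.0, 0.0])   # abrupt decline
cells = np.vstack([base, base + 1.0, base - 1.0])            # median equals base

index = steepest_gradient_index(cells, start=2, end=6)
steepest_time = time[index]
```

Using the absolute gradient means a sharp decline, as in this example, is detected in the same way as a sharp rise.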

value_at_end(cluster_id, aggregation='median')¶

Return aggregated cluster value at the end timestep.

Parameters:: aggregation (str)
Return type:: float

value_at_iqr_90_end(cluster_id, aggregation='median')¶

Return aggregated cluster value at the upper iqr_90 bound.

Parameters:: aggregation (str)
Return type:: float

value_at_iqr_90_start(cluster_id, aggregation='median')¶

Return aggregated cluster value at the lower iqr_90 bound.

Parameters:: aggregation (str)
Return type:: float

value_at_start(cluster_id, aggregation='median')¶

Return aggregated cluster value at the start timestep.

Parameters:: aggregation (str)
Return type:: float

value_change(cluster_id, aggregation='median')¶

Return signed aggregated value change across full span (end - start).

Parameters:: aggregation (str)
Return type:: float
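
The signed change can be sketched as the aggregated value at the end timestep minus the aggregated value at the start timestep. `value_change` below is a hypothetical stand-alone helper that takes an array of shape (n_cells, n_time) and explicit indices; it is not the method's real signature, and only two aggregations are shown.

```python
import numpy as np

def value_change(cells, start, end, aggregation="median"):
    """Hypothetical helper. cells: (n_cells, n_time); start/end: time indices."""
    reducers = {"median": np.median, "mean": np.mean}
    reduce = reducers[aggregation]
    series = reduce(np.asarray(cells, dtype=float), axis=0)   # aggregate over space
    return float(series[end] - series[start])                 # signed: end - start

cells = [[5, 4, 1],
         [6, 5, 3],
         [10, 9, 2]]

median_change = value_change(cells, start=0, end=2)
mean_change = value_change(cells, start=0, end=2, aggregation="mean")
```

A negative result indicates that the aggregated value is lower at the end of the span than at its start.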

value_change_iqr_90(cluster_id, aggregation='median')¶

Return signed aggregated value change across iqr_90 bounds (upper - lower).

Parameters:: aggregation (str)
Return type:: float

Modules

`general`
`space`
`time`
