toad.postprocessing.stats¶
Classes
Stats | Interface to access specialized statistics calculators for clusters: time, space, and general metrics.
- class toad.postprocessing.stats.GeneralStats(toad, var)¶
Bases:
object
General cluster statistics, such as cluster score.
- Parameters:
var (str)
- aggregate_cluster_scores(cluster_ids, score_method, aggregation='mean', weights=None, **kwargs)¶
Compute a score for multiple clusters and aggregate the results.
- Parameters:
cluster_ids (list[int]) – List of cluster IDs
score_method (str) – Name of the scoring method (e.g., “score_nonlinearity”)
aggregation (str | Callable) – “mean”, “median”, “weighted”, or custom function
weights (ndarray | None) – Weights for each cluster (if aggregation=“weighted”)
**kwargs – Arguments passed to the scoring method
- Returns:
Aggregated score across all clusters
- Return type:
float
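The aggregation options can be illustrated with a minimal sketch. The helper `aggregate_scores` is hypothetical and not part of toad; it assumes that "weighted" means a weighted average normalised by the sum of the weights, which the documentation above does not state explicitly.

```python
import statistics
from typing import Callable, Optional, Sequence, Union


def aggregate_scores(
    scores: Sequence[float],
    aggregation: Union[str, Callable[[Sequence[float]], float]] = "mean",
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Hypothetical helper: combine per-cluster scores into one number."""
    if callable(aggregation):
        # A custom function receives the full list of scores.
        return float(aggregation(scores))
    if aggregation == "mean":
        return statistics.fmean(scores)
    if aggregation == "median":
        return float(statistics.median(scores))
    if aggregation == "weighted":
        if weights is None or len(weights) != len(scores):
            raise ValueError("'weighted' needs exactly one weight per score")
        # Assumption: weighted average, normalised by the sum of the weights.
        return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    raise ValueError(f"unknown aggregation: {aggregation!r}")


scores = [0.2, 0.4, 0.9]  # made-up per-cluster scores
mean_score = aggregate_scores(scores)
weighted_score = aggregate_scores(scores, "weighted", weights=[1, 1, 2])
max_score = aggregate_scores(scores, max)
```

Passing a callable such as `max` corresponds to the "custom function" option of the aggregation parameter.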
- score_consistency(cluster_id)¶
Measures how internally consistent a cluster is by analyzing the similarity between its time series.
Uses hierarchical clustering to group similar time series and computes an inconsistency score. The final score is inverted so higher values indicate more consistency.
The method works by:
1. Computing pairwise R² correlations between all time series in the cluster
2. Converting correlations to distances (1 - R²)
3. Performing hierarchical clustering using Ward linkage
4. Calculating inconsistency coefficients at the highest level
5. Converting to a consistency score by taking the inverse
- Parameters:
cluster_id (int) – ID of the cluster to evaluate.
- Returns:
- Consistency score between 0-1, where:
1.0: Perfect consistency (all time series are identical)
~0.5: Moderate consistency
0.0: No consistency (single point or completely inconsistent)
- Return type:
float
References
Kobe De Maeyer Master Thesis (2025)
- score_heaviside(cluster_id, return_score_fit=False, aggregation='mean', percentile=None, normalize=None)¶
Evaluates how closely the spatially aggregated cluster time series resembles a perfect Heaviside function.
A score of 1 indicates a perfect step function, while 0 indicates a linear trend.
- Parameters:
cluster_id – ID of the cluster to score.
return_score_fit – If True, returns linear regression fit along with score.
aggregation (Literal['mean', 'sum', 'std', 'median', 'percentile', 'max', 'min'] | str) – How to aggregate spatial data. Options are:
- “mean” - Average across space
- “median” - Median across space
- “sum” - Sum across space
- “std” - Standard deviation across space
- “percentile” - Percentile across space (requires percentile arg)
- “max” - Maximum across space
- “min” - Minimum across space
percentile – Percentile value between 0-1 when using percentile aggregation.
normalize (Literal['max', 'max_each'] | None | str) – How to normalize the data. Options are:
- “max” - Normalize by maximum value
- “max_each” - Normalize each trajectory by its own maximum value
- None - Do not normalize
- Returns:
Cluster score between 0 and 1 if return_score_fit is False. If return_score_fit is True, a tuple (score, linear_fit), where score is a float between 0 and 1 and linear_fit contains the fitted values.
- Return type:
float
References
Kobe De Maeyer Master Thesis (2025)
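The spatial aggregation options listed for score_heaviside can be sketched with NumPy. This is only an illustration under stated assumptions: the helper `aggregate_over_space` is hypothetical, the data is assumed to be a plain (time, cell) array, and the 0-1 percentile is assumed to map onto `np.quantile`. How toad aggregates internally is not shown in this documentation.

```python
import numpy as np


def aggregate_over_space(data, aggregation="mean", percentile=None):
    """Hypothetical helper: collapse a (time, cell) array to one value per time step."""
    data = np.asarray(data, dtype=float)
    reducers = {
        "mean": np.mean,
        "median": np.median,
        "sum": np.sum,
        "std": np.std,
        "max": np.max,
        "min": np.min,
    }
    if aggregation == "percentile":
        if percentile is None or not 0.0 <= percentile <= 1.0:
            raise ValueError("percentile must be given as a value between 0 and 1")
        # Assumption: a 0-1 percentile corresponds to a quantile.
        return np.quantile(data, percentile, axis=1)
    if aggregation not in reducers:
        raise ValueError(f"unknown aggregation: {aggregation!r}")
    return reducers[aggregation](data, axis=1)


# Made-up data: 3 time steps, 4 grid cells.
data = np.array([[0, 0, 0, 4], [1, 1, 1, 1], [2, 4, 6, 8]])
mean_series = aggregate_over_space(data, "mean")
p50_series = aggregate_over_space(data, "percentile", percentile=0.5)
```

The resulting one-dimensional series is what a score such as the Heaviside resemblance would then be computed on.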
- score_nonlinearity(cluster_id, aggregation='mean', percentile=None, normalise_against_unclustered=False)¶
Computes nonlinearity of a cluster’s aggregated time series using RMSE from a linear fit.
The score measures how much the time series deviates from a linear trend.
- When normalise_against_unclustered=True:
Score > 1: Cluster is more nonlinear than typical unclustered behavior
Score ≈ 1: Cluster has similar nonlinearity to unclustered data
Score < 1: Cluster is more linear than unclustered data
- When normalise_against_unclustered=False:
Returns raw RMSE (0 = perfectly linear, higher = more nonlinear)
Useful for comparing clusters to each other
- Parameters:
cluster_id (int) – Cluster ID to evaluate.
aggregation (Literal['mean', 'sum', 'std', 'median', 'percentile']) – How to aggregate spatial data:
- “mean”: Average across space
- “median”: Median across space
- “sum”: Sum across space
- “std”: Standard deviation across space
- “percentile”: Percentile across space (requires percentile arg)
percentile (float | None) – Percentile value between 0–1 (only used if aggregation=“percentile”)
normalise_against_unclustered (bool) – If True, normalize score by average RMSE of unclustered points. This helps identify clusters that stand out from background behavior.
- Returns:
- Nonlinearity score. Higher means more nonlinear behavior.
Interpretation depends on the normalise_against_unclustered parameter.
- Return type:
float
References
Kobe De Maeyer Master Thesis (2025)
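The idea behind the nonlinearity score, RMSE from a straight-line fit with optional normalisation against unclustered data, can be sketched as follows. The function names are hypothetical, and the sketch assumes the series are already spatially aggregated and evenly spaced in time; it is not toad's implementation.

```python
import numpy as np


def rmse_from_linear_fit(series):
    """RMSE between a 1-D series and its least-squares straight line."""
    y = np.asarray(series, dtype=float)
    t = np.arange(y.size)  # assumption: evenly spaced time steps
    slope, intercept = np.polyfit(t, y, deg=1)
    residuals = y - (slope * t + intercept)
    return float(np.sqrt(np.mean(residuals**2)))


def nonlinearity(cluster_series, unclustered_series=None):
    """Raw RMSE, or RMSE divided by the mean RMSE of the unclustered series."""
    raw = rmse_from_linear_fit(cluster_series)
    if unclustered_series is None:
        return raw
    baseline = np.mean([rmse_from_linear_fit(s) for s in unclustered_series])
    return float(raw / baseline)


linear = np.linspace(0.0, 1.0, 11)         # perfectly linear trend
step = np.array([0.0] * 5 + [1.0] * 5)     # abrupt shift
background = [0.5 * step, 0.5 * step]      # made-up "unclustered" series

raw_linear = nonlinearity(linear)          # close to 0
raw_step = nonlinearity(step)              # clearly above 0
relative_step = nonlinearity(step, background)
```

In this toy case the background has half the amplitude of the cluster series, so the normalised score is greater than 1, matching the interpretation "more nonlinear than typical unclustered behavior".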
- score_overview(exclude_noise=True, shift_threshold=0.0, **kwargs)¶
Compute all available scores for every cluster and return as a pandas DataFrame.
This function computes all scoring methods defined in score_dictionary for each cluster and returns the results in a structured DataFrame format, similar to the consensus summary. Includes cluster size, spatial means, shift time statistics, and an aggregate score.
- Parameters:
exclude_noise (bool) – Whether to exclude noise points (cluster ID -1). Defaults to True.
shift_threshold (float) – Minimum shift threshold for computing transition times. Defaults to 0.0.
**kwargs – Additional keyword arguments passed to scoring methods. These will be applied to all scoring methods that accept them. Common parameters include:
- aggregation: Aggregation method for methods that support it (default: “mean”)
- percentile: Percentile value for percentile aggregation
- normalize: Normalization method for score_heaviside
- normalise_against_unclustered: Boolean for score_nonlinearity (default: False)
- Returns:
- DataFrame with one row per cluster containing:
cluster_id: Cluster identifier
All score columns from score_dictionary
size: Number of space-time grid cells in the cluster
mean_{spatial_dim0}: Average spatial coordinate for first dimension
mean_{spatial_dim1}: Average spatial coordinate for second dimension
mean_shift_time: Mean transition time for the cluster
std_shift_time: Standard deviation of transition times within the cluster
aggregate_score: Product of all score values
- Return type:
pd.DataFrame
Example
>>> stats = td.stats(var="temperature")
>>> overview = stats.score_overview()
>>> print(overview)
- score_spatial_autocorrelation(cluster_id)¶
Computes average pairwise similarity (R²) between all time series in a cluster.
This measures how spatially coherent the cluster behavior is.
The score is calculated by:
1. Getting all time series for cells in the cluster
2. Computing pairwise R² correlations between all time series
3. Taking the mean of the upper triangle of the correlation matrix
- Parameters:
cluster_id (int) – ID of the cluster to evaluate.
- Returns:
- Similarity score between 0-1, where:
1.0: Perfect similarity (all time series identical)
~0.5: Moderate spatial coherence
0.0: No similarity (completely uncorrelated)
- Return type:
float
References
Kobe De Maeyer Master Thesis (2025)
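The three steps described for score_spatial_autocorrelation can be sketched with NumPy. The helper `mean_pairwise_r2` is hypothetical and assumes R² means the squared Pearson correlation and that the cluster's time series are stacked as rows of a 2-D array.

```python
import numpy as np


def mean_pairwise_r2(series):
    """Mean squared Pearson correlation over all distinct pairs of rows."""
    series = np.asarray(series, dtype=float)
    if series.shape[0] < 2:
        return 0.0  # assumption: a single series has no pairs to compare
    r2 = np.corrcoef(series) ** 2              # pairwise R² matrix
    upper = np.triu_indices_from(r2, k=1)      # upper triangle, diagonal excluded
    return float(np.mean(r2[upper]))


t = np.arange(10.0)
coherent = np.stack([t, 2 * t + 1, -t])        # all perfectly (anti-)correlated
incoherent = np.array([[1.0, -1.0, 1.0, -1.0],
                       [1.0, 1.0, -1.0, -1.0]])  # uncorrelated by construction

coherent_score = mean_pairwise_r2(coherent)
incoherent_score = mean_pairwise_r2(incoherent)
```

Because the correlation is squared, a perfectly anti-correlated pair counts as fully similar in this sketch.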
- class toad.postprocessing.stats.SpaceStats(toad, var)¶
Bases:
object
Class containing functions for calculating space-related statistics for clusters, such as mean, median, std, etc.
- all_stats(cluster_id)¶
Return all cluster stats
- Return type:
dict
- central_point_for_labeling(cluster_id)¶
Calculates a central point within the cluster’s spatial footprint suitable for labeling.
This method uses the Euclidean Distance Transform to find the point within the cluster footprint that is furthest from any edge (the “pole of inaccessibility”). This ensures the point is robustly inside the cluster shape, even for complex geometries like rings or C-shapes.
- Parameters:
cluster_id – The ID of the cluster to analyze.
- Returns:
A tuple containing the (y, x) coordinates of the calculated central point. Returns (np.nan, np.nan) if the footprint is empty.
- Return type:
tuple[float, float]
- footprint_cumulative_area(cluster_id)¶
Returns the total number of spatial cells that were ever touched by the cluster.
- Return type:
int
- footprint_mean(cluster_id)¶
Returns the mean of the spatial coordinates of the cluster footprint.
- footprint_median(cluster_id)¶
Returns the median of the spatial coordinates of the cluster footprint.
- footprint_std(cluster_id)¶
Returns the standard deviation of the spatial coordinates of the cluster footprint.
- mean(cluster_id)¶
Returns the mean of the spatial coordinates across space and time.
- median(cluster_id)¶
Returns the median of the spatial coordinates across space and time.
- std(cluster_id)¶
Returns the standard deviation of the spatial coordinates across space and time.
- class toad.postprocessing.stats.Stats(toad, var)¶
Bases:
object
Interface to access specialized statistics calculators for clusters: time, space, and general metrics.
- property general¶
Access general statistics for clusters.
- property space¶
Access space-related statistics for clusters.
- property time¶
Access time-related statistics for clusters.
- class toad.postprocessing.stats.TimeStats(toad, var)¶
Bases:
object
Class containing functions for calculating time-related statistics for clusters, such as start time, peak time, etc.
- all_stats(cluster_id)¶
Return all cluster stats
- Return type:
dict
- compute_transition_time(cluster_ids=None, shift_threshold=0.25)¶
Computes the transition time for each grid cell.
This method identifies the time point of maximum rate of change (peak shift) for each spatial location in the data. It uses the absolute value of shifts to detect both positive and negative transitions.
- Parameters:
cluster_ids (int | list[int] | None) – Optional integer or list of integers specifying which cluster IDs to analyze. If None, analyzes all clusters. If specified, only analyzes grid cells belonging to the given cluster(s).
shift_threshold – Optional float specifying the minimum absolute shift value that should be considered a valid transition. Defaults to 0.25. Grid cells with maximum shift values below this threshold will be marked as having no transition (NaN).
- Returns:
xarray DataArray containing the transition time for each grid cell. Grid cells with no detected transition will contain NaN values. The output has the same spatial dimensions as the input shifts data.
- Return type:
DataArray
Note
The transition time is determined by finding the time index where the absolute value of the shifts reaches its maximum for each grid cell. This corresponds to the point of most rapid change in the underlying data.
For grid cells where the maximum absolute shift value is below shift_threshold, or where no clear transition is detected, NaN values will be returned.
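The rule described in the note can be sketched on a plain NumPy array. The function `transition_time` below is a hypothetical stand-in that assumes a (time, y, x) array of shifts without NaN values; the actual method works on xarray data and returns a DataArray.

```python
import numpy as np


def transition_time(shifts, time, shift_threshold=0.25):
    """Time of the largest absolute shift per grid cell, NaN where below threshold."""
    magnitude = np.abs(np.asarray(shifts, dtype=float))
    time = np.asarray(time, dtype=float)
    peak_index = magnitude.argmax(axis=0)   # index of max |shift| along time
    peak_value = magnitude.max(axis=0)
    result = time[peak_index]               # map indices to time values
    result[peak_value < shift_threshold] = np.nan
    return result


time = [2000, 2010, 2020, 2030]
# Made-up shifts with shape (time=4, y=1, x=3).
shifts = np.array([
    [[0.0, 0.0, 0.10]],
    [[0.9, 0.1, 0.05]],
    [[0.1, -0.8, 0.00]],
    [[0.0, 0.0, 0.10]],
])
result = transition_time(shifts, time)
```

The second cell has a negative shift, which is still detected because the absolute value is used; the third cell never reaches the threshold and is marked NaN.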
- duration(cluster_id)¶
Return duration of the cluster in time.
- Parameters:
cluster_id – ID of the cluster to calculate duration for.
- Returns:
- Duration of the cluster. If the original dataset uses cftime format,
the duration is returned in seconds.
- Return type:
float
- duration_timesteps(cluster_id)¶
Return duration of the cluster in timesteps.
- Return type:
int
- end(cluster_id)¶
Return the end time of the cluster.
- Return type:
float | datetime
- end_timestep(cluster_id)¶
Return the end index of the cluster
- Return type:
int
- iqr(cluster_id, lower_quantile, upper_quantile)¶
Get start and end time of the specified interquantile range of the cluster temporal density.
- Parameters:
cluster_id – ID of the cluster
lower_quantile (float) – Lower bound of the interquantile range (0-1)
upper_quantile (float) – Upper bound of the interquantile range (0-1)
- Returns:
Start time and end time of the interquantile range in original time format
- Return type:
tuple
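One way to read "interquantile range of the cluster temporal density" is sketched below. It rests on assumptions not confirmed by this documentation: the temporal density is taken as a non-negative value per time step (for example the number of cluster cells), and the bounds are the first times at which the normalised cumulative density reaches each quantile. The helper name is hypothetical.

```python
import numpy as np


def interquantile_times(time, density, lower_quantile, upper_quantile):
    """Hypothetical helper: times at which the cumulative density reaches two quantiles."""
    time = np.asarray(time)
    density = np.asarray(density, dtype=float)
    cdf = np.cumsum(density) / density.sum()
    last = len(time) - 1
    # First index where the cumulative density is >= the quantile,
    # clipped to guard against floating-point round-off at the end.
    start = min(int(np.searchsorted(cdf, lower_quantile)), last)
    end = min(int(np.searchsorted(cdf, upper_quantile)), last)
    return time[start], time[end]


time = [0, 1, 2, 3, 4]
density = [1, 2, 4, 2, 1]   # made-up temporal density
mid_50 = interquantile_times(time, density, 0.25, 0.75)
mid_90 = interquantile_times(time, density, 0.05, 0.95)
```

Under the same assumption, a central 50% range corresponds to the quantile pair (0.25, 0.75) and a central 90% range to (0.05, 0.95).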
- iqr_50(cluster_id)¶
Get start and end time of the 50% interquantile range of the cluster temporal density
- Return type:
tuple[float, float]
- iqr_68(cluster_id)¶
Get start and end time of the 68% interquantile range of the cluster temporal density
- Return type:
tuple[float, float]
- iqr_90(cluster_id)¶
Get start and end time of the 90% interquantile range of the cluster temporal density
- Return type:
tuple[float, float]
- mean(cluster_id)¶
Return mean time value of the cluster.
- Return type:
float | datetime
- median(cluster_id)¶
Return median time of the cluster.
- Return type:
float | datetime
- membership_peak(cluster_id)¶
Return the time of the largest cluster temporal density.
If there’s a plateau at the maximum value, returns the center of the plateau.
- Return type:
float | datetime
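The plateau rule can be illustrated with a small sketch. The helper `peak_time` is hypothetical; it assumes "center of the plateau" means the middle index among all time steps that share the maximum density, which may differ from how toad resolves ties.

```python
import numpy as np


def peak_time(time, density):
    """Time of maximum density; for a plateau, the middle of the tied time steps."""
    time = np.asarray(time)
    density = np.asarray(density, dtype=float)
    at_max = np.flatnonzero(density == density.max())
    center = at_max[len(at_max) // 2]   # assumption: middle tied index
    return time[center]


time = [10, 20, 30, 40, 50]
plateau_peak = peak_time(time, [0.1, 0.5, 0.5, 0.5, 0.2])
single_peak = peak_time(time, [0.0, 3.0, 1.0, 1.0, 0.0])
```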
- membership_peak_density(cluster_id)¶
Return the largest cluster temporal density
- Return type:
float
- start(cluster_id)¶
Return the start time of the cluster.
- Return type:
float | datetime
- start_timestep(cluster_id)¶
Return the start index of the cluster
- Return type:
float
- std(cluster_id)¶
Return standard deviation of the time of the cluster.
- Return type:
float
- steepest_gradient(cluster_id)¶
Return the time of the steepest gradient of the median cluster timeseries
- Return type:
float | datetime
- steepest_gradient_timestep(cluster_id)¶
Return the index of the steepest gradient of the mean cluster timeseries inside the cluster time bounds
- Return type:
float
Modules