toad.postprocessing.stats.general

Classes

GeneralStats(toad, var)

General cluster statistics, such as cluster score.

class toad.postprocessing.stats.general.GeneralStats(toad, var)

Bases: object

General cluster statistics, such as cluster score.

Parameters:

var (str)

aggregate_cluster_scores(cluster_ids, score_method, aggregation='mean', weights=None, **kwargs)

Compute a score for multiple clusters and aggregate the results.

Parameters:
  • cluster_ids (list[int]) – List of cluster IDs

  • score_method (str) – Name of the scoring method (e.g., “score_nonlinearity”)

  • aggregation (str | Callable) – “mean”, “median”, “weighted”, or custom function

  • weights (ndarray | None) – Weights for each cluster (if aggregation=”weighted”)

  • **kwargs – Arguments passed to the scoring method

Returns:

Aggregated score across all clusters

Return type:

float
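
The aggregation step can be sketched as a standalone function. This is a minimal illustration of the options listed above (“mean”, “median”, “weighted”, or a custom callable), not the library’s implementation; the function name is made up for the example.

```python
import numpy as np

def aggregate_scores(scores, aggregation="mean", weights=None):
    """Aggregate per-cluster scores into one value (illustrative sketch)."""
    scores = np.asarray(scores, dtype=float)
    if callable(aggregation):
        # custom aggregation function, e.g. max or np.median
        return float(aggregation(scores))
    if aggregation == "mean":
        return float(scores.mean())
    if aggregation == "median":
        return float(np.median(scores))
    if aggregation == "weighted":
        if weights is None:
            raise ValueError("weights are required for weighted aggregation")
        return float(np.average(scores, weights=weights))
    raise ValueError(f"unknown aggregation: {aggregation!r}")
```

For example, aggregate_scores([0.2, 0.4, 0.9], "weighted", weights=[1, 1, 2]) gives the weight-averaged score 0.6.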

score_consistency(cluster_id)

Measures how internally consistent a cluster is by analyzing the similarity between its time series.

Uses hierarchical clustering to group similar time series and computes an inconsistency score. The final score is inverted so higher values indicate more consistency.

The method works by:

  1. Computing pairwise R² correlations between all time series in the cluster

  2. Converting correlations to distances (1 - R²)

  3. Performing hierarchical clustering using Ward linkage

  4. Calculating inconsistency coefficients at the highest level

  5. Converting to a consistency score by taking the inverse

Parameters:

cluster_id (int) – ID of the cluster to evaluate.

Returns:

Consistency score between 0-1, where:

  • 1.0: Perfect consistency (all time series are identical)

  • ~0.5: Moderate consistency

  • 0.0: No consistency (single point or completely inconsistent)

Return type:

float

References

Kobe De Maeyer Master Thesis (2025)
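
The five steps above can be sketched as follows. The exact inversion used in step 5 is not specified in the docstring; this sketch assumes a 1/(1 + c) mapping so that an inconsistency coefficient of 0 maps to a score of 1, and should be read as an illustration of the pipeline rather than the reference implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, inconsistent

def consistency_sketch(series):
    """series: (n_series, n_time) array -> consistency score in (0, 1]."""
    n = series.shape[0]
    # 1. pairwise R^2 (squared Pearson correlation) between time series
    r2 = np.corrcoef(series) ** 2
    # 2. convert similarities to condensed distances (1 - R^2)
    iu = np.triu_indices(n, k=1)
    dists = np.clip(1.0 - r2[iu], 0.0, None)
    # 3. hierarchical clustering with Ward linkage
    Z = linkage(dists, method="ward")
    # 4. inconsistency coefficient of the top-level (last) merge;
    #    guard against 0/0 when all merge heights are identical
    c = float(np.nan_to_num(inconsistent(Z)[-1, 3]))
    # 5. invert so higher values mean more consistency (assumed mapping)
    return 1.0 / (1.0 + c)
```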

score_heaviside(cluster_id: int, return_score_fit: Literal[False] = False, aggregation: Literal['mean', 'sum', 'std', 'median', 'percentile', 'max', 'min'] | str = 'mean', percentile: float | None = None, normalize: Literal['max', 'max_each'] | None | str = None) → float
score_heaviside(cluster_id: int, return_score_fit: Literal[True], aggregation: Literal['mean', 'sum', 'std', 'median', 'percentile', 'max', 'min'] | str = 'mean', percentile: float | None = None, normalize: Literal['max', 'max_each'] | None | str = None) → Tuple[float, ndarray]

Evaluates how closely the spatially aggregated cluster time series resembles a perfect Heaviside function.

A score of 1 indicates a perfect step function, while 0 indicates a linear trend.

Parameters:
  • cluster_id – ID of the cluster to score.

  • return_score_fit – If True, returns linear regression fit along with score.

  • aggregation – How to aggregate spatial data. Options are:
      - “mean”: Average across space
      - “median”: Median across space
      - “sum”: Sum across space
      - “std”: Standard deviation across space
      - “percentile”: Percentile across space (requires percentile arg)
      - “max”: Maximum across space
      - “min”: Minimum across space

  • percentile – Percentile value between 0-1 when using percentile aggregation.

  • normalize – How to normalize the data. Options are:
      - “max”: Normalize by maximum value
      - “max_each”: Normalize each trajectory by its own maximum value
      - None: Do not normalize

Returns:

If return_score_fit is False: cluster score between 0-1 (float). If return_score_fit is True: a tuple (score, linear_fit), where score is a float between 0-1 and linear_fit is the fitted values of the linear regression.

Return type:

float | Tuple[float, ndarray]

References

Kobe De Maeyer Master Thesis (2025)
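
The docstring does not spell out how the 0-1 score is constructed. One plausible construction, shown purely for illustration and not as the library’s formula, compares the residual RMSE of the series around its best-fit line against the residual RMSE of an ideal step of the same length and amplitude: a linear trend then scores 0 and a perfect step scores 1.

```python
import numpy as np

def heaviside_sketch(ts):
    """0 = linear trend, 1 = perfect step (one plausible construction)."""
    ts = np.asarray(ts, dtype=float)
    t = np.arange(ts.size, dtype=float)
    # residual RMSE of the series around its best-fit line
    resid = ts - np.polyval(np.polyfit(t, ts, 1), t)
    rmse = np.sqrt(np.mean(resid ** 2))
    # reference: residual RMSE of an ideal mid-point step with the same amplitude
    amp = ts.max() - ts.min()
    step = np.where(t < t.mean(), 0.0, amp)
    sresid = step - np.polyval(np.polyfit(t, step, 1), t)
    ref = np.sqrt(np.mean(sresid ** 2))
    return float(np.clip(rmse / ref, 0.0, 1.0)) if ref > 0 else 0.0
```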

score_nonlinearity(cluster_id, aggregation='mean', percentile=None, normalise_against_unclustered=False)

Computes nonlinearity of a cluster’s aggregated time series using RMSE from a linear fit.

The score measures how much the time series deviates from a linear trend.

When normalise_against_unclustered=True:
  • Score > 1: Cluster is more nonlinear than typical unclustered behavior

  • Score ≈ 1: Cluster has similar nonlinearity to unclustered data

  • Score < 1: Cluster is more linear than unclustered data

When normalise_against_unclustered=False:
  • Returns raw RMSE (0 = perfectly linear, higher = more nonlinear)

  • Useful for comparing clusters to each other

Parameters:
  • cluster_id (int) – Cluster ID to evaluate.

  • aggregation (Literal['mean', 'sum', 'std', 'median', 'percentile']) – How to aggregate spatial data:
      - “mean”: Average across space
      - “median”: Median across space
      - “sum”: Sum across space
      - “std”: Standard deviation across space
      - “percentile”: Percentile across space (requires percentile arg)

  • percentile (float | None) – Percentile value between 0–1 (only used if aggregation=”percentile”)

  • normalise_against_unclustered (bool) – If True, normalise the score by the average RMSE of unclustered points. This helps identify clusters that stand out from background behavior.

Returns:

Nonlinearity score. Higher means more nonlinear behavior.

Interpretation depends on the normalise_against_unclustered parameter.

Return type:

float

References

Kobe De Maeyer Master Thesis (2025)
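
The raw score described above (RMSE of the aggregated series around a linear fit, optionally divided by a baseline) can be sketched as below. Passing the mean RMSE of unclustered points as baseline_rmse mimics the normalised mode; the function name and signature are illustrative only.

```python
import numpy as np

def nonlinearity_sketch(ts, baseline_rmse=None):
    """RMSE around the best-fit line; optionally normalised by a baseline RMSE."""
    ts = np.asarray(ts, dtype=float)
    t = np.arange(ts.size, dtype=float)
    resid = ts - np.polyval(np.polyfit(t, ts, 1), t)
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    # with a baseline (e.g. mean RMSE of unclustered points): > 1 means
    # more nonlinear than the baseline, < 1 more linear
    return rmse / baseline_rmse if baseline_rmse is not None else rmse
```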

score_overview(exclude_noise=True, shift_threshold=0.0, **kwargs)

Compute all available scores for every cluster and return as a pandas DataFrame.

This function computes all scoring methods defined in score_dictionary for each cluster and returns the results in a structured DataFrame format, similar to the consensus summary. Includes cluster size, spatial means, shift time statistics, and an aggregate score.

Parameters:
  • exclude_noise (bool) – Whether to exclude noise points (cluster ID -1). Defaults to True.

  • shift_threshold (float) – Minimum shift threshold for computing transition times. Defaults to 0.0.

  • **kwargs – Additional keyword arguments passed to scoring methods. These will be applied to all scoring methods that accept them. Common parameters include:
      - aggregation: Aggregation method for methods that support it (default: “mean”)
      - percentile: Percentile value for percentile aggregation
      - normalize: Normalization method for score_heaviside
      - normalise_against_unclustered: Boolean for score_nonlinearity (default: False)

Returns:

DataFrame with one row per cluster containing:
  • cluster_id: Cluster identifier

  • All score columns from score_dictionary

  • size: Number of space-time grid cells in the cluster

  • mean_{spatial_dim0}: Average spatial coordinate for first dimension

  • mean_{spatial_dim1}: Average spatial coordinate for second dimension

  • mean_shift_time: Mean transition time for the cluster

  • std_shift_time: Standard deviation of transition times within the cluster

  • aggregate_score: Product of all score values

Return type:

pd.DataFrame

Example

>>> stats = td.stats(var="temperature")
>>> overview = stats.score_overview()
>>> print(overview)
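
The aggregate_score column (the product of all score values) can be reproduced on any such DataFrame. The score columns and values below are made-up examples for illustration.

```python
import pandas as pd

# toy overview frame; the score columns are hypothetical examples
overview = pd.DataFrame({
    "cluster_id": [0, 1],
    "score_heaviside": [0.9, 0.4],
    "score_nonlinearity": [1.2, 0.8],
})
score_cols = [c for c in overview.columns if c.startswith("score_")]
# aggregate_score = row-wise product of all score columns
overview["aggregate_score"] = overview[score_cols].prod(axis=1)
```
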

score_spatial_autocorrelation(cluster_id)

Computes average pairwise similarity (R²) between all time series in a cluster.

This measures how spatially coherent the cluster behavior is.

The score is calculated by:

  1. Getting all time series for cells in the cluster

  2. Computing pairwise R² correlations between all time series

  3. Taking the mean of the upper triangle of the correlation matrix

Parameters:

cluster_id (int) – ID of the cluster to evaluate.

Returns:

Similarity score between 0-1, where:

  • 1.0: Perfect similarity (all time series identical)

  • ~0.5: Moderate spatial coherence

  • 0.0: No similarity (completely uncorrelated)

Return type:

float

References

Kobe De Maeyer Master Thesis (2025)
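
The three steps above reduce to a few lines of NumPy. This is a standalone sketch of the described computation, not the library’s code:

```python
import numpy as np

def spatial_autocorrelation_sketch(series):
    """series: (n_series, n_time) array -> mean pairwise R^2 in [0, 1]."""
    # pairwise R^2 = squared Pearson correlation between all time series
    r2 = np.corrcoef(series) ** 2
    # mean of the upper triangle, excluding the diagonal of self-correlations
    iu = np.triu_indices(series.shape[0], k=1)
    return float(r2[iu].mean())
```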