toad.postprocessing.stats.general¶
Classes

GeneralStats: General cluster statistics, such as cluster score.
- class toad.postprocessing.stats.general.GeneralStats(toad, var)¶
Bases: object
General cluster statistics, such as cluster score.
- Parameters:
var (str)
- aggregate_cluster_scores(cluster_ids, score_method, aggregation='mean', weights=None, **kwargs)¶
Compute a score for multiple clusters and aggregate the results.
- Parameters:
cluster_ids (list[int]) – List of cluster IDs
score_method (str) – Name of the scoring method (e.g., “score_nonlinearity”)
aggregation (str | Callable) – “mean”, “median”, “weighted”, or custom function
weights (ndarray | None) – Weights for each cluster (if aggregation=”weighted”)
**kwargs – Arguments passed to the scoring method
- Returns:
Aggregated score across all clusters
- Return type:
float
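The documented aggregation options can be sketched as follows; the helper name and dispatch logic are hypothetical and only mirror the parameter semantics described above, not the library's internals:

```python
import numpy as np

def aggregate(scores, aggregation="mean", weights=None):
    """Hypothetical sketch of the documented aggregation options."""
    scores = np.asarray(scores, dtype=float)
    if callable(aggregation):          # custom aggregation function
        return float(aggregation(scores))
    if aggregation == "mean":
        return float(scores.mean())
    if aggregation == "median":
        return float(np.median(scores))
    if aggregation == "weighted":      # requires one weight per cluster
        return float(np.average(scores, weights=weights))
    raise ValueError(f"Unknown aggregation: {aggregation!r}")
```

A callable passed as aggregation (e.g. np.max) takes precedence over the string options, matching the str | Callable type noted above.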
- score_consistency(cluster_id)¶
Measures how internally consistent a cluster is by analyzing the similarity between its time series.
Uses hierarchical clustering to group similar time series and computes an inconsistency score. The final score is inverted so higher values indicate more consistency.
The method works by:
1. Computing pairwise R² correlations between all time series in the cluster
2. Converting correlations to distances (1 - R²)
3. Performing hierarchical clustering using Ward linkage
4. Calculating inconsistency coefficients at the highest level
5. Converting to a consistency score by taking the inverse
- Parameters:
cluster_id (int) – ID of the cluster to evaluate.
- Returns:
- Consistency score between 0 and 1, where:
1.0: Perfect consistency (all time series are identical)
~0.5: Moderate consistency
0.0: No consistency (single point or completely inconsistent)
- Return type:
float
References
Kobe De Maeyer Master Thesis (2025)
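The five steps described above can be sketched with SciPy. This is an illustrative reconstruction under stated assumptions, not the library's actual code; in particular, the final inversion (here 1/(1 + c)) is one plausible way to map the inconsistency coefficient into [0, 1]:

```python
import numpy as np
from scipy.cluster.hierarchy import inconsistent, linkage
from scipy.spatial.distance import squareform

def consistency_sketch(series):
    """series: (n_series, n_time) array of cluster time series."""
    r2 = np.corrcoef(series) ** 2                  # 1. pairwise R^2
    dist = np.clip(1.0 - r2, 0.0, None)            # 2. correlation -> distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False),    # 3. Ward linkage
                method="ward")
    c = inconsistent(Z)[-1, -1]                    # 4. inconsistency at top merge
    return 1.0 / (1.0 + c)                         # 5. invert: higher = consistent
```

For identical time series all pairwise distances are zero, the inconsistency coefficient vanishes, and the sketch returns 1.0, matching the documented upper end of the scale.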
- score_heaviside(cluster_id: int, return_score_fit: Literal[False] = False, aggregation: Literal['mean', 'sum', 'std', 'median', 'percentile', 'max', 'min'] | str = 'mean', percentile: float | None = None, normalize: Literal['max', 'max_each'] | None | str = None) float¶
- score_heaviside(cluster_id: int, return_score_fit: Literal[True], aggregation: Literal['mean', 'sum', 'std', 'median', 'percentile', 'max', 'min'] | str = 'mean', percentile: float | None = None, normalize: Literal['max', 'max_each'] | None | str = None) Tuple[float, ndarray]
Evaluates how closely the spatially aggregated cluster time series resembles a perfect Heaviside function.
A score of 1 indicates a perfect step function, while 0 indicates a linear trend.
- Parameters:
cluster_id – ID of the cluster to score.
return_score_fit – If True, returns linear regression fit along with score.
aggregation – How to aggregate spatial data. Options are:
- “mean”: Average across space
- “median”: Median across space
- “sum”: Sum across space
- “std”: Standard deviation across space
- “percentile”: Percentile across space (requires percentile arg)
- “max”: Maximum across space
- “min”: Minimum across space
percentile – Percentile value between 0-1 when using percentile aggregation.
normalize – How to normalize the data. Options are:
- “max”: Normalize by maximum value
- “max_each”: Normalize each trajectory by its own maximum value
- None: Do not normalize
- Returns:
Cluster score between 0 and 1 if return_score_fit is False; if return_score_fit is True, a tuple (score, linear_fit), where score is a float between 0 and 1 and linear_fit is the fitted values.
- Return type:
float | Tuple[float, ndarray]
References
Kobe De Maeyer Master Thesis (2025)
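One plausible way to realize the documented scale (1 for a perfect step, 0 for a linear trend) is to compare the residual of the best single-step fit against that of a linear fit. This is only an illustrative sketch, not necessarily the method implemented here:

```python
import numpy as np

def heaviside_score_sketch(y):
    """Score ~1 for a step-like series, ~0 for a linear trend (illustrative)."""
    t = np.arange(len(y), dtype=float)
    # RMSE of an ordinary least-squares linear fit
    coef = np.polyfit(t, y, 1)
    rmse_lin = np.sqrt(np.mean((y - np.polyval(coef, t)) ** 2))
    # RMSE of the best piecewise-constant (single-step) fit over all split points
    rmse_step = min(
        np.sqrt(np.mean((y - np.where(t < k, y[:k].mean(), y[k:].mean())) ** 2))
        for k in range(1, len(y))
    )
    denom = rmse_lin + rmse_step
    return float(rmse_lin / denom) if denom > 0 else 0.0
```

A perfect step is fit exactly by some split point (rmse_step = 0, score 1), while a perfect line is fit exactly by the regression (rmse_lin = 0, score 0), reproducing the two endpoints of the documented scale.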
- score_nonlinearity(cluster_id, aggregation='mean', percentile=None, normalise_against_unclustered=False)¶
Computes nonlinearity of a cluster’s aggregated time series using RMSE from a linear fit.
The score measures how much the time series deviates from a linear trend.
- When normalise_against_unclustered=True:
Score > 1: Cluster is more nonlinear than typical unclustered behavior
Score ≈ 1: Cluster has similar nonlinearity to unclustered data
Score < 1: Cluster is more linear than unclustered data
- When normalise_against_unclustered=False:
Returns raw RMSE (0 = perfectly linear, higher = more nonlinear)
Useful for comparing clusters to each other
- Parameters:
cluster_id (int) – Cluster ID to evaluate.
aggregation (Literal['mean', 'sum', 'std', 'median', 'percentile']) – How to aggregate spatial data:
- “mean”: Average across space
- “median”: Median across space
- “sum”: Sum across space
- “std”: Standard deviation across space
- “percentile”: Percentile across space (requires percentile arg)
percentile (float | None) – Percentile value between 0–1 (only used if aggregation=”percentile”)
normalise_against_unclustered (bool) – If True, normalize the score by the average RMSE of unclustered points. This helps identify clusters that stand out from background behavior.
- Returns:
- Nonlinearity score. Higher means more nonlinear behavior.
Interpretation depends on the normalise_against_unclustered parameter.
- Return type:
float
References
Kobe De Maeyer Master Thesis (2025)
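The raw score described above (RMSE of the aggregated series from its linear fit) can be sketched directly; the normalization against unclustered points then divides by the mean RMSE over unclustered series. The helper names are hypothetical:

```python
import numpy as np

def nonlinearity_rmse(y):
    """RMSE of a series from its ordinary least-squares linear fit."""
    t = np.arange(len(y), dtype=float)
    coef = np.polyfit(t, y, 1)
    return float(np.sqrt(np.mean((y - np.polyval(coef, t)) ** 2)))

def normalised_nonlinearity(y, unclustered_series):
    """Hypothetical normalization: ratio against the mean unclustered RMSE."""
    baseline = np.mean([nonlinearity_rmse(u) for u in unclustered_series])
    return nonlinearity_rmse(y) / baseline
```

A linear series scores near 0, a sigmoid-like transition scores well above it, and the normalized variant returns about 1 when the cluster behaves like the unclustered background, consistent with the interpretation table above.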
- score_overview(exclude_noise=True, shift_threshold=0.0, **kwargs)¶
Compute all available scores for every cluster and return as a pandas DataFrame.
This function computes all scoring methods defined in score_dictionary for each cluster and returns the results in a structured DataFrame format, similar to the consensus summary. Includes cluster size, spatial means, shift time statistics, and an aggregate score.
- Parameters:
exclude_noise (bool) – Whether to exclude noise points (cluster ID -1). Defaults to True.
shift_threshold (float) – Minimum shift threshold for computing transition times. Defaults to 0.0.
**kwargs – Additional keyword arguments passed to scoring methods. These will be applied to all scoring methods that accept them. Common parameters include:
- aggregation: Aggregation method for methods that support it (default: “mean”)
- percentile: Percentile value for percentile aggregation
- normalize: Normalization method for score_heaviside
- normalise_against_unclustered: Boolean for score_nonlinearity (default: False)
- Returns:
- DataFrame with one row per cluster containing:
cluster_id: Cluster identifier
All score columns from score_dictionary
size: Number of space-time grid cells in the cluster
mean_{spatial_dim0}: Average spatial coordinate for first dimension
mean_{spatial_dim1}: Average spatial coordinate for second dimension
mean_shift_time: Mean transition time for the cluster
std_shift_time: Standard deviation of transition times within the cluster
aggregate_score: Product of all score values
- Return type:
pd.DataFrame
Example
>>> stats = td.stats(var="temperature")
>>> overview = stats.score_overview()
>>> print(overview)
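Since aggregate_score is documented as the product of all score values, it can be reproduced from the score columns with a row-wise product; the score column names and values below are illustrative:

```python
import pandas as pd

# Illustrative per-cluster scores (column names and values are hypothetical)
overview = pd.DataFrame({
    "cluster_id": [0, 1],
    "score_heaviside": [0.9, 0.4],
    "score_nonlinearity": [0.8, 0.5],
})
score_cols = ["score_heaviside", "score_nonlinearity"]
# aggregate_score: product of all score values, computed per cluster (row)
overview["aggregate_score"] = overview[score_cols].prod(axis=1)
```

Because the aggregate is a product, a single score near zero drives the whole aggregate toward zero, so it favors clusters that do well on every metric at once.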
- score_spatial_autocorrelation(cluster_id)¶
Computes average pairwise similarity (R²) between all time series in a cluster.
This measures how spatially coherent the cluster behavior is.
The score is calculated by:
1. Getting all time series for cells in the cluster
2. Computing pairwise R² correlations between all time series
3. Taking the mean of the upper triangle of the correlation matrix
- Parameters:
cluster_id (int) – ID of the cluster to evaluate.
- Returns:
- Similarity score between 0 and 1, where:
1.0: Perfect similarity (all time series identical)
~0.5: Moderate spatial coherence
0.0: No similarity (completely uncorrelated)
- Return type:
float
References
Kobe De Maeyer Master Thesis (2025)
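The three steps above reduce to a mean over the upper triangle of the squared correlation matrix; a minimal sketch (function name is hypothetical):

```python
import numpy as np

def spatial_autocorrelation_sketch(series):
    """Mean pairwise R^2 over all time-series pairs in a cluster."""
    r2 = np.corrcoef(series) ** 2             # pairwise R^2 matrix
    iu = np.triu_indices(len(series), k=1)    # upper triangle, excluding diagonal
    return float(r2[iu].mean())
```

Restricting to the strict upper triangle avoids double-counting each pair and excludes the trivial self-correlations of 1 on the diagonal.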