toad.postprocessing

class toad.postprocessing.Aggregation(toad)

Bases: object

Aggregation methods for TOAD objects.

cluster_consensus(cluster_vars=None, min_consensus=0.75, top_n_clusters=None, neighbor_connectivity=8, regridder=None, k_neighbors=8, show_progress=True)

Build a spatial consensus clustering from multiple clustering results.

Implements a consensus aggregation method closely related to evidence accumulation clustering (EAC) from [Fred+Jain2005], but reformulated for spatial grid data. Instead of dense all-pairs co-association, we accumulate “votes” only between spatially neighboring cells, yielding a scalable sparse adjacency graph from which consensus regions are formed.

The method produces robust, spatially coherent regions that persist across clustering choices/variables by combining clusterings through a graph-based consensus approach.

Parameters:

cluster_vars (List[str] | None) – List of clustering variable names to include in the consensus. If None, uses all cluster variables in self.td.cluster_vars.
min_consensus (float) – Minimum fraction (in [0, 1]) of clusterings that must support an edge (pixel adjacency) for it to be included in the consensus graph. Higher values give a stricter consensus. Default: 0.75.
top_n_clusters (int | None) – If set, only top N largest clusters (per clustering) are used when voting for edges. If None, all clusters are included. Default: None.
neighbor_connectivity (int) – Neighborhood connectivity for spatial adjacency when lat/lon coordinates are not available. Either 4 (Von Neumann, horizontal/vertical only) or 8 (Moore, including diagonals). Default: 8. This parameter controls index-based grid adjacency (not K-nearest neighbors) and is only used for grids without geographic coordinates; for lat/lon grids, see k_neighbors.
regridder (HealPixRegridder | None) – Optional custom regridder. If None and data has regular lat/lon dimensions, HealPixRegridder will be used automatically. Default: None. Note: Currently only HealPixRegridder is supported for consensus clustering. Other regridders will raise a ValueError.
k_neighbors (int) – Number of nearest neighbors to consider for lat/lon grids using K-nearest neighbors on the sphere. Only applies when lat/lon coordinates are available. Higher values provide more connectivity but may be less spatially selective. Default: 8. For very high-resolution grids, consider increasing to 12-16; for coarse grids, 4-6 may suffice.
show_progress (bool) – Whether to show the progress bar. Default: True.

Returns:

A tuple containing:

Dataset with two variables:

clusters (int32, shape (y, x)): Consensus cluster/component labels. Values >= 0 indicate cluster membership; -1 indicates noise/unassigned.
consistency (float32, shape (y, x)): Local mean of co-association edge weights around each pixel, reflecting neighborhood agreement across input cluster maps.

DataFrame with one row per consensus cluster, containing:

cluster_id (int32): Cluster identifier.
mean_consistency (float32): Mean consistency score for the cluster.
size (int32): Number of spatial grid cells in the cluster.
mean_{space_dim0} (float32): Average spatial coordinate for first dimension.
mean_{space_dim1} (float32): Average spatial coordinate for second dimension.
mean_mean_shift_time (float32): Central estimate of transition time, averaged over space and clusterings.
std_mean_shift_time (float32): Variation in average shift time across clusterings.
mean_std_shift_time (float32): Average spatial spread of shift timing.
std_std_shift_time (float32): Variation in spatial coherence across clusterings.

Return type:

Tuple[xr.Dataset, pd.DataFrame]

Notes

The algorithm proceeds as follows:

Collapse time in each clustering map: mark a pixel as “clustered” if it is ever assigned to a cluster at any time.
For each clustering, obtain the spatial footprint of each cluster. Optionally, restrict to the top N clusters.
For each cluster, increment votes for each pair of adjacent (connected) pixels within that cluster.
Accumulate edge votes across all clusterings, then normalize by the number of clustering maps.
Retain only those edges (pixel adjacencies) present in at least min_consensus fraction of clusterings.
Construct an undirected sparse graph with surviving edges; run connected components labeling.
Relabel clusters in order of descending size for interpretability; assign -1 to isolated (noise) pixels.
Compute, for each pixel, the mean strength (consistency) of its incident consensus edges.
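The steps above can be sketched in plain Python for a small index-based grid. This is a minimal illustration, not TOAD's implementation: the function name consensus_labels is hypothetical, the input maps are assumed to be already collapsed over time (step 1), the top-N filter is omitted, and a simple union-find stands in for sparse-graph connected components labeling.

```python
from collections import Counter


def consensus_labels(label_maps, min_consensus=0.75, connectivity=8):
    """Toy edge-voting consensus on 2D label maps (-1 marks noise).

    Hypothetical helper for illustration; not part of the TOAD API.
    """
    n_maps = len(label_maps)
    n_rows, n_cols = len(label_maps[0]), len(label_maps[0][0])
    # Forward offsets only, so each undirected edge is visited once.
    offsets = [(0, 1), (1, 0)]
    if connectivity == 8:
        offsets += [(1, 1), (1, -1)]

    # Vote for every adjacent pixel pair that shares a cluster label.
    votes = Counter()
    for labels in label_maps:
        for r in range(n_rows):
            for c in range(n_cols):
                if labels[r][c] < 0:
                    continue
                for dr, dc in offsets:
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if inside and labels[rr][cc] == labels[r][c]:
                        votes[((r, c), (rr, cc))] += 1

    # Normalize by the number of maps and keep edges above the threshold.
    kept = {e: n / n_maps for e, n in votes.items() if n / n_maps >= min_consensus}

    # Connected components over surviving edges via union-find.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in kept:
        parent[find(a)] = find(b)

    components = {}
    for cell in list(parent):
        components.setdefault(find(cell), []).append(cell)

    # Relabel by descending size; pixels without surviving edges stay -1.
    out = [[-1] * n_cols for _ in range(n_rows)]
    ordered = sorted(components.values(), key=len, reverse=True)
    for new_id, cells in enumerate(ordered):
        for r, c in cells:
            out[r][c] = new_id

    # Per-pixel consistency: mean weight of incident surviving edges.
    incident = {}
    for (a, b), w in kept.items():
        incident.setdefault(a, []).append(w)
        incident.setdefault(b, []).append(w)
    consistency = {cell: sum(ws) / len(ws) for cell, ws in incident.items()}
    return out, consistency


maps = [[[0, 0, 0, 1, 1]], [[2, 2, 2, -1, -1]], [[0, 0, 1, 1, 1]]]
labels, consistency = consensus_labels(maps, min_consensus=0.6, connectivity=4)
print(labels)  # [[0, 0, 0, 1, 1]]
```

In this toy input, the edge between the first two pixels is supported by all three maps, while the others are supported by at most two. Raising min_consensus to 0.75 therefore leaves only the first two pixels in a consensus cluster.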

Additional implementation details:

Adjacency method depends on grid type:

- For lat/lon grids: K-nearest neighbors on the sphere using geodesic distance (controlled by k_neighbors, default 8). This uses coordinate-based spatial relationships rather than grid indices.
- For non-geographic grids: Index-based 4- or 8-connectivity using grid array structure (controlled by neighbor_connectivity). This is not K-nearest neighbors: it connects cells based on their position in the 2D array (horizontal, vertical, and optionally diagonal neighbors in grid index space).

Consensus clusters represent regions whose internal edges are repeatedly co-clustered across the inputs and may be chained via single-link paths.
Large, non-compact clusters can form if consensus is too lenient; increase min_consensus or apply additional filtering for tighter components if needed.
Suitable for identifying robust tipping regions or domains unaffected by clustering noise.
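To illustrate the coordinate-based adjacency used for lat/lon grids, the following brute-force sketch links each point to its k nearest neighbors by great-circle (haversine) distance. The helper names haversine and knn_edges are hypothetical, and how TOAD builds this graph internally (for example, on the HealPix grid) is not shown in this reference; the sketch only demonstrates the idea.

```python
import math


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in radians between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * math.asin(min(1.0, math.sqrt(a)))


def knn_edges(points, k):
    """Undirected edges linking each (lat, lon) point to its k nearest neighbors.

    Brute force, O(n^2); fine for a toy example only.
    """
    edges = set()
    for i, (lat_i, lon_i) in enumerate(points):
        dists = sorted(
            (haversine(lat_i, lon_i, lat_j, lon_j), j)
            for j, (lat_j, lon_j) in enumerate(points)
            if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


points = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0), (0.0, 10.0)]
print(sorted(knn_edges(points, k=1)))  # [(0, 1), (1, 2), (2, 3)]
```

Because the distance is geodesic, cells on either side of the date line (for example, longitudes 179 and -179) are close neighbors, which index-based adjacency would miss.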

Example

>>> ds, summary_df = td.aggregate.cluster_consensus(
...     cluster_vars=['clust_a', 'clust_b'], min_consensus=0.7
... )
>>> ds.clusters.plot()  # Visualize consensus clusters
>>> summary_df.head()  # View cluster statistics

Raises:

ValueError – If neighbor_connectivity is not 4 or 8.
AssertionError – If no cluster_vars are found.

See also

Evidence accumulation clustering (EAC) method from Fred & Jain (2005). This implementation uses spatial adjacency instead of dense all-pairs co-association for scalability.

cluster_consistency(cluster_vars=None)

Evaluate the spatial consistency of cluster membership for each grid cell across multiple clustering variables (e.g., from different models).

⚠️ Deprecated: This function is conceptually superseded by cluster_consensus(). The Jaccard-based cluster consistency metric is retained for backwards compatibility but will be removed in a future release. The consistency field returned by cluster_consensus() provides a more efficient and interpretable measure of local co-association across runs.

This function measures how stable the spatial neighborhood of each grid cell’s cluster is across clustering variables, using the Jaccard similarity.

For each grid cell:

1. Identify which cluster it belongs to in each clustering variable.
2. For every pair of clusterings, retrieve the full set of grid cells that were in the same cluster, and compute the Jaccard similarity between these sets (Jaccard = |A ∩ B| / |A ∪ B|).
3. Average the Jaccard scores over all clustering pairs to obtain a consistency score.

Interpretation:

- A score near 1.0 means the cell consistently clusters with the same spatial neighborhood across different clustering setups.
- A score near 0.0 means the cell’s cluster context varies substantially.
- NaN is returned if the cell is unclustered (noise) in all clustering variables.
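A minimal pure-Python sketch of the Jaccard-based score follows. It is an illustration rather than TOAD's code: the function name jaccard_consistency is hypothetical, labels are given as flat lists per clustering, and it assumes that a pair in which the cell is noise in at least one clustering scores 0 (the description above only specifies NaN for cells that are noise in every clustering).

```python
import math
from itertools import combinations


def jaccard_consistency(label_maps):
    """Per-cell mean pairwise Jaccard similarity of cluster footprints.

    label_maps: list of flat label lists, one per clustering; -1 marks noise.
    Hypothetical helper for illustration; not part of the TOAD API.
    """
    if len(label_maps) < 2:
        raise ValueError("need at least two clusterings to form pairs")
    n_cells = len(label_maps[0])

    # For each clustering, map every cluster label to its set of cell indices.
    footprints = []
    for labels in label_maps:
        fp = {}
        for idx, lab in enumerate(labels):
            if lab >= 0:
                fp.setdefault(lab, set()).add(idx)
        footprints.append(fp)

    scores = []
    for idx in range(n_cells):
        # Footprint of this cell's cluster in each clustering (empty if noise).
        sets = [fp.get(labels[idx], set()) for labels, fp in zip(label_maps, footprints)]
        if not any(sets):
            scores.append(math.nan)  # noise in every clustering
            continue
        pair_scores = []
        for a, b in combinations(sets, 2):
            union = a | b
            # Assumption: a pair with no clustered footprint at all scores 0.
            pair_scores.append(len(a & b) / len(union) if union else 0.0)
        scores.append(sum(pair_scores) / len(pair_scores))
    return scores


scores = jaccard_consistency([[0, 0, 0, -1], [1, 1, -1, -1]])
print(scores[2], math.isnan(scores[3]))  # 0.0 True
```

In this input, the first two cells share the footprints {0, 1, 2} and {0, 1}, giving a score of 2/3. The third cell is noise in the second clustering and scores 0 under the stated assumption, and the last cell is noise everywhere and yields NaN.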

Parameters:

td – TOAD object containing clustering results.
cluster_vars (list[str] | None) – Optional list of cluster variable names. If None, uses td.cluster_vars.

Returns:

Stability scores per grid cell, with the same spatial shape as the input data and values in [0, 1] or NaN.

Return type:

xr.DataArray

cluster_occurrence_rate(cluster_vars=None)

Calculate the normalized occurrence rate of points being part of any cluster.

For each point in space, calculates how many times it is part of a cluster (not noise) across different clustering variables, normalized by the total number of clusterings. This is done by checking if each point was ever part of a cluster (cluster label > -1) for each clustering variable, summing these occurrences, and dividing by the total number of clustering variables.

Parameters:

cluster_vars (list[str] | None) – List of clustering variable names to consider. If None, uses all clustering variables in the TOAD object. Each variable should contain cluster labels where -1 indicates noise points and values >= 0 indicate cluster membership.

Returns:

DataArray containing the normalized cluster occurrence rate for each point. Values range from 0 (never in a cluster) to 1 (always in a cluster). The output variable name will be “cluster_occurrence_rate” with a numeric suffix if that name already exists in the dataset.

Return type:

DataArray

Example

If a point is part of a cluster in 2 out of 3 clustering variables, its occurrence rate would be 2/3 ≈ 0.67.
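The same calculation can be written out as a small pure-Python sketch. The function name occurrence_rate and the nested-list layout (clustering, time step, cell) are assumptions made for illustration; the actual method operates on the TOAD object's cluster variables.

```python
def occurrence_rate(label_cubes):
    """Fraction of clusterings in which each cell is ever clustered.

    label_cubes[m][t][i] is the label of cell i at time step t in
    clustering m; -1 marks noise. Hypothetical helper, not the TOAD API.
    """
    n_clusterings = len(label_cubes)
    n_cells = len(label_cubes[0][0])
    rates = []
    for i in range(n_cells):
        # Count clusterings where the cell has a label > -1 at any time step.
        hits = sum(
            1 for cube in label_cubes
            if any(step[i] > -1 for step in cube)
        )
        rates.append(hits / n_clusterings)
    return rates


cubes = [
    [[-1, 0, -1], [0, 0, -1]],    # clustering 1: cells 0 and 1 ever clustered
    [[-1, -1, -1], [1, -1, -1]],  # clustering 2: only cell 0
    [[-1, 2, -1], [-1, 2, -1]],   # clustering 3: only cell 1
]
rates = occurrence_rate(cubes)
print([round(r, 2) for r in rates])  # [0.67, 0.67, 0.0]
```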

class toad.postprocessing.Stats(toad, var)

Bases: object

Interface to access specialized statistics calculators for clusters: time, space, and general metrics.

Used when calling td.stats(var) explicitly; _StatsAccessor in core.py delegates here for td.stats.time etc.

property general: Access general statistics for clusters.

property space: Access space-related statistics for clusters.

property time: Access time-related statistics for clusters.

Modules

`aggregation`
`stats`
