toad.shifts.methods.asdetect¶
ASDETECT method for shifts detection.
Contains the ASDETECT algorithm with associated helper functions.
Created: January 22, ? (Sina) Refactored: Nov, 2024 (Jakob)
Functions
|
Compute gradients for segments defined by start indices and lengths. |
|
Construct a detection time series (asdetect algorithm). |
|
Compute the gradients by segmenting the time series in both forward and backward direction, to reduce bias. |
|
Fast linear fit (gradient only) using simple least squares formula. |
|
Numba-compatible median-absolute-deviation function. |
|
Compute MAD given the median (avoids recomputing median). |
|
Numba-compatible median function. |
|
Least squares polynomial fit. |
|
Update the detection time series for a given segment length (two_sided mode). |
Classes
|
Detect abrupt shifts in a time series using gradient-based analysis related to [Boulton+Lenton2019]. |
- class toad.shifts.methods.asdetect.ASDETECT(lmin=5, lmax=None, timescale=None, segmentation='two_sided', ignore_nan_warnings=False)¶
Bases:
ShiftsMethodDetect abrupt shifts in a time series using gradient-based analysis related to [Boulton+Lenton2019].
- The algorithm works by:
Dividing the time series into segments of size l.
Performing linear regression within each segment to calculate gradients.
Identifying significant gradients exceeding ±3 Median Absolute Deviations (MAD) from the median gradient.
Updating a detection array by adding +1 for significant positive gradients and -1 for significant negative gradients in each segment.
Iterating over multiple window sizes (l), updating the detection array at each step.
Normalizing the detection array according to the selected segmentation mode.
Two segmentation modes are available:
- “two_sided” mode (default, recommended):
Uses forward and backward segmentation to reduce bias and improve coverage.
Segments overlap, providing better temporal coverage of the time series.
Normalizes by dividing by a counter array that tracks how many segments cover each time point.
Applies edge correction by downweighting signals at boundaries where fewer segments overlap, reducing unreliable detections at the start and end of the time series.
Generally produces smoother and more reliable results, especially near the edges.
- “original” mode (following [Boulton+Lenton2019]):
Uses centered, non-overlapping segments that are evenly distributed within the time series.
Normalizes by dividing by the total number of window sizes used (lmax - lmin + 1).
Simpler approach that matches the original algorithm implementation.
Note: ASDETECT does not work with NaN values so it will return a detection time series of all zeros if the input time series contains NaN values.
- Parameters:
lmin (int) – The minimum segment length (in time steps) used for detection. Controls the shortest resolvable timescale and must be at least 3. Defaults to 5.
lmax (int | None) – Optional maximum segment length (in time steps) used for detection. To resolve multiple shifts within the same grid cell, this should be chosen smaller than the minimum temporal separation between shifts of interest. If not specified, it defaults to one third of the length of the time dimension.
timescale (tuple[float | None, float | None] | None) –
Optional tuple specifying the minimum and maximum time window sizes for shift detection in the same physical units as the input time axis (e.g. years, days). When provided, this is converted internally to integer segment lengths (lmin, lmax) and overrides the explicit lmin/lmax settings. Either bound may be set to None to fall back to the defaults (5 time steps for the minimum, one third of the series length for the maximum).
Important: When the input dataset uses cftime format, the timescale parameter must be specified in seconds, because cftime values are internally converted to seconds since the first time point. For example, to detect shifts at timescales of 1-2 years with cftime data, use
timescale=(365.25*24*3600, 2*365.25*24*3600)(seconds). For numeric time data (not cftime), use the same units as your time axis.Note: The timescale conversion requires regular temporal spacing. If your time series has irregular spacing, a ValueError will be raised. In this case, use lmin/lmax parameters directly instead of timescale, or resample your data to regular spacing.
segmentation (Literal['two_sided', 'original'] | str) – Segmentation method to use. “two_sided” (recommended) applies forward and backward segmentation with counter-based normalisation and edge correction, reducing positional bias at roughly twice the computational cost of the original algorithm. “original” reproduces the centred segmentation of the original ASDETECT implementation for backward compatibility. Defaults to “two_sided”.
ignore_nan_warnings (bool) – (Optional) If True, timeseries containing NaN values will be ignored, i.e. a detection time series of all zeros will be returned. If False, an error will be raised.
- LMIN_MIN = 5¶
- fit_predict(values_1d, times_1d)¶
Compute the detection time series for each grid cell in the 3D data array.
- Parameters:
values_1d (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of values
times_1d (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of times
- Returns:
1 indicates that all tested segment lengths detected a significant positive gradient (i.e. exceeding 3 MAD of the median gradient),
-1 indicates that all tested segment lengths detected a significant negative gradient.
Values between -1 and 1 indicate the proportion of segment lengths detecting a significant gradient at that time point.
- Return type:
A 1D array of the same length as values_1d, where each value represents the abrupt shift score for a grid cell at a specific time. The score ranges from -1 to 1
- pre_validation(data_array, td)¶
Validate regular temporal spacing and convert timescale to lmin/lmax if needed.
This method is called once in compute_shifts() before applying the method to all grid cells. It always validates that the time series has regular spacing (required for the algorithm to work correctly) and converts timescale parameters to lmin/lmax if timescale is provided.
- Parameters:
data_array (xr.DataArray) – The masked data array that will be processed
td (TOAD) – The TOAD object containing the dataset and metadata
- Raises:
ValueError – If time spacing is irregular
- Return type:
None
- segmentation: Literal['two_sided', 'original'] | str¶
- toad.shifts.methods.asdetect.compute_gradients_from_segments(values_1d, times_1d, seg_starts, seg_lengths)¶
Compute gradients for segments defined by start indices and lengths.
Loops over segments and computes the gradient using a linear fit (1st degree polynomial).
- Parameters:
values_1d (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of values (e.g., temperature, pressure, etc.)
times_1d (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of time points corresponding to the values
seg_starts (ndarray[tuple[int, ...], dtype[int32]]) – 1D array of starting indices for each segment
seg_lengths (ndarray[tuple[int, ...], dtype[int32]]) – 1D array of lengths for each segment (or 1-element array if all same length)
- Returns:
1D array of gradients for each segment
- Return type:
ndarray[tuple[int, …], dtype[float64]]
- toad.shifts.methods.asdetect.construct_detection_ts(values_1d, times_1d, lmin=5, lmax=None, segmentation='two_sided', ignore_nan_warnings=False)¶
Construct a detection time series (asdetect algorithm).
Following [Boulton+Lenton2019], the time series (ts) is divided into segments of length l, for each of which the gradient is computed. Segments with gradients > 3 MAD of the gradients distribution are marked. Averaging over many segmentation choices (i.e. values of l) results in a detection time series that indicates the points of largest relative gradients.
Two segmentation modes are available: - “two_sided”: Uses forward and backward segmentation to reduce bias, with counter-based
normalization. Applies edge correction to downweight signals at boundaries.
“original”: Uses the original centered segmentation algorithm from [Boulton+Lenton2019]. Segments are non-overlapping and centered within the time series.
- >> Args:
- values_1d:
Time series, shape (n,)
- times_1d:
Times, shape (n,), same length as values_1d
- lmin:
Smallest segment length, default = 5
- lmax:
Largest segment length, default = n/3
- segmentation:
Segmentation method to use. Options are “original” (classic) and “two_sided” (removes bias + smoother with edge correction). Defaults to “two_sided”.
- ignore_nan_warnings:
If True, timeseries containing NaN values will be ignored, i.e. a detection time series of all zeros will be returned. If False, an error will be raised.
- >> Returns:
Abrupt shift score time series, shape (n,)
- Parameters:
values_1d (ndarray[tuple[int, ...], dtype[float64]])
times_1d (ndarray[tuple[int, ...], dtype[float64]])
lmin (int)
lmax (int | None)
segmentation (Literal['two_sided', 'original'] | str)
ignore_nan_warnings (bool)
- Return type:
ndarray[tuple[int, …], dtype[float64]]
- toad.shifts.methods.asdetect.get_gradients_double(values_1d, times_1d, segment_length)¶
Compute the gradients by segmenting the time series in both forward and backward direction, to reduce bias.
- Parameters:
values_1d (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of values (e.g., temperature, pressure, etc.)
times_1d (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of time points corresponding to the values
segment_length (int) – Length of the segments to divide the time series into.
- Returns:
1D array of gradients for each segment ix0_arr: 1D array of starting indices for each segment
- Return type:
gradients
- toad.shifts.methods.asdetect.linear_fit(x, y)¶
Fast linear fit (gradient only) using simple least squares formula.
Optimized for deg=1 case. Returns only the gradient (slope) coefficient. Uses efficient formula: slope = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x²) - sum(x)²)
- Parameters:
x (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of x-coordinates (independent variable)
y (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of y-coordinates (dependent variable)
- Returns:
Gradient (slope) coefficient
- Return type:
float64
- toad.shifts.methods.asdetect.mad(x)¶
Numba-compatible median-absolute-deviation function.
Computes the median absolute deviation of the input array x.
- Parameters:
x (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of values (e.g., gradients)
- Returns:
The median absolute deviation of the input array x
- Return type:
ndarray[tuple[int, …], dtype[float64]]
- toad.shifts.methods.asdetect.mad_from_median(x, x_median)¶
Compute MAD given the median (avoids recomputing median).
- Parameters:
x (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of values
x_median (ndarray[tuple[int, ...], dtype[float64]]) – Pre-computed median of x
- Returns:
The median absolute deviation of x
- Return type:
ndarray[tuple[int, …], dtype[float64]]
- toad.shifts.methods.asdetect.median(x)¶
Numba-compatible median function.
Computes the median of the input array x.
- Parameters:
x (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of values (e.g., gradients)
- Returns:
The median of the input array x
- Return type:
ndarray[tuple[int, …], dtype[float64]]
- toad.shifts.methods.asdetect.polyfit(x, y, deg)¶
Least squares polynomial fit.
This function is a stripped-down version of numpy.polyfit, to make it compatible with numba. It computes the least squares polynomial fit for the given data points (x, y) of degree deg. The function returns the coefficients of the polynomial.
- Parameters:
x (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of x-coordinates (independent variable)
y (ndarray[tuple[int, ...], dtype[float64]]) – 1D or 2D array of y-coordinates (dependent variable)
deg (int) – Degree of the polynomial to fit (0 <= deg <= 1 for linear fit, 2 for quadratic, etc.)
- Returns:
Array of polynomial coefficients
- Raises:
ValueError – If deg < 0
TypeError – If x is not 1D, x is empty, y is not 1D/2D, or x and y have different lengths
- Return type:
ndarray[tuple[int, …], dtype[float64]]
- toad.shifts.methods.asdetect.update_detection_ts_two_sided(detection_ts, counter, values_1d, times_1d, segment_length)¶
Update the detection time series for a given segment length (two_sided mode).
- Parameters:
detection_ts (ndarray[tuple[int, ...], dtype[float64]]) – Current detection time series
counter (ndarray[tuple[int, ...], dtype[int32]]) – Current normalization counter
values_1d (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of values (e.g., temperature, pressure, etc.)
times_1d (ndarray[tuple[int, ...], dtype[float64]]) – 1D array of time points corresponding to the values
segment_length (int) – Length of the segments to divide the time series into
- Returns:
Updated detection time series and normalization counter
- Return type:
tuple[ndarray[tuple[int, …], dtype[float64]], ndarray[tuple[int, …], dtype[int32]]]