Core API

Shesha: Self-consistency Metrics for Representational Stability

Core implementations of Shesha variants for measuring geometric stability of high-dimensional representations.

shesha.core.anchor_stability(X, n_splits=30, n_anchors=100, n_per_split=200, metric='cosine', rank_normalize=True, seed=None, max_samples=1500, n_bootstrap_ci=None, ci=0.95)

Anchor-based Shesha: measures stability of distance profiles from fixed anchors.

Selects fixed anchor points, then measures consistency of distance profiles from anchors to random data splits. More robust to sampling variation than pure bootstrap approaches.

Parameters:

X (np.ndarray) – Data matrix of shape (n_samples, n_features).
n_splits (int) – Number of random splits.
n_anchors (int) – Number of fixed anchor points.
n_per_split (int) – Number of samples per split.
metric (str) – Distance metric used to compute the anchor-to-sample distance profiles.
rank_normalize (bool) – If True, rank-normalize distances within each anchor before correlating.
seed (int, optional) – Random seed.
max_samples (int, optional) – Subsample to this many samples if exceeded.
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times.
ci (float, default=0.95) – Confidence level for the interval.

Returns:

If n_bootstrap_ci is None: mean correlation of anchor distance profiles. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.

Return type:

float or dict
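
Several functions on this page share the n_bootstrap_ci/ci parameters and return the same result dictionary. The sketch below shows one common way to produce such an interval, a percentile bootstrap over resampled rows. It is an illustration under assumptions, not shesha's implementation. The helper name bootstrap_ci is hypothetical, and so are three modelling choices: resampling rows with replacement, reporting the mean of the bootstrap statistics as 'mean', and taking percentiles for 'ci_low'/'ci_high'.

```python
import numpy as np


def bootstrap_ci(X, metric_fn, n_bootstrap_ci=1000, ci=0.95, seed=None):
    """Hypothetical percentile-bootstrap helper (not part of shesha).

    Resamples the rows of X with replacement, recomputes metric_fn on each
    resample, and summarizes the results using the documented result keys.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    stats = np.empty(n_bootstrap_ci)
    for b in range(n_bootstrap_ci):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        stats[b] = metric_fn(X[idx])
    alpha = 1.0 - ci
    return {
        "mean": float(stats.mean()),
        "ci_low": float(np.quantile(stats, alpha / 2)),
        "ci_high": float(np.quantile(stats, 1 - alpha / 2)),
        "std": float(stats.std(ddof=1)),
        "n_bootstraps": n_bootstrap_ci,
        "ci_level": ci,
    }


# Usage with a toy metric (mean row norm) standing in for a Shesha score.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
result = bootstrap_ci(X, lambda A: float(np.linalg.norm(A, axis=1).mean()),
                      n_bootstrap_ci=500, seed=1)
print(f"{result['mean']:.3f} [{result['ci_low']:.3f}, {result['ci_high']:.3f}]")
```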

shesha.core.class_separation_ratio(X, y, n_bootstrap=50, subsample_frac=0.5, metric='euclidean', seed=None, n_bootstrap_ci=None, ci=0.95)

Class Separation Ratio: ratio of between-class to within-class distances.

Measures how well-separated classes are in the representation space. Uses bootstrap subsampling for computational efficiency and stability. Related to Fisher’s discriminant ratio but operates in distance space.

Parameters:

X (np.ndarray) – Data matrix of shape (n_samples, n_features).
y (np.ndarray) – Class labels of shape (n_samples,).
n_bootstrap (int) – Number of bootstrap iterations for stability.
subsample_frac (float) – Fraction of samples to use per bootstrap (0.0-1.0).
metric (str) – Distance metric: ‘cosine’ or ‘euclidean’.
seed (int, optional) – Random seed for reproducibility.
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times.
ci (float, default=0.95) – Confidence level for the interval.

Returns:

If n_bootstrap_ci is None: mean separation ratio. Range: [0, inf). If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.

Return type:

float or dict

Examples

>>> import numpy as np
>>> from shesha.core import class_separation_ratio
>>> # Well-separated classes
>>> X = np.vstack([np.random.randn(100, 10),
...                np.random.randn(100, 10) + 5])
>>> y = np.array([0]*100 + [1]*100)
>>> ratio = class_separation_ratio(X, y)
>>> print(f"Separation: {ratio:.2f}")  # High value

Notes

Higher values indicate representations where same-class samples are closer together than different-class samples, suggesting good discriminability.
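
Here is a minimal sketch of the ratio described above. It assumes the ratio is the mean between-class pairwise distance divided by the mean within-class pairwise distance, averaged over random subsamples drawn without replacement. The helper names pairwise_distances and separation_ratio_sketch are hypothetical, and shesha's own averaging and edge-case handling may differ.

```python
import numpy as np


def pairwise_distances(X, metric="euclidean"):
    """Full n x n distance matrix for 'euclidean' or 'cosine'."""
    if metric == "cosine":
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        return 1.0 - Xn @ Xn.T
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives caused by round-off


def separation_ratio_sketch(X, y, n_bootstrap=50, subsample_frac=0.5,
                            metric="euclidean", seed=None):
    """Hypothetical: mean between-class / mean within-class distance."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = max(2, int(round(subsample_frac * n)))
    ratios = []
    for _ in range(n_bootstrap):
        idx = rng.choice(n, size=m, replace=False)  # subsample without replacement
        D = pairwise_distances(X[idx], metric)
        labels = y[idx]
        same = labels[:, None] == labels[None, :]
        off_diag = ~np.eye(m, dtype=bool)  # exclude zero self-distances
        within, between = D[same & off_diag], D[~same]
        if within.size and between.size:  # skip subsamples missing a pair type
            ratios.append(between.mean() / within.mean())
    return float(np.mean(ratios))


rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((100, 10)),
               rng.standard_normal((100, 10)) + 5])
y = np.array([0] * 100 + [1] * 100)
y_shuffled = rng.permutation(y)  # labels unrelated to the geometry
separated = separation_ratio_sketch(X, y, seed=0)
shuffled = separation_ratio_sketch(X, y_shuffled, seed=0)
print(f"separated: {separated:.2f}, shuffled labels: {shuffled:.2f}")
```

With shuffled labels, same-class and different-class pairs are drawn from the same mix of distances, so the ratio should sit near 1.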

shesha.core.compute_rdm(X, metric='cosine', normalize=True)

Compute Representational Dissimilarity Matrix (RDM).

Parameters:

X (np.ndarray) – Data matrix of shape (n_samples, n_features).
metric (str) – Distance metric: ‘cosine’, ‘correlation’, or ‘euclidean’.
normalize (bool) – If True and metric='cosine', L2-normalize rows before computing distances.

Returns:

Condensed distance vector: the strict upper triangle of the RDM (diagonal excluded), of length n_samples * (n_samples - 1) / 2.

Return type:

np.ndarray
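
The condensed layout can be sketched with NumPy alone. This is an illustrative reimplementation that rests on three assumptions. Cosine distance is taken as 1 minus cosine similarity. Correlation distance is taken as the cosine distance of row-centered data. Pairs are ordered row by row, as in SciPy's condensed form. condensed_rdm is a hypothetical name. The normalize option is omitted because this cosine distance already divides by the row norms.

```python
import numpy as np


def condensed_rdm(X, metric="cosine"):
    """Hypothetical sketch: strict upper triangle of the n x n distance matrix."""
    X = np.asarray(X, dtype=float)
    if metric == "correlation":
        # 1 - Pearson r between rows equals cosine distance after row-centering.
        X = X - X.mean(axis=1, keepdims=True)
        metric = "cosine"
    if metric == "cosine":
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        D = 1.0 - Xn @ Xn.T
    elif metric == "euclidean":
        diff = X[:, None, :] - X[None, :, :]
        D = np.sqrt(np.sum(diff**2, axis=-1))
    else:
        raise ValueError(f"unsupported metric: {metric!r}")
    i, j = np.triu_indices(X.shape[0], k=1)  # pairs (0,1), (0,2), ..., (1,2), ...
    return D[i, j]


X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(condensed_rdm(X, "euclidean"))  # entries for pairs (0,1), (0,2), (1,2)
print(condensed_rdm(X, "cosine"))
```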

shesha.core.feature_split(X, n_splits=30, metric='cosine', seed=None, max_samples=1600, n_bootstrap_ci=None, ci=0.95, return_all_splits=False)

Feature-Split Shesha: measures internal geometric consistency.

Partitions feature dimensions into random disjoint halves, computes RDMs on each half, and measures their rank correlation. High values indicate that geometric structure is distributed across features (redundant encoding).

Parameters:

X (np.ndarray or list of np.ndarray) – Data matrix of shape (n_samples, n_features), or a list of such matrices for batch evaluation. When a list is passed, returns a list of results in the same order.
n_splits (int) – Number of random feature partitions to average over.
metric (str) – Distance metric for RDM computation.
seed (int, optional) – Random seed for reproducibility.
max_samples (int, optional) – Subsample to this many samples if exceeded (for efficiency).
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times (e.g. 1000 or 10000).
ci (float, default=0.95) – Confidence level for the interval (only used when n_bootstrap_ci is set).
return_all_splits (bool, default=False) – If True, return a dict with the mean score and per-split correlation scores instead of only the mean score.

Returns:

If X is a single array, n_bootstrap_ci is None, and return_all_splits is False: mean Spearman correlation in [-1, 1].
If X is a single array and return_all_splits is True: dict with keys ‘mean’ and ‘split_scores’.
If X is a single array and n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.
If X is a list: a list of the above, one entry per input matrix.

Return type:

float or dict or list

Examples

>>> import numpy as np
>>> from shesha.core import feature_split
>>> X = np.random.randn(500, 768)  # 500 samples, 768-dim embeddings
>>> stability = feature_split(X, n_splits=30, seed=320)
>>> print(f"Feature-split stability: {stability:.3f}")

>>> # Batch evaluation across multiple representations
>>> matrices = [np.random.randn(500, 768) for _ in range(5)]
>>> scores = feature_split(matrices, n_splits=30, seed=320)
>>> print(scores)  # list of 5 floats

>>> # With bootstrap confidence interval
>>> result = feature_split(X, n_splits=30, seed=320, n_bootstrap_ci=1000)
>>> print(f"{result['mean']:.3f} [{result['ci_low']:.3f}, {result['ci_high']:.3f}]")

>>> # Return per-split scores for distribution plots
>>> result = feature_split(X, n_splits=30, seed=320, return_all_splits=True)
>>> scores = result["split_scores"]
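
To make the procedure concrete, here is a minimal NumPy sketch. It splits the feature columns into two random disjoint halves, builds a cosine RDM on each half, and rank-correlates the two RDMs. It is not shesha's code. The helper names are hypothetical, subsampling (max_samples) and confidence intervals are left out, and the rank helper assumes no tied distances.

```python
import numpy as np


def _cosine_rdm(X):
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    i, j = np.triu_indices(X.shape[0], k=1)
    return (1.0 - Xn @ Xn.T)[i, j]


def _spearman(a, b):
    # Ranks via double argsort (valid when there are no ties), then Pearson r.
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])


def feature_split_sketch(X, n_splits=30, seed=None):
    """Hypothetical: mean Spearman r between RDMs of disjoint feature halves."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    half = n_features // 2
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(n_features)
        a, b = perm[:half], perm[half:2 * half]  # disjoint, equal-sized halves
        scores.append(_spearman(_cosine_rdm(X[:, a]), _cosine_rdm(X[:, b])))
    return float(np.mean(scores))


rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 5))
# Low-rank data: every feature mixes the same 5 latent factors (redundant encoding).
X_redundant = latent @ rng.standard_normal((5, 256)) + 0.1 * rng.standard_normal((200, 256))
# Pure noise: the two feature halves share no structure.
X_noise = rng.standard_normal((200, 256))
print(feature_split_sketch(X_redundant, n_splits=10, seed=0))  # high
print(feature_split_sketch(X_noise, n_splits=10, seed=0))      # near 0
```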

shesha.core.lda_stability(X, y, n_bootstrap=50, subsample_frac=0.5, seed=None, n_bootstrap_ci=None, ci=0.95)

LDA Subspace Stability: consistency of linear discriminant direction.

Measures whether the optimal linear decision boundary is robust to sampling variation. Computes the LDA direction on the full dataset and on each bootstrapped subsample, then measures the alignment of the resulting discriminant vectors.

Parameters:

X (np.ndarray) – Data matrix of shape (n_samples, n_features).
y (np.ndarray) – Binary class labels of shape (n_samples,). Must have exactly 2 classes.
n_bootstrap (int) – Number of bootstrap iterations.
subsample_frac (float) – Fraction of samples to use per bootstrap (0.0-1.0).
seed (int, optional) – Random seed for reproducibility.
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times.
ci (float, default=0.95) – Confidence level for the interval.

Returns:

If n_bootstrap_ci is None: mean absolute cosine similarity. Range: [0, 1]. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.

Return type:

float or dict

Examples

>>> import numpy as np
>>> from shesha.core import lda_stability
>>> # Create well-separated binary classification data
>>> X = np.vstack([np.random.randn(100, 10),
...                np.random.randn(100, 10) + 3])
>>> y = np.array([0]*100 + [1]*100)
>>> stability = lda_stability(X, y)
>>> print(f"LDA Stability: {stability:.3f}")  # Should be high

Notes

Low values suggest the discriminant subspace is unstable, potentially indicating overfitting to source domain structure. This metric is particularly useful for predicting transfer learning performance.

Only works for binary classification. For multi-class, consider using class_separation_ratio instead.
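
Here is a minimal sketch of the procedure described above. It assumes the two-class Fisher discriminant direction, w proportional to S_w^-1 (mu_1 - mu_0), computed with a pseudo-inverse, and subsampling without replacement. The helper names are hypothetical. shesha may regularize or solve for the direction differently, for example through a library LDA solver.

```python
import numpy as np


def fisher_direction(X, y):
    """Unit-norm two-class Fisher direction, w proportional to pinv(S_w) @ (mu_1 - mu_0)."""
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("exactly two classes are required")
    X0, X1 = X[y == classes[0]], X[y == classes[1]]
    # Pooled within-class scatter matrix.
    S_w = np.cov(X0, rowvar=False) * (len(X0) - 1) + np.cov(X1, rowvar=False) * (len(X1) - 1)
    w = np.linalg.pinv(S_w) @ (X1.mean(axis=0) - X0.mean(axis=0))  # pinv tolerates singular S_w
    return w / np.linalg.norm(w)


def lda_stability_sketch(X, y, n_bootstrap=50, subsample_frac=0.5, seed=None):
    """Hypothetical: mean |cosine| between full-data and subsample directions."""
    rng = np.random.default_rng(seed)
    w_full = fisher_direction(X, y)
    m = int(round(subsample_frac * len(y)))
    sims = []
    for _ in range(n_bootstrap):
        idx = rng.choice(len(y), size=m, replace=False)
        w_sub = fisher_direction(X[idx], y[idx])
        sims.append(abs(float(w_full @ w_sub)))  # the sign of w is arbitrary
    return float(np.mean(sims))


rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((100, 10)),
               rng.standard_normal((100, 10)) + 3])
y = np.array([0] * 100 + [1] * 100)
y_shuffled = rng.permutation(y)  # no real decision boundary
print(lda_stability_sketch(X, y, seed=0))           # close to 1
print(lda_stability_sketch(X, y_shuffled, seed=0))  # lower
```

Taking the absolute cosine matches the documented [0, 1] range: a discriminant direction and its negation describe the same boundary.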

shesha.core.rdm_drift(X, Y, method='spearman', metric='cosine', n_bootstrap_ci=None, ci=0.95, seed=None)

Compute representational drift between two representations.

Drift is defined as 1 - rdm_similarity, so higher values indicate more change in geometric structure. This is useful for tracking how much a representation has changed over time or due to some intervention (fine-tuning, perturbation, etc.).

Parameters:

X (np.ndarray) – First (baseline/before) representation of shape (n_samples, n_features_x).
Y (np.ndarray) – Second (comparison/after) representation of shape (n_samples, n_features_y). Must have the same number of samples as X.
method (str) – Correlation method: ‘spearman’ (rank-based, default) or ‘pearson’.
metric (str) – Distance metric for RDM computation.
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times.
ci (float, default=0.95) – Confidence level for the interval.
seed (int, optional) – Random seed for bootstrap reproducibility.

Returns:

If n_bootstrap_ci is None: drift score. Range: [0, 2]. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.

Return type:

float or dict

Examples

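>>> from shesha.core import rdm_drift
>>> # model, data, clean_data, noisy_data and train_one_epoch are placeholders for your own objects
>>> # Track drift during training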
>>> X_epoch0 = model.encode(data)
>>> for epoch in range(10):
...     train_one_epoch(model)
...     X_current = model.encode(data)
...     drift = rdm_drift(X_epoch0, X_current)
...     print(f"Epoch {epoch+1}: drift = {drift:.3f}")

>>> # Measure drift due to noise perturbation
>>> X_clean = model.encode(clean_data)
>>> X_noisy = model.encode(noisy_data)
>>> drift = rdm_drift(X_clean, X_noisy)
>>> print(f"Noise-induced drift: {drift:.3f}")
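
Because drift is defined as 1 - rdm_similarity, its behaviour can be checked with a small NumPy sketch. The sketch assumes cosine RDMs compared with Spearman correlation, the defaults listed above. A random rotation of the features leaves every cosine distance unchanged, so drift should be near 0. An unrelated representation should give drift near 1. The names drift_sketch, cosine_rdm and spearman are hypothetical, and the rank helper assumes no tied distances.

```python
import numpy as np


def cosine_rdm(X):
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    i, j = np.triu_indices(X.shape[0], k=1)
    return (1.0 - Xn @ Xn.T)[i, j]


def spearman(a, b):
    # Double argsort gives ranks when there are no ties; Pearson r on ranks.
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])


def drift_sketch(X, Y):
    """Hypothetical: 1 - Spearman r between the cosine RDMs of X and Y."""
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same number of samples")
    return 1.0 - spearman(cosine_rdm(X), cosine_rdm(Y))


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 32))
Q, _ = np.linalg.qr(rng.standard_normal((32, 32)))  # random orthogonal matrix
unrelated = rng.standard_normal((100, 16))           # a different width is allowed
print(f"rotated:   {drift_sketch(X, X @ Q):.3f}")     # ~0: cosine distances preserved
print(f"unrelated: {drift_sketch(X, unrelated):.3f}")  # ~1: no shared structure
```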

Unified interface

shesha.shesha(X, y=None, variant='feature_split', **kwargs)

Unified interface for computing Shesha stability metrics.

Parameters:

X (np.ndarray) – Data matrix of shape (n_samples, n_features).
y (np.ndarray, optional) – Class labels (required for supervised variants).
variant (str) – Which Shesha variant to compute:
- ‘feature_split’: Unsupervised, partitions features
- ‘sample_split’: Unsupervised, bootstrap resampling
- ‘anchor’: Unsupervised, anchor-based stability
- ‘variance’: Supervised, variance ratio
- ‘supervised’: Supervised, RDM alignment
**kwargs – Additional arguments passed to the specific variant function.

Returns:

Shesha stability score.

Return type:

float

Examples

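>>> import numpy as np
>>> from shesha import shesha
>>> X = np.random.randn(500, 768)
>>> y = np.random.randint(0, 10, 500)
>>> # Unsupervised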
>>> stability = shesha(X, variant='feature_split', n_splits=30, seed=320)

>>> # Supervised
>>> alignment = shesha(X, y, variant='supervised')

Unsupervised metrics

shesha.feature_split(X, n_splits=30, metric='cosine', seed=None, max_samples=1600, n_bootstrap_ci=None, ci=0.95, return_all_splits=False)

Feature-Split Shesha: measures internal geometric consistency.

Parameters:

X (np.ndarray or list of np.ndarray) – Data matrix of shape (n_samples, n_features), or a list of such matrices for batch evaluation. When a list is passed, returns a list of results in the same order.
n_splits (int) – Number of random feature partitions to average over.
metric (str) – Distance metric for RDM computation.
seed (int, optional) – Random seed for reproducibility.
max_samples (int, optional) – Subsample to this many samples if exceeded (for efficiency).
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times (e.g. 1000 or 10000).
ci (float, default=0.95) – Confidence level for the interval (only used when n_bootstrap_ci is set).
return_all_splits (bool, default=False) – If True, return a dict with the mean score and per-split correlation scores instead of only the mean score.

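Returns:

If X is a single array, n_bootstrap_ci is None, and return_all_splits is False: mean Spearman correlation in [-1, 1].
If X is a single array and return_all_splits is True: dict with keys ‘mean’ and ‘split_scores’.
If X is a single array and n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.
If X is a list: a list of the above, one entry per input matrix.
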
Return type:

float or dict or list

Examples

>>> import numpy as np
>>> from shesha import feature_split
>>> X = np.random.randn(500, 768)  # 500 samples, 768-dim embeddings
>>> stability = feature_split(X, n_splits=30, seed=320)
>>> print(f"Feature-split stability: {stability:.3f}")

>>> # Batch evaluation across multiple representations
>>> matrices = [np.random.randn(500, 768) for _ in range(5)]
>>> scores = feature_split(matrices, n_splits=30, seed=320)
>>> print(scores)  # list of 5 floats

>>> # With bootstrap confidence interval
>>> result = feature_split(X, n_splits=30, seed=320, n_bootstrap_ci=1000)
>>> print(f"{result['mean']:.3f} [{result['ci_low']:.3f}, {result['ci_high']:.3f}]")

>>> # Return per-split scores for distribution plots
>>> result = feature_split(X, n_splits=30, seed=320, return_all_splits=True)
>>> scores = result["split_scores"]

shesha.sample_split(X, n_splits=30, subsample_fraction=0.4, metric='cosine', seed=None, max_samples=1500, n_bootstrap_ci=None, ci=0.95, return_all_splits=False)

Sample-Split Shesha (Bootstrap RDM): measures robustness to input variation.

Creates random subsamples of data points, computes RDMs on each, and measures their correlation. Assesses whether distance structure generalizes across different subsets of the data.

Parameters:

X (np.ndarray) – Data matrix of shape (n_samples, n_features).
n_splits (int) – Number of bootstrap iterations.
subsample_fraction (float) – Fraction of samples to use in each subsample.
metric (str) – Distance metric for RDM computation.
seed (int, optional) – Random seed for reproducibility.
max_samples (int, optional) – Subsample to this many samples if exceeded.
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times.
ci (float, default=0.95) – Confidence level for the interval.
return_all_splits (bool, default=False) – If True, return a dict with the mean score and per-split correlation scores instead of only the mean score.

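Returns:

If n_bootstrap_ci is None and return_all_splits is False: mean correlation between subsample RDMs. If return_all_splits is True: dict with keys ‘mean’ and ‘split_scores’. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.
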
Return type:

float or dict

Examples

>>> import numpy as np
>>> from shesha import sample_split
>>> X = np.random.randn(1000, 384)
>>> stability = sample_split(X, n_splits=50, seed=320)

>>> result = sample_split(X, n_splits=50, seed=320, return_all_splits=True)
>>> scores = result["split_scores"]

shesha.anchor_stability(X, n_splits=30, n_anchors=100, n_per_split=200, metric='cosine', rank_normalize=True, seed=None, max_samples=1500, n_bootstrap_ci=None, ci=0.95)

Anchor-based Shesha: measures stability of distance profiles from fixed anchors.

Selects fixed anchor points, then measures consistency of distance profiles from anchors to random data splits. More robust to sampling variation than pure bootstrap approaches.

Parameters:

X (np.ndarray) – Data matrix of shape (n_samples, n_features).
n_splits (int) – Number of random splits.
n_anchors (int) – Number of fixed anchor points.
n_per_split (int) – Number of samples per split.
metric (str) – Distance metric used to compute the anchor-to-sample distance profiles.
rank_normalize (bool) – If True, rank-normalize distances within each anchor before correlating.
seed (int, optional) – Random seed.
max_samples (int, optional) – Subsample to this many samples if exceeded.
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times.
ci (float, default=0.95) – Confidence level for the interval.

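Returns:

If n_bootstrap_ci is None: mean correlation of anchor distance profiles. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.
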
Return type:

float or dict

Supervised metrics

shesha.variance_ratio(X, y, n_bootstrap_ci=None, ci=0.95, seed=None)

Variance Ratio Shesha: ratio of between-class to total variance.

A simple, efficient measure of how much geometric structure is explained by class labels. Equivalent to the R-squared of predicting coordinates from class membership.

Parameters:

X (np.ndarray) – Data matrix of shape (n_samples, n_features).
y (np.ndarray) – Class labels of shape (n_samples,).
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times.
ci (float, default=0.95) – Confidence level for the interval.
seed (int, optional) – Random seed for bootstrap reproducibility.

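Returns:

If n_bootstrap_ci is None: ratio of between-class variance to total variance. Range: [0, 1]. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.
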
Return type:

float or dict

Examples

>>> import numpy as np
>>> from shesha import variance_ratio
>>> X = np.random.randn(500, 768)
>>> y = np.random.randint(0, 10, 500)
>>> vr = variance_ratio(X, y)
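
The ratio described above can be written as VR = sum_k n_k * ||mu_k - mu||^2 / sum_i ||x_i - mu||^2. Here mu is the overall mean, mu_k is the mean of class k, and n_k is the size of class k. This formula is an assumed formalization of "between-class to total variance", pooled over all feature dimensions. The sketch below computes it. It then checks the stated R-squared equivalence by regressing every coordinate on one-hot class indicators, whose fitted values are exactly the class means. variance_ratio_sketch is a hypothetical name.

```python
import numpy as np


def variance_ratio_sketch(X, y):
    """Hypothetical: between-class sum of squares / total sum of squares."""
    mu = X.mean(axis=0)
    total = np.sum((X - mu) ** 2)
    between = sum(
        np.sum(y == c) * np.sum((X[y == c].mean(axis=0) - mu) ** 2)
        for c in np.unique(y)
    )
    return float(between / total)


rng = np.random.default_rng(0)
X = rng.standard_normal((300, 20))
y = rng.integers(0, 3, size=300)
X[y == 1] += 2.0  # shift one class so labels explain part of the geometry

vr = variance_ratio_sketch(X, y)

# R-squared of a least-squares fit of all coordinates on one-hot class indicators.
onehot = np.eye(3)[y]
coef, *_ = np.linalg.lstsq(onehot, X, rcond=None)
r2 = 1.0 - np.sum((X - onehot @ coef) ** 2) / np.sum((X - X.mean(axis=0)) ** 2)
print(f"variance ratio = {vr:.4f}, R^2 = {r2:.4f}")  # the two agree
```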

shesha.supervised_alignment(X, y, metric='correlation', seed=None, max_samples=300, n_bootstrap_ci=None, ci=0.95)

Supervised RDM Alignment: correlation between model RDM and ideal label RDM.

Measures how well the representation’s distance structure aligns with task-defined similarity (same class = similar, different class = dissimilar).

Parameters:

X (np.ndarray) – Data matrix of shape (n_samples, n_features).
y (np.ndarray) – Class labels of shape (n_samples,).
metric (str) – Distance metric for model RDM.
seed (int, optional) – Random seed for subsampling.
max_samples (int) – Subsample to this many samples (RDM computation is O(n^2)).
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times.
ci (float, default=0.95) – Confidence level for the interval.

Returns:

If n_bootstrap_ci is None: Spearman correlation. Range: [-1, 1]. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.

Return type:

float or dict
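
Here is a minimal sketch of the alignment described above. It assumes three things. The ideal label RDM holds 0 for same-class pairs and 1 for different-class pairs. The model RDM uses correlation distance (the default metric). Spearman correlation is computed with average ranks, which matters here because the label RDM consists almost entirely of ties. The helper names are hypothetical, and subsampling (max_samples) is omitted.

```python
import numpy as np


def average_ranks(a):
    """Ranks with ties replaced by their average rank (0-based)."""
    order = np.argsort(a, kind="mergesort")
    _, first, counts = np.unique(a[order], return_index=True, return_counts=True)
    ranks = np.empty(len(a))
    ranks[order] = np.repeat(first + (counts - 1) / 2.0, counts)
    return ranks


def supervised_alignment_sketch(X, y):
    """Hypothetical: Spearman r between the model RDM and the ideal label RDM."""
    Xc = X - X.mean(axis=1, keepdims=True)  # correlation distance = cosine of row-centered data
    Xn = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)
    i, j = np.triu_indices(X.shape[0], k=1)
    model_rdm = (1.0 - Xn @ Xn.T)[i, j]
    label_rdm = (y[i] != y[j]).astype(float)  # 0 = same class, 1 = different class
    return float(np.corrcoef(average_ranks(model_rdm), average_ranks(label_rdm))[0, 1])


rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
centers = 3.0 * rng.standard_normal((3, 20))
X = centers[y] + rng.standard_normal((90, 20))  # tight clusters around class centers
y_shuffled = rng.permutation(y)
print(supervised_alignment_sketch(X, y))           # clearly positive
print(supervised_alignment_sketch(X, y_shuffled))  # near 0
```

Even perfect alignment stays below 1 under this construction. A binary label RDM can only correlate partially with a continuous ranking of distances.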

shesha.class_separation_ratio(X, y, n_bootstrap=50, subsample_frac=0.5, metric='euclidean', seed=None, n_bootstrap_ci=None, ci=0.95)

Class Separation Ratio: ratio of between-class to within-class distances.

Parameters:

X (np.ndarray) – Data matrix of shape (n_samples, n_features).
y (np.ndarray) – Class labels of shape (n_samples,).
n_bootstrap (int) – Number of bootstrap iterations for stability.
subsample_frac (float) – Fraction of samples to use per bootstrap (0.0-1.0).
metric (str) – Distance metric: ‘cosine’ or ‘euclidean’.
seed (int, optional) – Random seed for reproducibility.
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times.
ci (float, default=0.95) – Confidence level for the interval.

Returns:

If n_bootstrap_ci is None: mean separation ratio. Range: [0, inf). If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.

Return type:

float or dict

Examples

>>> import numpy as np
>>> from shesha import class_separation_ratio
>>> # Well-separated classes
>>> X = np.vstack([np.random.randn(100, 10),
...                np.random.randn(100, 10) + 5])
>>> y = np.array([0]*100 + [1]*100)
>>> ratio = class_separation_ratio(X, y)
>>> print(f"Separation: {ratio:.2f}")  # High value

Notes

Higher values indicate representations where same-class samples are closer together than different-class samples, suggesting good discriminability.

shesha.lda_stability(X, y, n_bootstrap=50, subsample_frac=0.5, seed=None, n_bootstrap_ci=None, ci=0.95)

LDA Subspace Stability: consistency of linear discriminant direction.

Measures whether the optimal linear decision boundary is robust to sampling variation. Computes the LDA direction on the full dataset and on each bootstrapped subsample, then measures the alignment of the resulting discriminant vectors.

Parameters:

X (np.ndarray) – Data matrix of shape (n_samples, n_features).
y (np.ndarray) – Binary class labels of shape (n_samples,). Must have exactly 2 classes.
n_bootstrap (int) – Number of bootstrap iterations.
subsample_frac (float) – Fraction of samples to use per bootstrap (0.0-1.0).
seed (int, optional) – Random seed for reproducibility.
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times.
ci (float, default=0.95) – Confidence level for the interval.

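Returns:

If n_bootstrap_ci is None: mean absolute cosine similarity. Range: [0, 1]. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.
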
Return type:

float or dict

Examples

>>> import numpy as np
>>> from shesha import lda_stability
>>> # Create well-separated binary classification data
>>> X = np.vstack([np.random.randn(100, 10),
...                np.random.randn(100, 10) + 3])
>>> y = np.array([0]*100 + [1]*100)
>>> stability = lda_stability(X, y)
>>> print(f"LDA Stability: {stability:.3f}")  # Should be high

Notes

Only works for binary classification. For multi-class, consider using class_separation_ratio instead.

Drift metrics

shesha.rdm_similarity(X, Y, method='spearman', metric='cosine', n_bootstrap_ci=None, ci=0.95, seed=None)

Compute RDM similarity between two representations.

Measures how similar the pairwise distance structures are between two representations. Useful for measuring representational drift, comparing models, or tracking changes during training.

Parameters:

X (np.ndarray) – First representation matrix of shape (n_samples, n_features_x).
Y (np.ndarray) – Second representation matrix of shape (n_samples, n_features_y). Must have the same number of samples as X.
method (str) – Correlation method: ‘spearman’ (rank-based, default) or ‘pearson’.
metric (str) – Distance metric for RDM computation: ‘cosine’, ‘correlation’, or ‘euclidean’.
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times.
ci (float, default=0.95) – Confidence level for the interval.
seed (int, optional) – Random seed for bootstrap reproducibility.

Returns:

If n_bootstrap_ci is None: correlation between RDMs. Range: [-1, 1]. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.

Return type:

float or dict

Examples

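>>> from shesha import rdm_similarity
>>> # model_before, model_after, model1, model2 and data are placeholders for your own encoders and inputs
>>> # Compare representations before and after training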
>>> X_before = model_before.encode(data)
>>> X_after = model_after.encode(data)
>>> similarity = rdm_similarity(X_before, X_after)
>>> print(f"RDM similarity: {similarity:.3f}")

>>> # Compare two different models
>>> X_model1 = model1.encode(data)
>>> X_model2 = model2.encode(data)
>>> similarity = rdm_similarity(X_model1, X_model2, method='pearson')

Notes

- Spearman (default) is rank-based, so it is more robust to outliers and to monotonic non-linear relationships between distances.
- Pearson captures linear relationships in distance magnitudes.
- The representations can have different feature dimensions; only the sample count must match.
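
The difference between the two method options can be seen directly on an RDM vector. A monotonic but non-linear rescaling of the distances leaves their ranks unchanged, and therefore leaves the Spearman correlation at 1, while the Pearson correlation drops. This is a NumPy-only illustration with hypothetical helper names, not shesha's implementation. The rank helper assumes no tied values.

```python
import numpy as np


def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])


def spearman(a, b):
    # Pearson r on ranks; double argsort is a valid ranking when values are distinct.
    return pearson(np.argsort(np.argsort(a)), np.argsort(np.argsort(b)))


rng = np.random.default_rng(0)
X = rng.standard_normal((60, 8))
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
i, j = np.triu_indices(X.shape[0], k=1)
rdm_x = (1.0 - Xn @ Xn.T)[i, j]   # condensed cosine-distance RDM
rdm_warped = np.exp(3.0 * rdm_x)  # monotonic, non-linear rescaling of every distance

s = spearman(rdm_x, rdm_warped)
p = pearson(rdm_x, rdm_warped)
print(f"Spearman: {s:.3f}, Pearson: {p:.3f}")  # Spearman stays at 1; Pearson falls below 1
```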

shesha.rdm_drift(X, Y, method='spearman', metric='cosine', n_bootstrap_ci=None, ci=0.95, seed=None)

Compute representational drift between two representations.

Parameters:

X (np.ndarray) – First (baseline/before) representation of shape (n_samples, n_features_x).
Y (np.ndarray) – Second (comparison/after) representation of shape (n_samples, n_features_y). Must have the same number of samples as X.
method (str) – Correlation method: ‘spearman’ (rank-based, default) or ‘pearson’.
metric (str) – Distance metric for RDM computation.
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling the input data this many times.
ci (float, default=0.95) – Confidence level for the interval.
seed (int, optional) – Random seed for bootstrap reproducibility.

Returns:

If n_bootstrap_ci is None: drift score. Range: [0, 2]. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.

Return type:

float or dict

Examples

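>>> from shesha import rdm_drift
>>> # model, data, clean_data, noisy_data and train_one_epoch are placeholders for your own objects
>>> # Track drift during training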
>>> X_epoch0 = model.encode(data)
>>> for epoch in range(10):
...     train_one_epoch(model)
...     X_current = model.encode(data)
...     drift = rdm_drift(X_epoch0, X_current)
...     print(f"Epoch {epoch+1}: drift = {drift:.3f}")

>>> # Measure drift due to noise perturbation
>>> X_clean = model.encode(clean_data)
>>> X_noisy = model.encode(noisy_data)
>>> drift = rdm_drift(X_clean, X_noisy)
>>> print(f"Noise-induced drift: {drift:.3f}")

Utilities

shesha.compute_rdm(X, metric='cosine', normalize=True)

Compute Representational Dissimilarity Matrix (RDM).

Parameters:

X (np.ndarray) – Data matrix of shape (n_samples, n_features).
metric (str) – Distance metric: ‘cosine’, ‘correlation’, or ‘euclidean’.
normalize (bool) – If True and metric='cosine', L2-normalize rows before computing distances.

Returns:

Condensed distance vector: the strict upper triangle of the RDM (diagonal excluded), of length n_samples * (n_samples - 1) / 2.

Return type:

np.ndarray