Bio API

Shesha Bio: Stability metrics for biological perturbation experiments.

This module provides Shesha variants for single-cell and perturbation biology, measuring the consistency of perturbation effects across individual cells.

shesha.bio.compute_magnitude(adata, perturbation_key, control_label='control', metric='euclidean', layer=None)

Scanpy-compatible wrapper for perturbation magnitude.

Parameters:
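  • adata (AnnData) – Annotated data matrix.

  • perturbation_key (str) – Column in adata.obs containing perturbation labels.

  • control_label (str) – The label in perturbation_key representing control cells.

  • metric (str) – Metric used to compute the magnitude (default 'euclidean').

  • layer (str | None) – Layer to use for computation. If None, uses .X.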

Return type:

dict
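The wrapper's grouping step can be pictured with plain NumPy arrays standing in for adata.X and adata.obs[perturbation_key]. This is a minimal sketch, not the library's implementation: the helper name magnitude_by_label is hypothetical, and the assumption that the returned dict maps each perturbation label to a centroid distance is inferred from the signature rather than documented above.

```python
import numpy as np

def magnitude_by_label(X, labels, control_label="control"):
    """Hypothetical helper: Euclidean centroid distance to control, per label."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Reference point: centroid of all control cells
    control_centroid = X[labels == control_label].mean(axis=0)
    magnitudes = {}
    for label in np.unique(labels):
        if label == control_label:
            continue
        centroid = X[labels == label].mean(axis=0)
        magnitudes[str(label)] = float(np.linalg.norm(centroid - control_centroid))
    return magnitudes

# Toy data: three control cells at the origin, two cells shifted by (3, 4)
X = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
labels = np.array(["control", "control", "control", "geneA", "geneA"])
result = magnitude_by_label(X, labels)
```

Here result holds a single entry for "geneA", the only non-control label in the toy data.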

shesha.bio.compute_stability(adata, perturbation_key, control_label='control', layer=None, method='standard', **kwargs)

Scanpy-compatible wrapper for perturbation stability.

Computes stability for all perturbations in an AnnData object.

Parameters:
  • adata (AnnData) – Annotated data matrix.

  • perturbation_key (str) – Column in adata.obs containing perturbation labels (e.g. ‘guide_id’).

  • control_label (str) – The label in perturbation_key representing control cells (e.g. ‘NT’).

  • layer (str, optional) – Layer to use for computation. If None, uses .X.

  • method ({'standard', 'whitened', 'knn'}, default='standard') – Method for computing stability:

    • ‘standard’: Global control centroid.

    • ‘whitened’: Mahalanobis-scaled using control covariance.

    • ‘knn’: Local k-NN matched control centroids.

  • **kwargs – Additional arguments passed to perturbation_stability() (e.g., k=50 for knn, regularization=1e-6 for whitened).

Returns:

Dictionary mapping perturbation names to stability scores.

Return type:

dict

Examples

>>> import shesha.bio as bio
>>> # Standard stability
>>> stability = bio.compute_stability(adata, "perturbation")
>>> # Whitened stability
>>> stability_w = bio.compute_stability(adata, "perturbation", method="whitened")
>>> # k-NN stability
>>> stability_knn = bio.compute_stability(adata, "perturbation", method="knn", k=50)

shesha.bio.compute_stability_knn(adata, perturbation_key, control_label='control', layer=None, k=50, metric='euclidean', seed=None, max_samples=1000)

Scanpy-compatible wrapper for k-NN matched control stability.

Convenience wrapper for compute_stability(…, method='knn'). Consider using the unified interface instead.

Parameters:
  • adata (AnnData) – Annotated data object containing single-cell data.

  • perturbation_key (str) – Column in adata.obs containing perturbation labels.

  • control_label (str, default="control") – Label identifying control/unperturbed cells.

  • layer (str, optional) – Layer in adata.layers to use. If None, uses adata.X.

  • k (int, default=50) – Number of nearest control neighbors to use for local centroid.

  • metric (str, default="euclidean") – Distance metric for k-NN matching: ‘cosine’ or ‘euclidean’.

  • seed (int, optional) – Random seed for subsampling reproducibility.

  • max_samples (int, optional) – Upper limit on the number of perturbed cells (default 1000). Larger perturbed populations are randomly subsampled.

Returns:

Dictionary mapping perturbation names to k-NN matched stability scores.

Return type:

dict

See also

compute_stability

Unified interface with method='knn'
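The seed and max_samples parameters describe a seeded subsampling step applied to the perturbed population. A minimal sketch of such a step is given here, assuming sampling without replacement; the helper name subsample_rows is hypothetical and the library's actual sampling scheme is not documented above.

```python
import numpy as np

def subsample_rows(X, max_samples=1000, seed=None):
    """Hypothetical helper: keep at most max_samples rows, drawn without replacement."""
    X = np.asarray(X)
    if max_samples is None or X.shape[0] <= max_samples:
        return X  # population is small enough, nothing to do
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=max_samples, replace=False)
    return X[idx]

X_pert = np.arange(5000 * 3, dtype=float).reshape(5000, 3)
a = subsample_rows(X_pert, max_samples=1000, seed=0)
b = subsample_rows(X_pert, max_samples=1000, seed=0)
small = subsample_rows(X_pert[:10], max_samples=1000, seed=0)
```

With a fixed seed the two calls select the same rows, which is what makes a downstream score reproducible; populations at or below the limit pass through unchanged.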

shesha.bio.compute_stability_whitened(adata, perturbation_key, control_label='control', layer=None, regularization=1e-06, seed=None, max_samples=1000)

Scanpy-compatible wrapper for whitened perturbation stability.

Convenience wrapper for compute_stability(…, method='whitened'). Consider using the unified interface instead.

Parameters:
  • adata (AnnData) – Annotated data object containing single-cell data.

  • perturbation_key (str) – Column in adata.obs containing perturbation labels.

  • control_label (str, default="control") – Label identifying control/unperturbed cells.

  • layer (str, optional) – Layer in adata.layers to use. If None, uses adata.X.

  • regularization (float, default=1e-6) – Regularization added to covariance diagonal for numerical stability.

  • seed (int, optional) – Random seed for subsampling reproducibility.

  • max_samples (int, optional) – Upper limit on the number of perturbed cells (default 1000). Larger perturbed populations are randomly subsampled.

Returns:

Dictionary mapping perturbation names to whitened stability scores.

Return type:

dict

See also

compute_stability

Unified interface with method='whitened'

shesha.bio.perturbation_effect_size(X_control, X_perturbed, metric='euclidean', n_bootstrap_ci=None, ci=0.95, seed=None)

Compute the magnitude of the perturbation effect.

Parameters:
  • X_control (np.ndarray) – Control population embeddings.

  • X_perturbed (np.ndarray) – Perturbed population embeddings.

  • metric (str, default="euclidean") – Which effect size to compute:

    • ‘euclidean’: Raw L2 distance between centroids (Magnitude). Use this for geometric plots (Stability vs Magnitude).

    • ‘cohen’: Standardized effect size (Magnitude / Pooled SD). Use this for statistical power analysis.

  • n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling control and perturbed populations this many times.

  • ci (float, default=0.95) – Confidence level for the interval.

  • seed (int, optional) – Random seed for bootstrap reproducibility.

Returns:

If n_bootstrap_ci is None: the calculated magnitude/effect size. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.

Return type:

float or dict
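The two metrics and the bootstrap option can be illustrated with a short NumPy sketch. Several details here are assumptions rather than documented behavior: the function names are hypothetical, the pooled SD is taken as the square root of the feature-averaged pooled variance, and the interval is a percentile bootstrap. The returned keys mirror the ones listed above.

```python
import numpy as np

def effect_size(X_control, X_perturbed, metric="euclidean"):
    """Illustrative effect size; the 'cohen' pooling is an assumption."""
    diff = X_perturbed.mean(axis=0) - X_control.mean(axis=0)
    magnitude = float(np.linalg.norm(diff))  # L2 distance between centroids
    if metric == "euclidean":
        return magnitude
    if metric == "cohen":
        n_c, n_p = len(X_control), len(X_perturbed)
        # Assumed pooling: per-feature sample variances pooled across the
        # two groups, then averaged over features
        var_c = X_control.var(axis=0, ddof=1)
        var_p = X_perturbed.var(axis=0, ddof=1)
        pooled_var = ((n_c - 1) * var_c + (n_p - 1) * var_p) / (n_c + n_p - 2)
        return magnitude / float(np.sqrt(pooled_var.mean()))
    raise ValueError(f"unknown metric: {metric!r}")

def bootstrap_ci(X_control, X_perturbed, n_bootstrap=200, ci=0.95, seed=None):
    """Percentile bootstrap, resampling rows of both populations with replacement."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        c = X_control[rng.integers(0, len(X_control), len(X_control))]
        p = X_perturbed[rng.integers(0, len(X_perturbed), len(X_perturbed))]
        stats[i] = effect_size(c, p)
    alpha = (1.0 - ci) / 2.0
    low, high = np.quantile(stats, [alpha, 1.0 - alpha])
    return {"mean": float(stats.mean()), "ci_low": float(low),
            "ci_high": float(high), "std": float(stats.std()),
            "n_bootstraps": n_bootstrap, "ci_level": ci}

rng = np.random.default_rng(0)
X_ctrl = rng.normal(size=(300, 10))
X_pert = rng.normal(size=(300, 10)) + 1.0  # shift every feature by 1
raw = effect_size(X_ctrl, X_pert)
d = effect_size(X_ctrl, X_pert, metric="cohen")
summary = bootstrap_ci(X_ctrl, X_pert, n_bootstrap=200, seed=1)
```

Because the toy features have unit variance, the raw and standardized values land close together; on real data with heterogeneous feature scales they can differ substantially.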

shesha.bio.perturbation_stability(X_control, X_perturbed, method='standard', metric='cosine', k=50, regularization=1e-06, seed=None, max_samples=1000, n_bootstrap_ci=None, ci=0.95)

Perturbation stability: consistency of perturbation effects across samples.

Measures whether individual perturbed samples shift in a consistent direction relative to the control population. High values indicate that the perturbation has a coherent, reproducible effect; low values suggest heterogeneous or noisy responses.

Parameters:
  • X_control (np.ndarray) – Control population embeddings, shape (n_control, n_features).

  • X_perturbed (np.ndarray) – Perturbed population embeddings, shape (n_perturbed, n_features).

  • method ({'standard', 'whitened', 'knn'}, default='standard') – Method for computing stability:

    • ‘standard’: Global control centroid (default).

    • ‘whitened’: Mahalanobis-scaled using control covariance.

    • ‘knn’: Local k-NN matched control centroids.

  • metric ({'cosine', 'euclidean'}, default='cosine') – How to measure directional consistency (used for ‘standard’ and ‘knn’ methods).

  • k (int, default=50) – Number of nearest neighbors (only used when method='knn').

  • regularization (float, default=1e-6) – Regularization for covariance (only used when method='whitened').

  • seed (int, optional) – Random seed for subsampling reproducibility.

  • max_samples (int, optional) – Upper limit on the number of perturbed samples (default 1000). Larger perturbed populations are randomly subsampled.

  • n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling control and perturbed populations this many times.

  • ci (float, default=0.95) – Confidence level for the interval.

Returns:

If n_bootstrap_ci is None: stability score in [-1, 1]. Higher = more consistent perturbation effect. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.

Return type:

float or dict

Examples

>>> import numpy as np
>>> from shesha.bio import perturbation_stability
>>>
>>> # Control and perturbed cell populations
>>> X_ctrl = np.random.randn(500, 50)  # 500 control cells, 50 genes
>>> shift = np.random.randn(50)  # consistent direction
>>> X_pert = X_ctrl + shift + np.random.randn(500, 50) * 0.1
>>>
>>> # Standard stability
>>> stability = perturbation_stability(X_ctrl, X_pert, method='standard')
>>>
>>> # With bootstrap CI
>>> result = perturbation_stability(X_ctrl, X_pert, n_bootstrap_ci=1000)
>>> print(f"{result['mean']:.3f} [{result['ci_low']:.3f}, {result['ci_high']:.3f}]")

Notes

Method selection:

  • ‘standard’: Best for homogeneous controls; computationally fastest.

  • ‘whitened’: Better when features have different scales or are correlated.

  • ‘knn’: Best for heterogeneous controls with multiple cell types/states.

The control reference is computed differently for each method:

  • Standard: Global centroid of all control cells.

  • Whitened: Mahalanobis-scaled space accounting for control covariance.

  • k-NN: Local centroid of the k nearest control cells for each perturbed cell.
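One way to read "consistent direction relative to the control population" for the standard method is as the mean cosine similarity between each cell's shift from the global control centroid and the average shift. The sketch below uses that reading; it is an assumed formulation for illustration, with hypothetical function names, and may differ from what perturbation_stability computes internally.

```python
import numpy as np

def directional_consistency(shifts, eps=1e-12):
    """Mean cosine similarity between each shift vector and the mean shift."""
    mean_shift = shifts.mean(axis=0)
    norms = np.linalg.norm(shifts, axis=1) * np.linalg.norm(mean_shift)
    cosines = (shifts @ mean_shift) / np.maximum(norms, eps)
    return float(cosines.mean())

def standard_stability_sketch(X_control, X_perturbed):
    """Assumed 'standard' form: shifts measured from the global control centroid."""
    shifts = X_perturbed - X_control.mean(axis=0)
    return directional_consistency(shifts)

rng = np.random.default_rng(0)
X_ctrl = rng.normal(size=(500, 50))
direction = rng.normal(size=50)

# Every cell moves along the same direction, plus a little noise
coherent = X_ctrl + 3.0 * direction + 0.1 * rng.normal(size=(500, 50))
# Every cell moves along its own random direction
incoherent = X_ctrl + 3.0 * rng.normal(size=(500, 50))

high = standard_stability_sketch(X_ctrl, coherent)
low = standard_stability_sketch(X_ctrl, incoherent)
```

Under this reading the coherent perturbation scores close to 1 and the incoherent one close to 0, matching the interpretation of high and low values given above.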

shesha.bio.perturbation_stability_knn(X_control, X_perturbed, k=50, metric='euclidean', seed=None, max_samples=1000)

k-NN matched control perturbation stability.

Convenience wrapper for perturbation_stability(…, method='knn'). Consider using the unified interface instead.

Parameters:
  • X_control (np.ndarray) – Control population embeddings, shape (n_control, n_features).

  • X_perturbed (np.ndarray) – Perturbed population embeddings, shape (n_perturbed, n_features).

  • k (int, default=50) – Number of nearest control neighbors to use for local centroid.

  • metric (str, default="euclidean") – Distance metric for k-NN matching: ‘cosine’ or ‘euclidean’.

  • seed (int, optional) – Random seed for subsampling reproducibility.

  • max_samples (int, optional) – Upper limit on the number of perturbed samples (default 1000). Larger perturbed populations are randomly subsampled.

Returns:

k-NN matched stability score in [-1, 1].

Return type:

float

See also

perturbation_stability

Unified interface with method='knn'
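The idea of a k-NN matched control can be sketched as follows: each perturbed cell is compared against the centroid of its k nearest control cells instead of the global control centroid. This is a brute-force illustration under assumptions (Euclidean matching only, mean cosine to the mean shift as the score, hypothetical function name), not the library's implementation.

```python
import numpy as np

def knn_stability_sketch(X_control, X_perturbed, k=50):
    """Assumed k-NN form: shifts measured from local control centroids."""
    k = min(k, len(X_control))
    # Pairwise squared Euclidean distances, shape (n_perturbed, n_control)
    d2 = ((X_perturbed[:, None, :] - X_control[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest control cells for each perturbed cell
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    local_centroids = X_control[nearest].mean(axis=1)
    shifts = X_perturbed - local_centroids
    mean_shift = shifts.mean(axis=0)
    denom = np.linalg.norm(shifts, axis=1) * np.linalg.norm(mean_shift)
    return float(((shifts @ mean_shift) / np.maximum(denom, 1e-12)).mean())

rng = np.random.default_rng(0)
offset = np.array([10.0, 0.0, 0.0, 0.0, 0.0])  # separates two control "cell types"
effect = np.array([0.0, 2.0, 0.0, 0.0, 0.0])   # same perturbation effect in both

X_ctrl = np.vstack([rng.normal(scale=0.3, size=(200, 5)),
                    rng.normal(scale=0.3, size=(200, 5)) + offset])
X_pert = np.vstack([rng.normal(scale=0.3, size=(100, 5)),
                    rng.normal(scale=0.3, size=(100, 5)) + offset]) + effect

local_score = knn_stability_sketch(X_ctrl, X_pert, k=20)
# With k equal to the control size, every local centroid is the global centroid
global_score = knn_stability_sketch(X_ctrl, X_pert, k=len(X_ctrl))
```

In this toy setup the global reference sits between the two control clusters, so the between-cluster offset dominates each shift and masks the shared effect; the local reference removes that offset and the score rises accordingly.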

shesha.bio.perturbation_stability_whitened(X_control, X_perturbed, regularization=1e-06, seed=None, max_samples=1000)

Whitened (Mahalanobis) perturbation stability.

Convenience wrapper for perturbation_stability(…, method='whitened'). Consider using the unified interface instead.

Parameters:
  • X_control (np.ndarray) – Control population embeddings, shape (n_control, n_features).

  • X_perturbed (np.ndarray) – Perturbed population embeddings, shape (n_perturbed, n_features).

  • regularization (float, default=1e-6) – Regularization added to covariance diagonal for numerical stability.

  • seed (int, optional) – Random seed for subsampling reproducibility.

  • max_samples (int, optional) – Upper limit on the number of perturbed samples (default 1000). Larger perturbed populations are randomly subsampled.

Returns:

Whitened stability score in [-1, 1].

Return type:

float

See also

perturbation_stability

Unified interface with method='whitened'
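Whitening can be sketched as a change of coordinates in which the control covariance becomes the identity, so that high-variance features no longer dominate the direction of each shift. The example below assumes a Cholesky-based whitening of the regularized control covariance and a mean-cosine score; the function names are hypothetical and the library's internals may differ.

```python
import numpy as np

def mean_cosine_to_mean_shift(shifts, eps=1e-12):
    """Mean cosine similarity between each shift vector and the mean shift."""
    mean_shift = shifts.mean(axis=0)
    denom = np.linalg.norm(shifts, axis=1) * np.linalg.norm(mean_shift)
    return float(((shifts @ mean_shift) / np.maximum(denom, eps)).mean())

def whitened_stability_sketch(X_control, X_perturbed, regularization=1e-6):
    """Assumed 'whitened' form: shifts expressed in Mahalanobis-scaled space."""
    mu = X_control.mean(axis=0)
    n_features = X_control.shape[1]
    # Regularize the diagonal so the factorization stays numerically stable
    cov = np.cov(X_control, rowvar=False) + regularization * np.eye(n_features)
    L = np.linalg.cholesky(cov)  # cov = L @ L.T
    # Solving L z = shift maps each shift into the whitened space
    shifts = np.linalg.solve(L, (X_perturbed - mu).T).T
    return mean_cosine_to_mean_shift(shifts)

rng = np.random.default_rng(0)
scales = np.array([20.0, 1.0, 1.0, 1.0, 1.0])  # feature 0 is very noisy
effect = np.array([0.0, 3.0, 0.0, 0.0, 0.0])   # true effect is on feature 1

X_ctrl = rng.normal(size=(300, 5)) * scales
X_pert = rng.normal(size=(300, 5)) * scales + effect

raw_score = mean_cosine_to_mean_shift(X_pert - X_ctrl.mean(axis=0))
whitened_score = whitened_stability_sketch(X_ctrl, X_pert)
```

In the unwhitened space the noisy feature swamps the effect and the score stays low; after whitening the same effect is recovered as a consistent direction.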