Bio API

Shesha Bio: Stability metrics for biological perturbation experiments.

This module provides Shesha variants for single-cell and perturbation biology, measuring the consistency of perturbation effects across individual cells.

shesha.bio.compute_magnitude(adata, perturbation_key, control_label='control', metric='euclidean', layer=None)

Scanpy-compatible wrapper for perturbation magnitude. Computes the magnitude for all perturbations in an AnnData object.

Parameters:

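adata (AnnData) – Annotated data matrix.
perturbation_key (str) – Column in adata.obs containing perturbation labels.
control_label (str, default="control") – Label identifying control/unperturbed cells.
metric (str, default="euclidean") – Metric used to compute the magnitude.
layer (str, optional) – Layer in adata.layers to use. If None, uses adata.X.

Returns:

Dictionary mapping perturbation names to magnitude scores.

Return type:

dict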

shesha.bio.compute_stability(adata, perturbation_key, control_label='control', layer=None, method='standard', **kwargs)

Scanpy-compatible wrapper for perturbation stability.

Computes stability for all perturbations in an AnnData object.

Parameters:

adata (AnnData) – Annotated data matrix.
perturbation_key (str) – Column in adata.obs containing perturbation labels (e.g. ‘guide_id’).
control_label (str, default="control") – The label in perturbation_key representing control cells (e.g. ‘NT’).
layer (str, optional) – Layer in adata.layers to use. If None, uses adata.X.
method ({'standard', 'whitened', 'knn'}, default='standard') – Method for computing stability:
- 'standard': global control centroid
- 'whitened': Mahalanobis-scaled using control covariance
- 'knn': local k-NN matched control centroids
**kwargs – Additional arguments passed to perturbation_stability() (e.g. k=50 for knn, regularization=1e-6 for whitened).

Returns:

Dictionary mapping perturbation names to stability scores.

Return type:

dict

Examples

>>> import shesha.bio as bio
>>> # Standard stability
>>> stability = bio.compute_stability(adata, "perturbation")
>>> # Whitened stability
>>> stability_w = bio.compute_stability(adata, "perturbation", method="whitened")
>>> # k-NN stability
>>> stability_knn = bio.compute_stability(adata, "perturbation", method="knn", k=50)
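The AnnData wrappers in this module return one score per perturbation. As a minimal sketch, assuming a wrapper simply groups cells by label and scores each group against the control cells, the pattern looks like the following. `per_perturbation_scores` and the toy `centroid_distance` score are hypothetical helpers for illustration, not part of shesha.bio, and plain NumPy arrays stand in for `adata.X` and `adata.obs[perturbation_key]`.

```python
import numpy as np


def per_perturbation_scores(X, labels, score_fn, control_label="control"):
    """Hypothetical helper: score every non-control group against the controls.

    Returns a dict mapping perturbation name -> score, mirroring the documented
    return type of the AnnData wrappers (not their actual implementation).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    X_control = X[labels == control_label]
    scores = {}
    for name in np.unique(labels):
        if name == control_label:
            continue  # controls are the reference, not a perturbation
        scores[str(name)] = score_fn(X_control, X[labels == name])
    return scores


def centroid_distance(X_control, X_perturbed):
    """Toy score for illustration: L2 distance between group centroids."""
    return float(np.linalg.norm(X_perturbed.mean(axis=0) - X_control.mean(axis=0)))


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
X[20:] += 3.0  # cells labelled "B" get a large shift
labels = np.array(["control"] * 10 + ["A"] * 10 + ["B"] * 10)
scores = per_perturbation_scores(X, labels, centroid_distance)
```

With a real AnnData object, X would typically be adata.X (or adata.layers[layer]), converted with .toarray() when stored as a SciPy sparse matrix, and labels would be adata.obs[perturbation_key].to_numpy().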

shesha.bio.compute_stability_knn(adata, perturbation_key, control_label='control', layer=None, k=50, metric='euclidean', seed=None, max_samples=1000)

Scanpy-compatible wrapper for k-NN matched control stability.

Convenience wrapper for compute_stability(..., method='knn'). Consider using the unified interface instead.

Parameters:

adata (AnnData) – Annotated data object containing single-cell data.
perturbation_key (str) – Column in adata.obs containing perturbation labels.
control_label (str, default="control") – Label identifying control/unperturbed cells.
layer (str, optional) – Layer in adata.layers to use. If None, uses adata.X.
k (int, default=50) – Number of nearest control neighbors to use for local centroid.
metric (str, default="euclidean") – Distance metric for k-NN matching: ‘cosine’ or ‘euclidean’.
seed (int, optional) – Random seed for subsampling reproducibility.
max_samples (int, optional, default=1000) – Maximum number of perturbed cells to use; larger perturbed populations are randomly subsampled to this size.

Returns:

Dictionary mapping perturbation names to k-NN matched stability scores.

Return type:

dict

See also

compute_stability: Unified interface with method='knn'

shesha.bio.compute_stability_whitened(adata, perturbation_key, control_label='control', layer=None, regularization=1e-06, seed=None, max_samples=1000)

Scanpy-compatible wrapper for whitened perturbation stability.

Convenience wrapper for compute_stability(..., method='whitened'). Consider using the unified interface instead.

Parameters:

adata (AnnData) – Annotated data object containing single-cell data.
perturbation_key (str) – Column in adata.obs containing perturbation labels.
control_label (str, default="control") – Label identifying control/unperturbed cells.
layer (str, optional) – Layer in adata.layers to use. If None, uses adata.X.
regularization (float, default=1e-6) – Regularization added to covariance diagonal for numerical stability.
seed (int, optional) – Random seed for subsampling reproducibility.
max_samples (int, optional, default=1000) – Maximum number of perturbed cells to use; larger perturbed populations are randomly subsampled to this size.

Returns:

Dictionary mapping perturbation names to whitened stability scores.

Return type:

dict

See also

compute_stability: Unified interface with method='whitened'

shesha.bio.discordance(df, stability_col='Sp', magnitude_col='Mp', method='linear', loess_frac=0.3)

Compute discordance scores: how much a perturbation deviates from the expected stability-magnitude relationship.

High discordance (positive values) identifies perturbations that are less stable than expected given their effect size — candidates for pleiotropic or heterogeneous effects.

Parameters:

df (pd.DataFrame) – DataFrame containing at least the columns specified by stability_col and magnitude_col.
stability_col (str, default="Sp") – Column with stability scores.
magnitude_col (str, default="Mp") – Column with magnitude/effect-size scores.
method ({'linear', 'rank', 'loess'}, default='linear') – How to model the expected stability-magnitude relationship:
- 'linear': OLS residual, sign-flipped, z-scored.
- 'rank': rank(Mp) - rank(Sp), z-scored.
- 'loess': LOESS (local regression) residual, sign-flipped, z-scored. Captures nonlinear magnitude-stability trends.
loess_frac (float, default=0.3) – Fraction of the data used for each local regression window (only used when method='loess'). Smaller values follow the data more closely; larger values produce smoother fits.

Returns:

Z-scored discordance scores indexed like the input DataFrame. Positive = less stable than expected (discordant). Negative = more stable than expected (concordant).

Return type:

pd.Series

Examples

>>> from shesha.bio import discordance
>>> df["disc_linear"] = discordance(df, stability_col="Sp", magnitude_col="Mp")
>>> df["disc_loess"] = discordance(df, method="loess", loess_frac=0.3)
>>> # Top discordant perturbations
>>> df.nlargest(10, "disc_loess")
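A minimal sketch of the 'linear' and 'rank' options as the parameter list describes them, using NumPy and pandas. Two details are assumptions: the OLS fit regresses stability on magnitude, and z-scoring uses the population standard deviation (ddof=0). The LOESS option is omitted; a local-regression smoother such as statsmodels' `lowess` (which takes a `frac` argument) could fill that role, but this page does not say which implementation shesha uses. `discordance_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def zscore(s):
    # Assumption: population standard deviation (ddof=0).
    return (s - s.mean()) / s.std(ddof=0)


def discordance_sketch(df, stability_col="Sp", magnitude_col="Mp", method="linear"):
    """Hypothetical re-implementation of the 'linear' and 'rank' options."""
    S = df[stability_col].astype(float)
    M = df[magnitude_col].astype(float)
    if method == "linear":
        # Assumed direction: OLS fit of stability on magnitude.
        slope, intercept = np.polyfit(M, S, deg=1)
        residual = S - (slope * M + intercept)
        raw = -residual  # sign flip: less stable than the fit -> positive
    elif method == "rank":
        raw = M.rank() - S.rank()  # large effect but low stability -> positive
    else:
        raise ValueError(f"method not covered by this sketch: {method!r}")
    return zscore(raw)  # keeps the input DataFrame's index


df = pd.DataFrame(
    {"Mp": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     "Sp": [0.10, 0.20, 0.30, 0.05, 0.50, 0.60]},  # g4 is unusually unstable
    index=["g1", "g2", "g3", "g4", "g5", "g6"],
)
df["disc_linear"] = discordance_sketch(df)
df["disc_rank"] = discordance_sketch(df, method="rank")
```

In this toy table g4 has a mid-range magnitude but the lowest stability, so both variants rank it as the most discordant perturbation.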

shesha.bio.magnitude_matched_comparison(repro_df, stability_col='Sp', repro_col='split_half_cosine', magnitude_col='Mp', n_bins=4)

Magnitude-matched comparison of high-stability vs low-stability groups.

Bins perturbations by magnitude, then within each bin splits at the stability median to compare reproducibility between the high- and low-stability halves. This controls for the confound that larger-effect perturbations may appear more reproducible simply because of their higher signal-to-noise ratio (SNR).

Parameters:

repro_df (pd.DataFrame) – DataFrame containing at least the columns specified by stability_col, repro_col, and magnitude_col.
stability_col (str, default="Sp") – Column with stability scores.
repro_col (str, default="split_half_cosine") – Column with reproducibility scores (e.g. split-half cosine).
magnitude_col (str, default="Mp") – Column with magnitude/effect-size scores for binning.
n_bins (int, default=4) – Number of magnitude bins (quartiles by default).

Returns:

One row per magnitude bin with columns: mag_bin, n, mag_min, mag_max, high_stability_mean, low_stability_mean, difference, within_bin_rho, within_bin_pvalue.

Return type:

pd.DataFrame

Examples

>>> from shesha.bio import magnitude_matched_comparison
>>> bins = magnitude_matched_comparison(
...     repro_df,
...     stability_col="Sp",
...     repro_col="split_half_cosine",
...     magnitude_col="Mp",
...     n_bins=4,
... )
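A minimal sketch of the binning procedure described above, using pandas and SciPy. Several details are assumptions not stated in the docstring: bins are magnitude quantiles from `pd.qcut`, cells at or above the within-bin stability median form the high-stability half, `difference` is high minus low, and `within_bin_rho` is the Spearman correlation between stability and reproducibility inside each bin. `magnitude_matched_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def magnitude_matched_sketch(df, stability_col="Sp", repro_col="split_half_cosine",
                             magnitude_col="Mp", n_bins=4):
    """Hypothetical re-implementation of the documented per-bin comparison."""
    bins = pd.qcut(df[magnitude_col], q=n_bins, labels=False, duplicates="drop")
    rows = []
    for b, sub in df.groupby(bins):
        # Assumed split rule: at or above the within-bin median -> "high".
        high = sub[stability_col] >= sub[stability_col].median()
        high_mean = sub.loc[high, repro_col].mean()
        low_mean = sub.loc[~high, repro_col].mean()
        rho, pvalue = spearmanr(sub[stability_col], sub[repro_col])
        rows.append({
            "mag_bin": int(b),
            "n": len(sub),
            "mag_min": sub[magnitude_col].min(),
            "mag_max": sub[magnitude_col].max(),
            "high_stability_mean": high_mean,
            "low_stability_mean": low_mean,
            "difference": high_mean - low_mean,
            "within_bin_rho": float(rho),
            "within_bin_pvalue": float(pvalue),
        })
    return pd.DataFrame(rows)


rng = np.random.default_rng(0)
Sp = rng.uniform(size=400)
Mp = rng.uniform(size=400)
repro_df = pd.DataFrame({
    "Sp": Sp,
    "Mp": Mp,
    # Synthetic reproducibility that depends on both stability and magnitude.
    "split_half_cosine": Sp + 0.5 * Mp + 0.05 * rng.normal(size=400),
})
result = magnitude_matched_sketch(repro_df)
```

Because the synthetic reproducibility depends on stability even after fixing magnitude, each bin shows a positive high-minus-low difference.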

shesha.bio.perturbation_effect_size(X_control, X_perturbed, metric='euclidean', n_bootstrap_ci=None, ci=0.95, seed=None)

Compute the magnitude of the perturbation effect.

Parameters:

X_control (np.ndarray) – Control population embeddings.
X_perturbed (np.ndarray) – Perturbed population embeddings.
metric (str, default="euclidean") –
- ‘euclidean’: Raw L2 distance between centroids (Magnitude).
  Use this for geometric plots (Stability vs Magnitude).
- 'cohen': Standardized effect size (Magnitude / Pooled SD).
  Use this for statistical power analysis.
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling control and perturbed populations this many times.
ci (float, default=0.95) – Confidence level for the interval.
seed (int, optional) – Random seed for bootstrap reproducibility.

Returns:

If n_bootstrap_ci is None: the calculated magnitude/effect size. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.

Return type:

float or dict
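A minimal sketch of the two documented metrics and of a bootstrap interval, under stated assumptions: 'euclidean' is the L2 distance between the control and perturbed centroids, as the parameter text says; for 'cohen', the pooled SD is assumed to be the square root of the feature-averaged pooled variance, which this page does not specify; and the interval is assumed to be a percentile bootstrap over resampled cells. `effect_size_sketch` and `bootstrap_ci_sketch` are hypothetical names; only the returned dict keys mirror the documentation.

```python
import numpy as np


def effect_size_sketch(X_control, X_perturbed, metric="euclidean"):
    """Centroid-distance magnitude; 'cohen' divides by an assumed pooled SD."""
    Xc = np.asarray(X_control, dtype=float)
    Xp = np.asarray(X_perturbed, dtype=float)
    magnitude = float(np.linalg.norm(Xp.mean(axis=0) - Xc.mean(axis=0)))
    if metric == "euclidean":
        return magnitude
    if metric == "cohen":
        n_c, n_p = len(Xc), len(Xp)
        # Assumption: pool per-feature variances, then average over features.
        pooled_var = ((n_c - 1) * Xc.var(axis=0, ddof=1)
                      + (n_p - 1) * Xp.var(axis=0, ddof=1)) / (n_c + n_p - 2)
        return magnitude / float(np.sqrt(pooled_var.mean()))
    raise ValueError(f"unknown metric: {metric!r}")


def bootstrap_ci_sketch(stat_fn, X_control, X_perturbed, n_bootstrap=200, ci=0.95, seed=None):
    """Percentile bootstrap: resample both populations with replacement."""
    rng = np.random.default_rng(seed)
    Xc = np.asarray(X_control, dtype=float)
    Xp = np.asarray(X_perturbed, dtype=float)
    stats = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        ic = rng.integers(0, len(Xc), size=len(Xc))
        ip = rng.integers(0, len(Xp), size=len(Xp))
        stats[b] = stat_fn(Xc[ic], Xp[ip])
    alpha = (1.0 - ci) / 2.0
    return {
        "mean": float(stats.mean()),
        "ci_low": float(np.quantile(stats, alpha)),
        "ci_high": float(np.quantile(stats, 1.0 - alpha)),
        "std": float(stats.std(ddof=1)),
        "n_bootstraps": n_bootstrap,
        "ci_level": ci,
    }


rng = np.random.default_rng(0)
X_ctrl = rng.normal(size=(200, 10))
X_pert = rng.normal(size=(200, 10)) + 2.0  # true centroid shift has norm 2 * sqrt(10)
magnitude = effect_size_sketch(X_ctrl, X_pert)
cohen = effect_size_sketch(X_ctrl, X_pert, metric="cohen")
res = bootstrap_ci_sketch(effect_size_sketch, X_ctrl, X_pert, seed=1)
```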

shesha.bio.perturbation_stability(X_control, X_perturbed, method='standard', metric='cosine', k=50, regularization=1e-06, seed=None, max_samples=1000, n_bootstrap_ci=None, ci=0.95)

Perturbation stability: consistency of perturbation effects across samples.

Measures whether individual perturbed samples shift in a consistent direction relative to the control population. High values indicate that the perturbation has a coherent, reproducible effect; low values suggest heterogeneous or noisy responses.

Parameters:

X_control (np.ndarray) – Control population embeddings, shape (n_control, n_features).
X_perturbed (np.ndarray) – Perturbed population embeddings, shape (n_perturbed, n_features).
method ({'standard', 'whitened', 'knn'}, default='standard') – Method for computing stability:
- 'standard': global control centroid (default)
- 'whitened': Mahalanobis-scaled using control covariance
- 'knn': local k-NN matched control centroids
metric ({'cosine', 'euclidean'}, default='cosine') – How to measure directional consistency (used for the 'standard' and 'knn' methods).
k (int, default=50) – Number of nearest neighbors (only used when method='knn').
regularization (float, default=1e-6) – Regularization for covariance (only used when method='whitened').
seed (int, optional) – Random seed for subsampling reproducibility.
max_samples (int, optional, default=1000) – Maximum number of perturbed samples to use; larger perturbed populations are randomly subsampled to this size.
n_bootstrap_ci (int, optional) – If provided, compute bootstrap confidence interval by resampling control and perturbed populations this many times.
ci (float, default=0.95) – Confidence level for the interval.

Returns:

If n_bootstrap_ci is None: stability score in [-1, 1]. Higher = more consistent perturbation effect. If n_bootstrap_ci is set: dict with keys ‘mean’, ‘ci_low’, ‘ci_high’, ‘std’, ‘n_bootstraps’, ‘ci_level’.

Return type:

float or dict

Examples

>>> import numpy as np
>>> from shesha.bio import perturbation_stability
>>> # Control and perturbed cell populations
>>> X_ctrl = np.random.randn(500, 50)  # 500 control cells, 50 genes
>>> shift = np.random.randn(50)  # consistent direction
>>> X_pert = X_ctrl + shift + np.random.randn(500, 50) * 0.1
>>>
>>> # Standard stability
>>> stability = perturbation_stability(X_ctrl, X_pert, method='standard')
>>>
>>> # With bootstrap CI
>>> result = perturbation_stability(X_ctrl, X_pert, n_bootstrap_ci=1000)
>>> print(f"{result['mean']:.3f} [{result['ci_low']:.3f}, {result['ci_high']:.3f}]")

Notes

Method selection:
- 'standard': best for homogeneous controls; computationally fastest
- 'whitened': better when features have different scales or are correlated
- 'knn': best for heterogeneous controls with multiple cell types/states

The control reference is computed differently for each method:
- Standard: global centroid of all control cells
- Whitened: Mahalanobis-scaled space accounting for control covariance
- k-NN: local centroid of the k nearest control cells for each perturbed cell
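The docstring does not give the exact stability formula. A minimal sketch, assuming that 'standard' stability with the cosine metric is the mean cosine similarity between each perturbed cell's shift from the global control centroid and the population's mean shift (which keeps the score in [-1, 1]). `stability_standard_sketch` is a hypothetical name, and the 'euclidean' metric option is not covered.

```python
import numpy as np


def stability_standard_sketch(X_control, X_perturbed, max_samples=1000, seed=None):
    """Assumed form of 'standard' stability with the cosine metric.

    Each perturbed cell's shift is measured from the global control centroid;
    the score is the mean cosine similarity between those shifts and their mean.
    """
    Xc = np.asarray(X_control, dtype=float)
    Xp = np.asarray(X_perturbed, dtype=float)
    rng = np.random.default_rng(seed)
    if max_samples is not None and len(Xp) > max_samples:
        # Subsample the perturbed population when it exceeds max_samples.
        Xp = Xp[rng.choice(len(Xp), size=max_samples, replace=False)]
    shifts = Xp - Xc.mean(axis=0)  # per-cell shift from the global control centroid
    mean_shift = shifts.mean(axis=0)
    norms = np.linalg.norm(shifts, axis=1) * np.linalg.norm(mean_shift)
    cosines = shifts @ mean_shift / (norms + 1e-12)
    return float(cosines.mean())


rng = np.random.default_rng(0)
X_ctrl = rng.normal(size=(500, 20))
X_coherent = 0.2 * rng.normal(size=(500, 20)) + 3.0  # every cell shifts the same way
X_noisy = rng.normal(size=(500, 20))                  # no consistent shift
coherent = stability_standard_sketch(X_ctrl, X_coherent, seed=0)
noisy = stability_standard_sketch(X_ctrl, X_noisy, seed=0)
```

Because the mean shift is estimated from the same cells, a population with no consistent shift still scores slightly above zero in this sketch.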

shesha.bio.perturbation_stability_knn(X_control, X_perturbed, k=50, metric='euclidean', seed=None, max_samples=1000)

k-NN matched control perturbation stability.

Convenience wrapper for perturbation_stability(..., method='knn'). Consider using the unified interface instead.

Parameters:

X_control (np.ndarray) – Control population embeddings, shape (n_control, n_features).
X_perturbed (np.ndarray) – Perturbed population embeddings, shape (n_perturbed, n_features).
k (int, default=50) – Number of nearest control neighbors to use for local centroid.
metric (str, default="euclidean") – Distance metric for k-NN matching: ‘cosine’ or ‘euclidean’.
seed (int, optional) – Random seed for subsampling reproducibility.
max_samples (int, optional, default=1000) – Maximum number of perturbed samples to use; larger perturbed populations are randomly subsampled to this size.

Returns:

k-NN matched stability score in [-1, 1].

Return type:

float

See also

perturbation_stability: Unified interface with method='knn'
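A brute-force sketch of the k-NN matched reference described in the perturbation_stability notes: each perturbed cell is compared with the centroid of its k nearest control cells instead of the global control centroid. The stability aggregation (mean cosine to the mean shift) is an assumption carried over from the standard case, and `knn_local_centroids` and `stability_knn_sketch` are hypothetical names. Setting k to the number of control cells makes every local centroid equal to the global centroid, which gives a convenient global baseline.

```python
import numpy as np


def knn_local_centroids(X_control, X_perturbed, k=50, metric="euclidean"):
    """Brute-force k-NN: mean of the k nearest control cells for each perturbed cell."""
    Xc = np.asarray(X_control, dtype=float)
    Xp = np.asarray(X_perturbed, dtype=float)
    k = min(k, len(Xc))
    if metric == "euclidean":
        dist = np.linalg.norm(Xp[:, None, :] - Xc[None, :, :], axis=2)
    elif metric == "cosine":
        unit_p = Xp / np.linalg.norm(Xp, axis=1, keepdims=True)
        unit_c = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)
        dist = 1.0 - unit_p @ unit_c.T
    else:
        raise ValueError(f"unknown metric: {metric!r}")
    nn_idx = np.argsort(dist, axis=1)[:, :k]  # (n_perturbed, k) control indices
    return Xc[nn_idx].mean(axis=1)            # (n_perturbed, n_features)


def stability_knn_sketch(X_control, X_perturbed, k=50, metric="euclidean"):
    """Assumed aggregation: mean cosine between local shifts and their mean."""
    local = knn_local_centroids(X_control, X_perturbed, k=k, metric=metric)
    shifts = np.asarray(X_perturbed, dtype=float) - local
    mean_shift = shifts.mean(axis=0)
    norms = np.linalg.norm(shifts, axis=1) * np.linalg.norm(mean_shift)
    return float(np.mean(shifts @ mean_shift / (norms + 1e-12)))


# Heterogeneous controls: two cell states far apart along feature 0.
rng = np.random.default_rng(0)
states = np.zeros((2, 10))
states[0, 0], states[1, 0] = 5.0, -5.0
X_ctrl = np.vstack([s + 0.5 * rng.normal(size=(100, 10)) for s in states])
shift = np.zeros(10)
shift[1] = 2.0  # the perturbation moves both states the same way
X_pert = np.vstack([s + shift + 0.5 * rng.normal(size=(100, 10)) for s in states])
local = stability_knn_sketch(X_ctrl, X_pert, k=20)
global_ = stability_knn_sketch(X_ctrl, X_pert, k=len(X_ctrl))  # all controls -> global centroid
```

With the global centroid, each cell's shift is dominated by its cell state along feature 0, so the score drops; local matching removes that component and exposes the shared shift.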

shesha.bio.perturbation_stability_whitened(X_control, X_perturbed, regularization=1e-06, seed=None, max_samples=1000)

Whitened (Mahalanobis) perturbation stability.

Convenience wrapper for perturbation_stability(..., method='whitened'). Consider using the unified interface instead.

Parameters:

X_control (np.ndarray) – Control population embeddings, shape (n_control, n_features).
X_perturbed (np.ndarray) – Perturbed population embeddings, shape (n_perturbed, n_features).
regularization (float, default=1e-6) – Regularization added to covariance diagonal for numerical stability.
seed (int, optional) – Random seed for subsampling reproducibility.
max_samples (int, optional, default=1000) – Maximum number of perturbed samples to use; larger perturbed populations are randomly subsampled to this size.

Returns:

Whitened stability score in [-1, 1].

Return type:

float

See also

perturbation_stability: Unified interface with method='whitened'
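A minimal sketch of the whitened variant, assuming it transforms both populations by the inverse square root of the regularized control covariance (so the control cells end up with identity covariance) and then applies the same mean-cosine stability as the standard method. The docstring only says "Mahalanobis-scaled using control covariance", so the aggregation step and the eigendecomposition route are assumptions; `whitening_matrix`, `mean_cosine_to_mean_shift`, and `stability_whitened_sketch` are hypothetical names.

```python
import numpy as np


def whitening_matrix(X_control, regularization=1e-6):
    """Symmetric inverse square root of the regularized control covariance."""
    Xc = np.asarray(X_control, dtype=float)
    cov = np.cov(Xc, rowvar=False) + regularization * np.eye(Xc.shape[1])
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric positive definite
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T


def mean_cosine_to_mean_shift(shifts):
    """Assumed aggregation: mean cosine between each shift and the mean shift."""
    mean_shift = shifts.mean(axis=0)
    norms = np.linalg.norm(shifts, axis=1) * np.linalg.norm(mean_shift)
    return float(np.mean(shifts @ mean_shift / (norms + 1e-12)))


def stability_whitened_sketch(X_control, X_perturbed, regularization=1e-6):
    Xc = np.asarray(X_control, dtype=float)
    Xp = np.asarray(X_perturbed, dtype=float)
    W = whitening_matrix(Xc, regularization)
    Zc, Zp = Xc @ W, Xp @ W  # W is symmetric, so right-multiplying whitens each row
    return mean_cosine_to_mean_shift(Zp - Zc.mean(axis=0))


rng = np.random.default_rng(0)
scales = np.array([100.0, 1.0, 1.0, 1.0, 1.0])  # feature 0 is noisy and large-scale
X_ctrl = rng.normal(size=(500, 5)) * scales
X_pert = rng.normal(size=(500, 5)) * scales + np.array([0.0, 3.0, 0.0, 0.0, 0.0])
raw = mean_cosine_to_mean_shift(X_pert - X_ctrl.mean(axis=0))
whitened = stability_whitened_sketch(X_ctrl, X_pert)
```

In raw coordinates the high-variance feature 0 swamps the consistent shift in feature 1; after whitening, every feature has unit control variance and the shift dominates.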

shesha.bio.split_half_reproducibility(adata, perturbation_key='perturbation', control_label='control', n_splits=50, random_state=320, min_cells=30, layer=None)

Split-half reproducibility for each perturbation in an AnnData object.

For each perturbation with enough cells, randomly splits cells 50/50, computes independent shift vectors relative to the control centroid, and measures cosine similarity between the halves. This is a direct measure of effect-direction reproducibility: perturbations whose individual cells shift coherently will have high split-half cosine.

Parameters:

adata (AnnData) – Annotated data matrix (cells x features). Operates on adata.X or the specified layer.
perturbation_key (str, default="perturbation") – Column in adata.obs containing perturbation labels.
control_label (str, default="control") – Label identifying control/unperturbed cells.
n_splits (int, default=50) – Number of random 50/50 splits per perturbation.
random_state (int, default=320) – Base random seed. Each perturbation gets a unique derived seed.
min_cells (int, default=30) – Minimum cells required for a perturbation to be included.
layer (str, optional) – Layer in adata.layers to use. If None, uses adata.X.

Returns:

Columns: perturbation, split_half_cosine, n_cells. Indexed by perturbation name.

Return type:

pd.DataFrame

Examples

>>> from shesha.bio import split_half_reproducibility
>>> repro = split_half_reproducibility(
...     adata,
...     perturbation_key="perturbation",
...     control_label="control",
...     n_splits=50,
...     random_state=320,
... )
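A minimal NumPy sketch of the per-perturbation score as described: shuffle the perturbed cells, split them into two halves, compute each half's mean shift from the control centroid, and take the cosine between the two shifts. Averaging over n_splits is an assumption, as is the single fixed seed (the docstring says each perturbation gets a derived seed without saying how). `split_half_cosine_sketch` is a hypothetical name, and the min_cells filter is left to the caller.

```python
import numpy as np


def split_half_cosine_sketch(X_control, X_perturbed, n_splits=50, seed=320):
    """Mean cosine between the shift vectors of two random halves (assumed aggregation)."""
    Xc = np.asarray(X_control, dtype=float)
    Xp = np.asarray(X_perturbed, dtype=float)
    ctrl_centroid = Xc.mean(axis=0)
    rng = np.random.default_rng(seed)
    half = len(Xp) // 2
    cosines = []
    for _ in range(n_splits):
        perm = rng.permutation(len(Xp))  # random 50/50 split of the perturbed cells
        a = Xp[perm[:half]].mean(axis=0) - ctrl_centroid
        b = Xp[perm[half:]].mean(axis=0) - ctrl_centroid
        cosines.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(cosines))


rng = np.random.default_rng(1)
X_ctrl = rng.normal(size=(200, 20))
X_coherent = rng.normal(size=(100, 20)) + 1.0  # shared shift in every feature
X_noisy = rng.normal(size=(100, 20))            # no shared shift
coherent = split_half_cosine_sketch(X_ctrl, X_coherent)
noisy = split_half_cosine_sketch(X_ctrl, X_noisy)
```

Both halves share the same control centroid, so even the no-shift population gives a modestly positive cosine here; a coherent shift pushes the score close to 1.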