Biological Perturbation Analysis

shesha.bio provides metrics for single-cell and CRISPR perturbation experiments. It works natively with AnnData objects.

Compute stability

shesha.bio.compute_stability() measures per-perturbation geometric consistency relative to a control population.

from shesha.bio import compute_stability

stability = compute_stability(
    adata_pca,
    perturbation_key='guide_id',
    control_label='NT',
    metric='cosine',
)
print(stability['KLF1'])   # e.g. 0.85

Compute magnitude

shesha.bio.compute_magnitude() measures the average distance of perturbed cells from the centroid of the control population.

from shesha.bio import compute_magnitude

magnitude = compute_magnitude(
    adata_pca,
    perturbation_key='guide_id',
    control_label='NT',
    metric='euclidean',
)
print(magnitude['KLF1'])   # e.g. 2.40

Bootstrap confidence intervals

The low-level functions perturbation_stability() and perturbation_effect_size() support bootstrap CIs via n_bootstrap_ci. Control and perturbed populations are resampled independently. See Bootstrap Confidence Intervals for full details.

from shesha.bio import perturbation_stability

result = perturbation_stability(X_ctrl, X_pert, n_bootstrap_ci=1000, seed=320)
print(f"{result['mean']:.3f} [{result['ci_low']:.3f}, {result['ci_high']:.3f}]")

Split-half reproducibility

shesha.bio.split_half_reproducibility() measures effect-direction reproducibility for each perturbation by repeated 50/50 random cell splits. High values indicate that the perturbation’s directional shift is consistent across independent subsets of cells — a direct assay of biological reproducibility that is distinct from effect magnitude.

from shesha.bio import split_half_reproducibility

repro = split_half_reproducibility(
    adata,
    perturbation_key="perturbation",
    control_label="control",
    n_splits=50,
    random_state=320,
)
# Returns DataFrame: index=perturbation, columns=[split_half_cosine, n_cells]
print(repro.sort_values("split_half_cosine", ascending=False).head())

Magnitude-matched comparison

shesha.bio.magnitude_matched_comparison() tests whether stability predicts reproducibility within magnitude bins, controlling for the SNR confound. Perturbations are binned by effect size and, within each bin, the mean split-half cosine is compared between the high-stability and low-stability halves.

from shesha.bio import compute_stability, compute_magnitude, magnitude_matched_comparison
import pandas as pd

sp = compute_stability(adata, perturbation_key="perturbation", control_label="control")
mp = compute_magnitude(adata, perturbation_key="perturbation", control_label="control")

df = repro.copy()
df["Sp"] = pd.Series(sp)
df["Mp"] = pd.Series(mp)

bins = magnitude_matched_comparison(
    df,
    stability_col="Sp",
    repro_col="split_half_cosine",
    magnitude_col="Mp",
    n_bins=4,
)
print(bins[["mag_bin", "n", "high_stability_mean", "low_stability_mean", "difference"]])

Discordance

shesha.bio.discordance() identifies perturbations that deviate from the expected stability-magnitude relationship. High discordance scores flag perturbations that are less stable than expected given their effect size — candidates for pleiotropic or heterogeneous effects.

Three methods are available:

  • linear (default): OLS residual, sign-flipped and z-scored. Fast and interpretable.

  • rank: rank(Mp) - rank(Sp), z-scored. Non-parametric; robust to outliers.

  • loess: Local regression (LOWESS) residual, sign-flipped and z-scored. Captures nonlinear magnitude-stability trends where the relationship curves at low magnitudes. Requires statsmodels.

from shesha.bio import discordance

# Linear (default)
df["disc_linear"] = discordance(df, stability_col="Sp", magnitude_col="Mp")

# LOESS — captures nonlinear curvature
df["disc_loess"] = discordance(
    df,
    stability_col="Sp",
    magnitude_col="Mp",
    method="loess",
    loess_frac=0.3,
)

# Top 10 most discordant perturbations
print(df.nlargest(10, "disc_loess")[["Sp", "Mp", "disc_loess"]])

The loess_frac parameter controls smoothness: smaller values (e.g. 0.2) follow the data more closely, while larger values (e.g. 0.5) produce smoother expected curves. The default of 0.3 balances sensitivity and stability for typical CRISPR screen sizes.

Note

method='loess' requires statsmodels. Install with pip install statsmodels.