Biological Perturbation Analysis¶
shesha.bio provides metrics for single-cell and CRISPR perturbation experiments.
It works natively with AnnData objects.
Compute stability¶
shesha.bio.compute_stability() measures per-perturbation geometric consistency
relative to a control population.
from shesha.bio import compute_stability
stability = compute_stability(
adata_pca,
perturbation_key='guide_id',
control_label='NT',
metric='cosine',
)
print(stability['KLF1']) # e.g. 0.85
Compute magnitude¶
shesha.bio.compute_magnitude() measures the average distance of perturbed cells
from the centroid of the control population.
from shesha.bio import compute_magnitude
magnitude = compute_magnitude(
adata_pca,
perturbation_key='guide_id',
control_label='NT',
metric='euclidean',
)
print(magnitude['KLF1']) # e.g. 2.40
Bootstrap confidence intervals¶
The low-level functions perturbation_stability() and
perturbation_effect_size() support bootstrap CIs via
n_bootstrap_ci. Control and perturbed populations are resampled independently.
See Bootstrap Confidence Intervals for full details.
from shesha.bio import perturbation_stability
result = perturbation_stability(X_ctrl, X_pert, n_bootstrap_ci=1000, seed=320)
print(f"{result['mean']:.3f} [{result['ci_low']:.3f}, {result['ci_high']:.3f}]")
Split-half reproducibility¶
shesha.bio.split_half_reproducibility() measures effect-direction reproducibility
for each perturbation by repeated 50/50 random cell splits. High values indicate that the
perturbation’s directional shift is consistent across independent subsets of cells — a
direct assay of biological reproducibility that is distinct from effect magnitude.
from shesha.bio import split_half_reproducibility
repro = split_half_reproducibility(
adata,
perturbation_key="perturbation",
control_label="control",
n_splits=50,
random_state=320,
)
# Returns DataFrame: index=perturbation, columns=[split_half_cosine, n_cells]
print(repro.sort_values("split_half_cosine", ascending=False).head())
Magnitude-matched comparison¶
shesha.bio.magnitude_matched_comparison() tests whether stability predicts
reproducibility within magnitude bins, controlling for the SNR confound. Perturbations
are binned by effect size and, within each bin, the mean split-half cosine is compared
between the high-stability and low-stability halves.
from shesha.bio import compute_stability, compute_magnitude, magnitude_matched_comparison
import pandas as pd
sp = compute_stability(adata, perturbation_key="perturbation", control_label="control")
mp = compute_magnitude(adata, perturbation_key="perturbation", control_label="control")
df = repro.copy()
df["Sp"] = pd.Series(sp)
df["Mp"] = pd.Series(mp)
bins = magnitude_matched_comparison(
df,
stability_col="Sp",
repro_col="split_half_cosine",
magnitude_col="Mp",
n_bins=4,
)
print(bins[["mag_bin", "n", "high_stability_mean", "low_stability_mean", "difference"]])
Discordance¶
shesha.bio.discordance() identifies perturbations that deviate from the expected
stability-magnitude relationship. High discordance scores flag perturbations that are
less stable than expected given their effect size — candidates for pleiotropic or
heterogeneous effects.
Three methods are available:
linear (default): OLS residual, sign-flipped and z-scored. Fast and interpretable.
rank: rank(Mp) - rank(Sp), z-scored. Non-parametric; robust to outliers.
loess: Local regression (LOWESS) residual, sign-flipped and z-scored. Captures nonlinear magnitude-stability trends where the relationship curves at low magnitudes. Requires
statsmodels.
from shesha.bio import discordance
# Linear (default)
df["disc_linear"] = discordance(df, stability_col="Sp", magnitude_col="Mp")
# LOESS — captures nonlinear curvature
df["disc_loess"] = discordance(
df,
stability_col="Sp",
magnitude_col="Mp",
method="loess",
loess_frac=0.3,
)
# Top 10 most discordant perturbations
print(df.nlargest(10, "disc_loess")[["Sp", "Mp", "disc_loess"]])
The loess_frac parameter controls smoothness: smaller values (e.g. 0.2) follow the
data more closely, while larger values (e.g. 0.5) produce smoother expected curves.
The default of 0.3 balances sensitivity and stability for typical CRISPR screen sizes.
Note
method='loess' requires statsmodels.
Install with pip install statsmodels.