Bootstrap Confidence Intervals
==============================

Every public metric in Shesha supports an optional **outer bootstrap** that
puts a confidence interval around the point estimate. When it is enabled, the
function returns a dictionary with the bootstrap mean, lower and upper CI
bounds, standard deviation, and metadata instead of a single float.

How it works
------------

1. The rows (samples) of the input data are resampled **with replacement**.
2. The metric is recomputed on each resampled dataset.
3. Percentile-based confidence intervals are computed from the distribution of
   bootstrap estimates.

This "outer bootstrap" is independent of any internal iterations the metric
already performs (e.g. ``n_splits`` in ``feature_split``). It quantifies
uncertainty due to the *finite sample* of observations.
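
The three steps above can be sketched with NumPy as follows. This is a minimal
illustration, not Shesha's actual implementation: the helper name
``outer_bootstrap_ci``, the toy metric, the choice to drop non-finite
estimates, and the use of ``ddof=1`` for the standard deviation are all
assumptions made for the example.

.. code-block:: python

   import numpy as np

   def outer_bootstrap_ci(metric, X, n_bootstrap_ci=1000, ci=0.95, seed=None):
       """Percentile bootstrap CI for ``metric(X)`` (hypothetical helper)."""
       rng = np.random.default_rng(seed)
       n = X.shape[0]
       estimates = []
       for _ in range(n_bootstrap_ci):
           idx = rng.integers(0, n, size=n)  # step 1: resample rows with replacement
           value = metric(X[idx])            # step 2: recompute the metric
           if np.isfinite(value):            # assumption: failed resamples are dropped
               estimates.append(value)
       estimates = np.asarray(estimates, dtype=float)
       # Step 3: percentile interval from the bootstrap distribution
       alpha = 1.0 - ci
       ci_low, ci_high = np.percentile(
           estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)]
       )
       return {
           "mean": float(estimates.mean()),
           "ci_low": float(ci_low),
           "ci_high": float(ci_high),
           "std": float(estimates.std(ddof=1)),  # assumption: sample std
           "n_bootstraps": int(estimates.size),
           "ci_level": ci,
       }

   # Toy metric (assumption): mean of the first column
   X = np.random.default_rng(0).normal(size=(200, 5))
   result = outer_bootstrap_ci(lambda A: A[:, 0].mean(), X,
                               n_bootstrap_ci=500, seed=320)

The returned dictionary mirrors the keys documented under "Return format";
whether the library computes each field exactly this way is not confirmed here.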

Usage
-----

Pass ``n_bootstrap_ci`` (number of resamples) and optionally ``ci`` (confidence
level, default 0.95) to any metric function:

.. code-block:: python

   import numpy as np
   import shesha

   X = np.random.randn(500, 768)

   # Point estimate (default behaviour, returns float)
   stability = shesha.feature_split(X, n_splits=30, seed=320)

   # With 95% bootstrap CI (returns dict)
   result = shesha.feature_split(X, n_splits=30, seed=320, n_bootstrap_ci=1000)
   print(result)
   # {'mean': 0.42, 'ci_low': 0.38, 'ci_high': 0.46,
   #  'std': 0.021, 'n_bootstraps': 1000, 'ci_level': 0.95}

   # 99% CI
   result_99 = shesha.feature_split(X, n_splits=30, seed=320,
                                     n_bootstrap_ci=1000, ci=0.99)

Return format
-------------

When ``n_bootstrap_ci`` is set, the function returns a dictionary:

.. list-table::
   :widths: 20 80
   :header-rows: 1

   * - Key
     - Description
   * - ``mean``
     - Mean of the bootstrap distribution
   * - ``ci_low``
     - Lower bound of the confidence interval
   * - ``ci_high``
     - Upper bound of the confidence interval
   * - ``std``
     - Standard deviation of bootstrap estimates
   * - ``n_bootstraps``
     - Number of successful resamples (may be < ``n_bootstrap_ci`` if some
       resamples yield NaN)
   * - ``ci_level``
     - The confidence level used (e.g. 0.95)
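
Because the return type depends on whether ``n_bootstrap_ci`` is set,
downstream code may need to handle both cases. The sketch below shows one way
to do that; ``format_estimate`` is a hypothetical helper, not part of Shesha,
and the example dictionary simply reuses the illustrative values from the
Usage section.

.. code-block:: python

   def format_estimate(result, digits=3):
       """Format a metric result that is either a float or a bootstrap dict."""
       if isinstance(result, dict):
           pct = 100 * result["ci_level"]
           return (
               f"{result['mean']:.{digits}f} "
               f"[{pct:g}% CI {result['ci_low']:.{digits}f}, "
               f"{result['ci_high']:.{digits}f}; n={result['n_bootstraps']}]"
           )
       # Plain point estimate (no bootstrap requested)
       return f"{float(result):.{digits}f}"

   example = {"mean": 0.42, "ci_low": 0.38, "ci_high": 0.46,
              "std": 0.021, "n_bootstraps": 1000, "ci_level": 0.95}
   print(format_estimate(example))  # 0.420 [95% CI 0.380, 0.460; n=1000]
   print(format_estimate(0.42))     # 0.420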

Examples across modules
-----------------------

**Core (unsupervised)**

.. code-block:: python

   result = shesha.feature_split(X, n_splits=30, n_bootstrap_ci=1000, seed=320)
   result = shesha.sample_split(X, n_splits=30, n_bootstrap_ci=1000, seed=320)
   result = shesha.anchor_stability(X, n_bootstrap_ci=1000, seed=320)

**Core (supervised)**

.. code-block:: python

   # y: labels aligned with the rows of X
   result = shesha.variance_ratio(X, y, n_bootstrap_ci=1000, seed=320)
   result = shesha.supervised_alignment(X, y, n_bootstrap_ci=1000, seed=320)
   result = shesha.class_separation_ratio(X, y, n_bootstrap_ci=1000, seed=320)
   result = shesha.lda_stability(X, y, n_bootstrap_ci=1000, seed=320)

**Core (drift)**

.. code-block:: python

   # X, Y: two representations of the same samples (rows are paired)
   result = shesha.rdm_similarity(X, Y, n_bootstrap_ci=1000, seed=320)
   result = shesha.rdm_drift(X, Y, n_bootstrap_ci=1000, seed=320)

**Bio (perturbation analysis)**

.. code-block:: python

   from shesha.bio import perturbation_stability, perturbation_effect_size

   # X_ctrl, X_pert: control and perturbed populations (rows are samples)
   result = perturbation_stability(X_ctrl, X_pert, n_bootstrap_ci=1000, seed=320)
   result = perturbation_effect_size(X_ctrl, X_pert, n_bootstrap_ci=1000, seed=320)

**Sim (similarity metrics)**

.. code-block:: python

   from shesha.sim import cka, cka_linear, cka_debiased
   from shesha.sim import procrustes_similarity, rdm_similarity

   result = cka(X, Y, n_bootstrap_ci=1000, seed=320)
   result = cka_linear(X, Y, n_bootstrap_ci=1000, seed=320)
   result = cka_debiased(X, Y, n_bootstrap_ci=1000, seed=320)
   result = procrustes_similarity(X, Y, n_bootstrap_ci=1000, seed=320)
   result = rdm_similarity(X, Y, n_bootstrap_ci=1000, seed=320)

Choosing ``n_bootstrap_ci``
---------------------------

- **Quick exploration**: 200–500 resamples
- **Publication-quality**: 1000–10000 resamples
- **Computational cost**: scales linearly with ``n_bootstrap_ci``. Each resample
  runs the full metric computation, so expensive metrics (e.g.
  ``anchor_stability`` on large data) will take proportionally longer.
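
Since cost scales linearly with ``n_bootstrap_ci``, a rough runtime budget can
be obtained by timing a single metric evaluation and multiplying. The sketch
below illustrates this with a stand-in metric; ``seconds_per_call`` and
``toy_metric`` are hypothetical names, and the projection ignores resampling
overhead and any parallelism.

.. code-block:: python

   import time
   import numpy as np

   def seconds_per_call(metric, X, n_trials=5):
       """Best-of-n wall-clock time for one metric evaluation."""
       timings = []
       for _ in range(n_trials):
           start = time.perf_counter()
           metric(X)
           timings.append(time.perf_counter() - start)
       return min(timings)

   def toy_metric(A):
       # Stand-in for a real metric: mean absolute off-diagonal correlation
       C = np.corrcoef(A, rowvar=False)
       off_diagonal = ~np.eye(C.shape[0], dtype=bool)
       return float(np.abs(C[off_diagonal]).mean())

   X = np.random.default_rng(0).normal(size=(500, 64))
   per_call = seconds_per_call(toy_metric, X)

   # Linear projection: one full metric computation per resample
   projected = {n: per_call * n for n in (200, 1000, 10000)}
   for n, seconds in projected.items():
       print(f"n_bootstrap_ci={n:>6}: roughly {seconds:.2f} s")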

Resampling strategy
-------------------

- **Single-matrix metrics** (``feature_split``, ``sample_split``, etc.): rows of
  ``X`` (and ``y`` if supervised) are resampled together with the same indices.
- **Two-matrix metrics** (``rdm_similarity``, ``cka``, etc.): both ``X`` and
  ``Y`` are resampled with the **same** indices (paired bootstrap).
- **Bio metrics** (``perturbation_stability``, ``perturbation_effect_size``):
  control and perturbed populations are resampled **independently**.
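
The three strategies differ only in how the row indices are drawn. The
following NumPy sketch illustrates the index handling under the assumption that
rows are samples; the array shapes and variable names are made up for the
example and do not reflect Shesha's internals.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(320)

   X = rng.normal(size=(100, 8))
   y = rng.integers(0, 3, size=100)
   Y = rng.normal(size=(100, 16))
   X_ctrl = rng.normal(size=(80, 8))
   X_pert = rng.normal(size=(120, 8))

   # Single-matrix (supervised): X and y share one index vector
   idx = rng.integers(0, len(X), size=len(X))
   X_boot, y_boot = X[idx], y[idx]

   # Two-matrix: the same indices for X and Y keep rows paired
   idx_pair = rng.integers(0, len(X), size=len(X))
   X_pair, Y_pair = X[idx_pair], Y[idx_pair]

   # Bio: each population gets its own, independently drawn indices
   idx_ctrl = rng.integers(0, len(X_ctrl), size=len(X_ctrl))
   idx_pert = rng.integers(0, len(X_pert), size=len(X_pert))
   X_ctrl_boot, X_pert_boot = X_ctrl[idx_ctrl], X_pert[idx_pert]

Sharing indices preserves the correspondence between a sample and its label or
its row in the second matrix, whereas independent draws treat the two
populations as unpaired.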

Reproducibility
---------------

Pass ``seed`` for deterministic results:

.. code-block:: python

   result1 = shesha.feature_split(X, n_bootstrap_ci=1000, seed=320)
   result2 = shesha.feature_split(X, n_bootstrap_ci=1000, seed=320)
   assert result1 == result2  # identical