
Two-Sample Testing

Permutation-based hypothesis testing for the null hypothesis H₀: P = Q.

two_sample_test(samples_p, samples_q, *, method='mmd', n_permutations=1000, seed=None, low_memory=None, **kwargs)

Two-sample hypothesis test via permutation.

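Tests H₀: P = Q against H₁: P ≠ Q. A test statistic is computed on the observed samples and compared with a null distribution obtained by repeatedly shuffling the pooled samples, splitting them back into groups of the original sizes, and recomputing the statistic.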
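The permutation procedure above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the divergence implementation. It uses the V-statistic (biased) form of the energy distance with Euclidean distances and treats a 1-D array as n samples of a scalar variable. The helper names energy_distance and permutation_null are hypothetical.

```python
import numpy as np


def _as_2d(a):
    """Treat a 1-D array as n samples of a scalar variable, shape (n, 1)."""
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a


def _mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (a_i, b_j)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean()


def energy_distance(x, y):
    """V-statistic estimate of 2 E|X - Y| - E|X - X'| - E|Y - Y'|."""
    x, y = _as_2d(x), _as_2d(y)
    return (2 * _mean_pairwise_distance(x, y)
            - _mean_pairwise_distance(x, x)
            - _mean_pairwise_distance(y, y))


def permutation_null(x, y, statistic, n_permutations=1000, seed=None):
    """Return the observed statistic and its permutation null distribution.

    Hypothetical helper: pools both samples, shuffles the pool, splits it
    back into groups of the original sizes, and recomputes the statistic.
    """
    x, y = _as_2d(x), _as_2d(y)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    rng = np.random.default_rng(seed)
    t_obs = statistic(x, y)
    null = np.empty(n_permutations)
    for b in range(n_permutations):
        idx = rng.permutation(len(pooled))
        null[b] = statistic(pooled[idx[:n_x]], pooled[idx[n_x:]])
    return t_obs, null


# Two samples whose means differ by one standard deviation.
rng = np.random.default_rng(0)
p = rng.normal(0, 1, 100)
q = rng.normal(1, 1, 100)
t_obs, null = permutation_null(p, q, energy_distance, n_permutations=200, seed=0)
```

This sketch recomputes the full pairwise distance matrices on every permutation, which keeps the logic simple but costs O(N²) memory per call.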

Parameters:

samples_p (ndarray, required): Samples from distribution P.

samples_q (ndarray, required): Samples from distribution Q.

method (str, default 'mmd'): Test statistic to use:

  • "mmd": Maximum Mean Discrepancy [1] (default). A good general-purpose choice with strong theoretical guarantees.
  • "energy": Energy distance [2]. Works well in any dimension and needs no kernel bandwidth selection.
  • "kl_knn": k-nearest-neighbor (kNN) estimator of the Kullback-Leibler divergence. Sensitive to differences in the density ratio between P and Q.

n_permutations (int, default 1000): Number of permutations used to build the null distribution. More permutations give a more precise p-value at the cost of longer run time; the smallest attainable p-value is 1/(1 + n_permutations).

seed (int or None, default None): Random seed for the permutations, for reproducible results.

low_memory (bool or None, default None): Memory strategy for the "energy" and "mmd" methods:

  • None (default): auto-detect. Low-memory mode is used when the N×N distance matrix would exceed about 1 GiB.
  • True: force low-memory mode. Numba JIT-compiled kernels recompute the statistic from scratch for every permutation using O(N) memory, which makes N > 50,000 feasible without exhausting RAM.
  • False: force the precomputed matrix. Faster per permutation, but requires O(N²) memory.

**kwargs (Any, default {}): Additional keyword arguments passed to the test statistic function: kernel and bandwidth for "mmd", k for "kl_knn".
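For the "kl_knn" method, a widely used kNN estimator of KL(P ∥ Q) is the one of Wang, Kulkarni and Verdú (2009). With n samples x_i from P and m samples from Q in d dimensions, it is (d/n) Σ_i log(ν_k(i) / ρ_k(i)) + log(m / (n − 1)), where ρ_k(i) is the distance from x_i to its k-th nearest neighbor among the other P samples and ν_k(i) is the distance to its k-th nearest neighbor among the Q samples. Whether two_sample_test uses exactly this estimator is an assumption. The brute-force sketch below (hypothetical name knn_kl_divergence, O(n·(n + m)) memory, no handling of duplicate points) is for illustration only.

```python
import numpy as np


def _as_2d(a):
    """Treat a 1-D array as n samples of a scalar variable, shape (n, 1)."""
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a


def knn_kl_divergence(x, y, k=1):
    """kNN estimate of KL(P || Q) from x ~ P, shape (n, d), and y ~ Q, shape (m, d).

    Brute-force distances; assumes no duplicate points, since a zero
    distance would make a log term infinite.
    """
    x, y = _as_2d(x), _as_2d(y)
    n, d = x.shape
    m = len(y)
    # Distances from each x_i to the other x_j; exclude the point itself.
    d_xx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d_xx, np.inf)
    # Distances from each x_i to every y_j.
    d_xy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # k-th smallest distance in each row (position k - 1 after a partial sort).
    rho = np.partition(d_xx, k - 1, axis=1)[:, k - 1]
    nu = np.partition(d_xy, k - 1, axis=1)[:, k - 1]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))


rng = np.random.default_rng(0)
p = rng.normal(0, 1, 1000)
q = rng.normal(1, 1, 1000)
# The true value of KL(N(0, 1) || N(1, 1)) is 0.5.
print(knn_kl_divergence(p, q, k=5))
```

Larger k lowers the variance of the estimate at the cost of more bias from smoothing over a wider neighborhood.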

Returns:

TestResult: a named tuple with fields:

  • statistic (float): the observed test statistic.
  • p_value (float): the permutation p-value.
  • null_distribution (np.ndarray): the test statistics computed on the permuted samples.

Notes

The p-value is computed as:

p = (1 + #{b : T_b >= T_obs}) / (1 + B)

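where T_obs is the observed statistic, T_b (b = 1, ..., B) are the null statistics, and B is the number of permutations. The +1 in the numerator and denominator counts the observed arrangement of the data as one of the permutations. As a result the p-value is never exactly 0 (its minimum is 1/(1 + B)), and the test controls the type I error at level α for any number of permutations B.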
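The p-value formula translates directly into code. The sketch below assumes the null statistics are held in a NumPy array such as the returned null_distribution; permutation_p_value is a hypothetical name, not part of the documented API.

```python
import numpy as np


def permutation_p_value(t_obs, null_distribution):
    """p = (1 + #{b : T_b >= T_obs}) / (1 + B), with B = len(null_distribution)."""
    null = np.asarray(null_distribution, dtype=float)
    n_extreme = np.count_nonzero(null >= t_obs)
    return (1 + n_extreme) / (1 + null.size)


# Observed statistic above every null statistic: p is 1 / (1 + B), not 0.
print(permutation_p_value(5.0, np.zeros(999)))  # 0.001
# Observed statistic tied with every null statistic: p is 1.
print(permutation_p_value(0.0, np.zeros(999)))  # 1.0
```

Ties count as "at least as extreme" because of the >= comparison, which keeps the p-value conservative.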

The permutation test is:

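  • Exact under H₀: the type I error rate is controlled at every finite sample size, provided the pooled samples are exchangeable under H₀ (for example, all observations are independent draws).
  • Non-parametric: no parametric form is assumed for P or Q.
  • Consistent against all alternatives P ≠ Q when "mmd" is used with a characteristic kernel, such as the Gaussian kernel.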
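The consistency property relies on the kernel being characteristic; the Gaussian (RBF) kernel k(x, y) = exp(−‖x − y‖² / (2σ²)) is one such kernel on ℝᵈ. Below is a minimal sketch of the unbiased MMD² estimator of Gretton et al. [1] with that kernel. The fixed bandwidth σ = 1 and the function names gaussian_kernel and mmd2_unbiased are assumptions for illustration; they are not necessarily what two_sample_test uses for its kernel and bandwidth arguments.

```python
import numpy as np


def _as_2d(a):
    """Treat a 1-D array as n samples of a scalar variable, shape (n, 1)."""
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a


def gaussian_kernel(a, b, bandwidth=1.0):
    """Kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between samples x ~ P and y ~ Q."""
    x, y = _as_2d(x), _as_2d(y)
    m, n = len(x), len(y)
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    # Drop the i == j terms from the within-sample averages (U-statistic).
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()


rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(0, 1, 300), rng.normal(0, 1, 300))
shifted = mmd2_unbiased(rng.normal(0, 1, 300), rng.normal(1, 1, 300))
```

Under H₀ the unbiased estimate fluctuates around zero and can be slightly negative; under a mean shift it is clearly positive.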

Examples:

>>> import numpy as np
>>> from divergence import two_sample_test
>>> rng = np.random.default_rng(42)
>>> p = rng.normal(0, 1, 200)
>>> q = rng.normal(1, 1, 200)
>>> result = two_sample_test(p, q, method="energy", n_permutations=500, seed=42)
>>> result.p_value < 0.05
True
References

[1] Gretton, A., et al. (2012). "A Kernel Two-Sample Test." JMLR, 13, 723-773.
[2] Székely, G. J., & Rizzo, M. L. (2004). "Testing for Equal Distributions in High Dimension." InterStat, 5.