Two-Sample Testing

Permutation-based hypothesis testing for the null hypothesis H₀: P = Q.

`two_sample_test(samples_p, samples_q, *, method='mmd', n_permutations=1000, seed=None, low_memory=None, **kwargs)`

Two-sample hypothesis test via permutation.

Tests H0: P = Q against H1: P != Q by computing a test statistic and comparing it to a null distribution obtained by permuting the combined samples.

Parameters:

Name	Type	Description	Default
`samples_p`	`ndarray`	Samples from distribution P.	required
`samples_q`	`ndarray`	Samples from distribution Q.	required
`method`	`str`	Test statistic to use. `"mmd"`: Maximum Mean Discrepancy (default); a good general-purpose choice with strong theoretical properties. `"energy"`: energy distance; works well in arbitrary dimensions without kernel bandwidth selection. `"kl_knn"`: kNN KL divergence estimator; sensitive to density-ratio differences.	`'mmd'`
`n_permutations`	`int`	Number of permutations used to build the null distribution. Higher values give more precise p-values but take longer; the smallest attainable p-value is 1 / (1 + `n_permutations`).	`1000`
`seed`	`int or None`	Random seed for reproducibility.	`None`
`low_memory`	`bool or None`	Memory strategy for the `"energy"` and `"mmd"` methods. `None` (default): auto-detect; uses low-memory mode when the N×N distance matrix would exceed ~1 GiB. `True`: force low-memory mode, which uses Numba JIT kernels to recompute the statistic from scratch for each permutation with O(N) memory; this enables N > 50K without exhausting RAM. `False`: force the precomputed matrix, which is faster per permutation but requires O(N²) memory.	`None`
`**kwargs`	`Any`	Additional arguments passed to the test statistic function. For `"mmd"`: `kernel`, `bandwidth`. For `"kl_knn"`: `k`.	`{}`
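
To get a feel for when the `low_memory=None` auto-detect rule might switch modes, the sketch below estimates the size of a dense pairwise distance matrix over the pooled sample. It assumes float64 entries (8 bytes each), N equal to the combined number of samples, and a threshold of exactly 1 GiB. The helper names `distance_matrix_bytes` and `would_use_low_memory` are hypothetical, and the library's actual heuristic may differ.

```python
def distance_matrix_bytes(n_p: int, n_q: int, itemsize: int = 8) -> int:
    """Bytes needed for a dense N x N matrix over the pooled sample (N = n_p + n_q)."""
    n = n_p + n_q
    return n * n * itemsize


def would_use_low_memory(n_p: int, n_q: int, threshold_bytes: int = 2**30) -> bool:
    """Hypothetical stand-in for the auto-detect rule (assumed 1 GiB threshold)."""
    return distance_matrix_bytes(n_p, n_q) > threshold_bytes


# 5,000 + 5,000 samples: 10,000^2 * 8 bytes = 0.8 GB, below 1 GiB.
print(would_use_low_memory(5_000, 5_000))
# 25,000 + 25,000 samples: 50,000^2 * 8 bytes = 20 GB, well above 1 GiB.
print(would_use_low_memory(25_000, 25_000))
```

Under these assumptions the crossover sits at roughly N ≈ 11,600 pooled samples, which is why forcing `low_memory=False` is only sensible for small to moderate N.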

Returns:

Type	Description
`TestResult`	Named tuple with fields: `statistic` (float): the observed test statistic; `p_value` (float): the permutation p-value; `null_distribution` (np.ndarray): the null statistics from the permutations.

Notes

The p-value is computed as:

p = (1 + #{b : T_b >= T_obs}) / (1 + B)

where T_obs is the observed statistic, T_b (b = 1, ..., B) are the null statistics, and B is the number of permutations. The +1 in both the numerator and the denominator counts the observed statistic as one of the permutations, which ensures the p-value is never exactly 0.
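
The formula translates directly into a few lines of NumPy. This is a minimal sketch for illustration, not the library's internal code; the function name `permutation_p_value` is hypothetical.

```python
import numpy as np


def permutation_p_value(t_obs: float, null_stats) -> float:
    """p = (1 + #{b : T_b >= T_obs}) / (1 + B), with B = len(null_stats)."""
    null_stats = np.asarray(null_stats, dtype=float)
    n_extreme = int(np.sum(null_stats >= t_obs))
    return (1 + n_extreme) / (1 + null_stats.size)


null = np.array([0.1, 0.4, 0.2, 0.9, 0.3])
# Two of the five null values (0.4 and 0.9) are >= 0.35, so p = (1 + 2) / (1 + 5) = 0.5.
print(permutation_p_value(0.35, null))
# No null value is >= 5.0, so p = 1 / 6, the smallest value attainable with B = 5.
print(permutation_p_value(5.0, null))
```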

The permutation test is:

Exact under H0 (finite-sample valid)
Non-parametric (no distributional assumptions)
Consistent against all alternatives (for MMD with characteristic kernel)
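
The overall procedure can be sketched in plain NumPy. For brevity the sketch uses the absolute difference of sample means as the statistic, which is not one of the statistics offered by `method` and only detects mean shifts; the function name `permutation_test` is hypothetical and the code is not the library's implementation.

```python
import numpy as np


def permutation_test(x, y, n_permutations=1000, seed=None):
    """Illustrative permutation test using |mean(x) - mean(y)| as the statistic."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    n = x.size

    def statistic(a, b):
        return abs(a.mean() - b.mean())

    t_obs = statistic(x, y)
    null = np.empty(n_permutations)
    for b in range(n_permutations):
        # Shuffle the pooled sample and split it back into groups of the original sizes.
        perm = rng.permutation(pooled)
        null[b] = statistic(perm[:n], perm[n:])

    # Same p-value formula as in the notes above.
    p_value = (1 + np.sum(null >= t_obs)) / (1 + n_permutations)
    return t_obs, float(p_value), null


rng = np.random.default_rng(0)
t_obs, p_value, null = permutation_test(
    rng.normal(0, 1, 200), rng.normal(1, 1, 200), n_permutations=500, seed=0
)
print(t_obs, p_value)
```

Because the labels are exchangeable under the null hypothesis, each shuffled split is as likely as the observed one, which is what makes the comparison against `null` valid without distributional assumptions.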

Examples:

>>> import numpy as np
>>> from divergence import two_sample_test
>>> rng = np.random.default_rng(42)
>>> p = rng.normal(0, 1, 200)
>>> q = rng.normal(1, 1, 200)
>>> result = two_sample_test(p, q, method="energy", n_permutations=500, seed=42)
>>> result.p_value < 0.05
True

References

[1] Gretton, A. et al. (2012). "A Kernel Two-Sample Test." JMLR, 13, 723-773.
[2] Székely, G. J. & Rizzo, M. L. (2004). "Testing for Equal Distributions in High Dimension." InterStat, 5.
