Integral Probability Metrics
Sample-based distance measures that do not require density estimation. These operate directly on point clouds and are true metrics (or pseudo-metrics) on the space of probability distributions.
Performance

`energy_distance` and `maximum_mean_discrepancy` dispatch to Numba JIT
kernels above small size thresholds. For 1D inputs, `energy_distance`
uses a sort-based O(n log n) closed form once n ≥ 200; the
multi-dimensional streaming kernel takes over at n ≥ 5,000.
`maximum_mean_discrepancy` switches to the JIT path at n ≥ 500, much
earlier than energy distance, because its vectorized fallback
materializes three full kernel matrices.

`two_sample_test(method="mmd")` precomputes the RBF kernel matrix once,
outside the permutation loop, and uses the symmetry identity
`S_PQ = (K_total - K_PP - K_QQ) / 2` to avoid one block sum per
permutation. The 1D energy permutation test calls the sort-based kernel
under each permutation and completes n = 3,000 per group with 200
permutations in well under a second.
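The symmetry identity follows from splitting the pooled kernel matrix into its four blocks. A minimal NumPy sketch of the trick (with an arbitrary fixed bandwidth for illustration, not the median heuristic the library would use):

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 100
z = np.concatenate([rng.normal(0.0, 1.0, m), rng.normal(1.0, 1.0, n)])

# Pooled RBF kernel matrix, computed once (sigma fixed at 1.0 here).
K = np.exp(-((z[:, None] - z[None, :]) ** 2) / 2.0)

# One permutation: re-slice K rather than recomputing any kernels.
perm = rng.permutation(m + n)
p_idx, q_idx = perm[:m], perm[m:]
k_pp = K[np.ix_(p_idx, p_idx)].sum()
k_qq = K[np.ix_(q_idx, q_idx)].sum()

# The two cross blocks K_PQ and K_QP have equal sums by symmetry, so
# S_PQ falls out of the permutation-invariant total without a third
# block sum.
s_pq = (K.sum() - k_pp - k_qq) / 2
assert np.isclose(s_pq, K[np.ix_(p_idx, q_idx)].sum())
```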
energy_distance(samples_p, samples_q)
Compute the energy distance between two sample sets.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples_p` | ndarray | Samples from distribution P, shape `(n,)` or `(n, d)`. | required |
| `samples_q` | ndarray | Samples from distribution Q, shape `(m,)` or `(m, d)`. | required |
Returns:

| Type | Description |
|---|---|
| float | The energy distance, a non-negative value. |
Notes
The energy distance is defined as
.. math::
E(P, Q) = 2\,\mathbb{E}\|X - Y\|
- \mathbb{E}\|X - X'\|
- \mathbb{E}\|Y - Y'\|
where :math:`X, X' \sim P` and :math:`Y, Y' \sim Q` are independent.

Properties:

- Non-negative: :math:`E(P, Q) \geq 0`.
- Symmetric: :math:`E(P, Q) = E(Q, P)`.
- :math:`E(P, Q) = 0` if and only if :math:`P = Q`.

For univariate data, Euclidean distances reduce to absolute differences.
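For reference, the definition above can be estimated directly from pairwise distances. A minimal quadratic-cost 1D sketch (a plug-in estimator for illustration, not the library's sort-based O(n log n) kernel):

```python
import numpy as np

def energy_distance_naive(x, y):
    """O(n*m) plug-in estimator of E(P, Q) for 1D samples.

    Illustrative only; the library's 1D path computes the same
    quantity with a sort-based O(n log n) closed form.
    """
    e_xy = np.abs(x[:, None] - y[None, :]).mean()  # E||X - Y||
    e_xx = np.abs(x[:, None] - x[None, :]).mean()  # E||X - X'||
    e_yy = np.abs(y[:, None] - y[None, :]).mean()  # E||Y - Y'||
    return 2 * e_xy - e_xx - e_yy

rng = np.random.default_rng(42)
print(energy_distance_naive(rng.normal(0, 1, 500), rng.normal(1, 1, 500)))
```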
Examples:
>>> import numpy as np
>>> from divergence.ipms import energy_distance
>>> rng = np.random.default_rng(42)
>>> p = rng.normal(0, 1, 500)
>>> q = rng.normal(1, 1, 500)
>>> energy_distance(p, q) > 0
True
References
.. [1] G. J. Székely and M. L. Rizzo, "Testing for equal distributions in high dimension," InterStat, 2004.
.. [2] G. J. Székely and M. L. Rizzo, "Energy statistics: A class of statistics based on distances," Journal of Statistical Planning and Inference, 2013.
wasserstein_distance(samples_p, samples_q, *, p=1)
Compute the p-Wasserstein distance between two 1D sample sets.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples_p` | ndarray | 1D samples from distribution P, shape `(n,)`. | required |
| `samples_q` | ndarray | 1D samples from distribution Q, shape `(m,)`. | required |
| `p` | int | Order of the Wasserstein distance. Must be >= 1. Default is 1. | 1 |
Returns:

| Type | Description |
|---|---|
| float | The p-Wasserstein distance, a non-negative value. |
Notes
For p = 1, this delegates to :func:`scipy.stats.wasserstein_distance`.
For p >= 2 the computation proceeds as follows:

- Sort both sample arrays.
- If the sample sizes are equal, pair sorted elements directly:

  .. math::

     W_p = \left(\frac{1}{n}\sum_{i=1}^{n}
     |x_{(i)} - y_{(i)}|^p\right)^{1/p}

- If the sample sizes differ, interpolate both empirical quantile
  functions onto a common grid of quantile levels and integrate:

  .. math::

     W_p = \left(\int_0^1 |F_P^{-1}(t) - F_Q^{-1}(t)|^p\,dt
     \right)^{1/p}
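A minimal sketch of the unequal-size branch, assuming a fixed quantile grid (the helper name and grid size are illustrative choices, not the library's):

```python
import numpy as np

def wasserstein_p_by_quantiles(x, y, p=2, n_grid=1000):
    """Approximate W_p by integrating |F_P^{-1} - F_Q^{-1}|^p over a
    shared grid of quantile levels. Hypothetical helper."""
    t = (np.arange(n_grid) + 0.5) / n_grid   # midpoint quantile levels
    qx = np.quantile(x, t)                   # empirical F_P^{-1}(t)
    qy = np.quantile(y, t)                   # empirical F_Q^{-1}(t)
    return float((np.abs(qx - qy) ** p).mean() ** (1 / p))

rng = np.random.default_rng(0)
# Approximately 2.0, the true W_2 between N(0, 1) and N(2, 1).
print(wasserstein_p_by_quantiles(rng.normal(0, 1, 800), rng.normal(2, 1, 500)))
```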
Properties:

- Non-negative: :math:`W_p \geq 0`.
- Symmetric: :math:`W_p(P, Q) = W_p(Q, P)`.
- Satisfies the triangle inequality (it is a true metric).
Examples:
>>> import numpy as np
>>> from divergence.ipms import wasserstein_distance
>>> rng = np.random.default_rng(42)
>>> p_samples = rng.normal(0, 1, 1000)
>>> q_samples = rng.normal(2, 1, 1000)
>>> wasserstein_distance(p_samples, q_samples, p=1) > 0
True
References
.. [1] C. Villani, "Optimal Transport: Old and New," Springer, 2008.
.. [2] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein Generative Adversarial Networks," ICML, 2017.
maximum_mean_discrepancy(samples_p, samples_q, *, kernel='rbf', bandwidth=None)
Compute the squared maximum mean discrepancy (MMD\ :sup:`2`) via the
unbiased U-statistic estimator.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples_p` | ndarray | Samples from distribution P, shape `(m,)` or `(m, d)`. | required |
| `samples_q` | ndarray | Samples from distribution Q, shape `(n,)` or `(n, d)`. | required |
| `kernel` | str | Kernel function. Currently only `'rbf'` is supported. | `'rbf'` |
| `bandwidth` | float or None | Bandwidth :math:`\sigma` of the RBF kernel. If None, the median heuristic is used. | None |
Returns:

| Type | Description |
|---|---|
| float | The squared MMD (MMD\ :sup:`2`), non-negative in expectation but possibly slightly negative for finite samples. |
Notes
The squared MMD with RBF kernel :math:`k(x, y) = \exp(-\|x-y\|^2 /
(2\sigma^2))` is estimated via the unbiased U-statistic:

.. math::

   \widehat{\mathrm{MMD}}^2_u =
   \frac{1}{m(m-1)}\sum_{i \neq j} k(x_i, x_j)
   - \frac{2}{mn}\sum_{i,j} k(x_i, y_j)
   + \frac{1}{n(n-1)}\sum_{i \neq j} k(y_i, y_j)

The median heuristic sets :math:`\sigma` to the median of all pairwise
distances in :math:`\{x_1, \ldots, x_m\} \cup \{y_1, \ldots, y_n\}`.
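A minimal NumPy/SciPy sketch of this estimator and the median heuristic (a reference implementation for illustration, not the library's JIT kernel):

```python
import numpy as np
from scipy.spatial.distance import cdist

def mmd2_unbiased(x, y, sigma=None):
    """Unbiased U-statistic MMD^2 with an RBF kernel; illustrative only."""
    x = x.reshape(len(x), -1)
    y = y.reshape(len(y), -1)
    if sigma is None:
        # Median heuristic: median pairwise distance in the pooled sample.
        d = cdist(np.vstack([x, y]), np.vstack([x, y]))
        sigma = np.median(d[np.triu_indices_from(d, k=1)])
    k = lambda a, b: np.exp(-cdist(a, b) ** 2 / (2 * sigma ** 2))
    m, n = len(x), len(y)
    term_pp = (k(x, x).sum() - m) / (m * (m - 1))  # drop diagonal (k = 1)
    term_qq = (k(y, y).sum() - n) / (n * (n - 1))
    term_pq = k(x, y).mean()
    return term_pp + term_qq - 2 * term_pq

rng = np.random.default_rng(42)
print(mmd2_unbiased(rng.normal(0, 1, 300), rng.normal(1, 1, 300)))
```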
Properties:

- :math:`\mathrm{MMD}^2(P, Q) \geq 0` in expectation for characteristic kernels (the unbiased estimator can be slightly negative).
- Symmetric: :math:`\mathrm{MMD}(P, Q) = \mathrm{MMD}(Q, P)`.
Examples:
>>> import numpy as np
>>> from divergence.ipms import maximum_mean_discrepancy
>>> rng = np.random.default_rng(42)
>>> p = rng.normal(0, 1, 500)
>>> q = rng.normal(1, 1, 500)
>>> maximum_mean_discrepancy(p, q) > 0
True
References
.. [1] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, "A kernel two-sample test," JMLR, 2012.
sliced_wasserstein_distance(samples_p, samples_q, *, n_projections=100, p=2, seed=None)
Compute the sliced Wasserstein distance between two sample sets.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples_p` | ndarray | Samples from distribution P, shape `(n,)` or `(n, d)`. | required |
| `samples_q` | ndarray | Samples from distribution Q, shape `(m,)` or `(m, d)`. | required |
| `n_projections` | int | Number of random projections to average over. Ignored for 1D inputs. Default is 100. | 100 |
| `p` | int | Order of the Wasserstein distance for each 1D projection. Default is 2. | 2 |
| `seed` | int or None | Random seed for reproducible projection directions. Default is None. | None |
Returns:

| Type | Description |
|---|---|
| float | The sliced Wasserstein distance, a non-negative value. |
Notes
For d-dimensional samples, the sliced Wasserstein distance is defined as
.. math::
SW_p(P, Q) = \left(\mathbb{E}_{\theta \sim \mathrm{Uniform}(S^{d-1})}
\left[W_p^p(\theta^\top_\# P,\,
\theta^\top_\# Q)\right]\right)^{1/p}
where :math:`\theta^\top_\# P` denotes the pushforward (projection) of
:math:`P` onto direction :math:`\theta`, and the expectation is
approximated by averaging over `n_projections` random unit vectors.

For 1D input, the sliced Wasserstein distance reduces to the ordinary
Wasserstein distance and projections are skipped.
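A minimal Monte Carlo sketch for equal-size sample sets (a hypothetical helper mirroring the signature above; the library routes each projection through its 1D `wasserstein_distance`, which also handles unequal sizes):

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=100, p=2, seed=None):
    """Monte Carlo estimate of SW_p for equal-size (n, d) sample sets.
    Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # ~ Uniform(S^{d-1})
    sw_pp = 0.0
    for t in theta:
        xp, yp = np.sort(x @ t), np.sort(y @ t)   # 1D pushforwards
        sw_pp += np.mean(np.abs(xp - yp) ** p)    # W_p^p of this slice
    return (sw_pp / n_projections) ** (1 / p)

rng = np.random.default_rng(42)
print(sliced_wasserstein(rng.normal(0, 1, (500, 3)),
                         rng.normal(1, 1, (500, 3)), seed=0))
```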
Properties:

- Non-negative: :math:`SW_p \geq 0`.
- Symmetric: :math:`SW_p(P, Q) = SW_p(Q, P)`.
- :math:`SW_p(P, Q) = 0` if and only if :math:`P = Q`.
Examples:
>>> import numpy as np
>>> from divergence.ipms import sliced_wasserstein_distance
>>> rng = np.random.default_rng(42)
>>> p_samples = rng.normal(0, 1, (500, 3))
>>> q_samples = rng.normal(1, 1, (500, 3))
>>> sliced_wasserstein_distance(p_samples, q_samples, seed=0) > 0
True
References
.. [1] J. Rabin, J. Delon, and Y. Gousseau, "Wasserstein Barycenter and Its Application to Texture Mixing," SSVM, 2011.
.. [2] S. Kolouri, K. Nadjahi, U. Simsekli, R. Badeau, and G. Rohde, "Generalized Sliced Wasserstein Distances," NeurIPS, 2019.