Integral Probability Metrics
Sample-based distance measures that do not require density estimation. These operate directly on point clouds and are true metrics (or pseudo-metrics) on the space of probability distributions.
Performance

`energy_distance` and `maximum_mean_discrepancy` dispatch to Numba JIT
kernels above small size thresholds. For 1D inputs, `energy_distance`
uses a sort-based O(n log n) closed form once n ≥ 200; the
multi-dimensional streaming kernel takes over at n ≥ 5,000.
`maximum_mean_discrepancy` switches to the JIT path at n ≥ 500, much
earlier than energy distance, because its vectorized fallback
materializes three full kernel matrices.

`two_sample_test(method="mmd")` precomputes the RBF kernel matrix once,
outside the permutation loop, and uses the symmetry identity
`S_PQ = (K_total - K_PP - K_QQ) / 2` to avoid one block sum per
permutation. The 1D energy permutation test calls the sort-based kernel
under each permutation and completes n = 3,000 per group with 200
permutations in well under a second.
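The symmetry identity follows from splitting the pooled kernel matrix into its four blocks. A minimal NumPy sketch of the trick (with an arbitrary fixed bandwidth for illustration, not the median heuristic the library would use):

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 100
z = np.concatenate([rng.normal(0.0, 1.0, m), rng.normal(1.0, 1.0, n)])

# Pooled RBF kernel matrix, computed once (sigma fixed at 1.0 here).
K = np.exp(-((z[:, None] - z[None, :]) ** 2) / 2.0)

# One permutation: re-slice K rather than recomputing any kernels.
perm = rng.permutation(m + n)
p_idx, q_idx = perm[:m], perm[m:]
k_pp = K[np.ix_(p_idx, p_idx)].sum()
k_qq = K[np.ix_(q_idx, q_idx)].sum()

# The two cross blocks K_PQ and K_QP have equal sums by symmetry, so
# S_PQ falls out of the permutation-invariant total without a third
# block sum.
s_pq = (K.sum() - k_pp - k_qq) / 2
assert np.isclose(s_pq, K[np.ix_(p_idx, q_idx)].sum())
```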
energy_distance(samples_p, samples_q)
Compute the energy distance between two sample sets.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples_p` | ndarray | Samples from distribution P, shape `(n,)` or `(n, d)`. | required |
| `samples_q` | ndarray | Samples from distribution Q, shape `(m,)` or `(m, d)`. | required |
Returns:

| Type | Description |
|---|---|
| float | The energy distance, a non-negative value. |
Notes
The energy distance is defined as
.. math::
E(P, Q) = 2\,\mathbb{E}\|X - Y\|
- \mathbb{E}\|X - X'\|
- \mathbb{E}\|Y - Y'\|
where :math:`X, X' \sim P` and :math:`Y, Y' \sim Q` are independent.

Properties:

- Non-negative: :math:`E(P, Q) \geq 0`.
- Symmetric: :math:`E(P, Q) = E(Q, P)`.
- :math:`E(P, Q) = 0` if and only if :math:`P = Q`.

For univariate data, Euclidean distances reduce to absolute differences.
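For reference, the definition above can be estimated directly from pairwise distances. A minimal quadratic-cost 1D sketch (a plug-in estimator for illustration, not the library's sort-based O(n log n) kernel):

```python
import numpy as np

def energy_distance_naive(x, y):
    """O(n*m) plug-in estimator of E(P, Q) for 1D samples.

    Illustrative only; the library's 1D path computes the same
    quantity with a sort-based O(n log n) closed form.
    """
    e_xy = np.abs(x[:, None] - y[None, :]).mean()  # E||X - Y||
    e_xx = np.abs(x[:, None] - x[None, :]).mean()  # E||X - X'||
    e_yy = np.abs(y[:, None] - y[None, :]).mean()  # E||Y - Y'||
    return 2 * e_xy - e_xx - e_yy

rng = np.random.default_rng(42)
print(energy_distance_naive(rng.normal(0, 1, 500), rng.normal(1, 1, 500)))
```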
Examples:
>>> import numpy as np
>>> from divergence.ipms import energy_distance
>>> rng = np.random.default_rng(42)
>>> p = rng.normal(0, 1, 500)
>>> q = rng.normal(1, 1, 500)
>>> energy_distance(p, q) > 0
True
References
.. [1] G. J. Székely and M. L. Rizzo, "Testing for equal distributions in high dimension," InterStat, 2004.
.. [2] G. J. Székely and M. L. Rizzo, "Energy statistics: A class of statistics based on distances," Journal of Statistical Planning and Inference, 2013.
wasserstein_distance(samples_p, samples_q, *, p=1)
Compute the p-Wasserstein distance between two 1D sample sets.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples_p` | ndarray | 1D samples from distribution P, shape `(n,)`. | required |
| `samples_q` | ndarray | 1D samples from distribution Q, shape `(m,)`. | required |
| `p` | int | Order of the Wasserstein distance. Must be >= 1. Default is 1. | 1 |
Returns:

| Type | Description |
|---|---|
| float | The p-Wasserstein distance, a non-negative value. |
Notes
For p = 1, this delegates to :func:`scipy.stats.wasserstein_distance`.
For p >= 2 the computation proceeds as follows:

- Sort both sample arrays.
- If the sample sizes are equal, pair sorted elements directly:

  .. math::

     W_p = \left(\frac{1}{n}\sum_{i=1}^{n}
     |x_{(i)} - y_{(i)}|^p\right)^{1/p}

- If the sample sizes differ, interpolate both empirical quantile
  functions onto a common grid of quantile levels and integrate:

  .. math::

     W_p = \left(\int_0^1 |F_P^{-1}(t) - F_Q^{-1}(t)|^p\,dt
     \right)^{1/p}
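A minimal sketch of the unequal-size branch, assuming a fixed quantile grid (the helper name and grid size are illustrative choices, not the library's):

```python
import numpy as np

def wasserstein_p_by_quantiles(x, y, p=2, n_grid=1000):
    """Approximate W_p by integrating |F_P^{-1} - F_Q^{-1}|^p over a
    shared grid of quantile levels. Hypothetical helper."""
    t = (np.arange(n_grid) + 0.5) / n_grid   # midpoint quantile levels
    qx = np.quantile(x, t)                   # empirical F_P^{-1}(t)
    qy = np.quantile(y, t)                   # empirical F_Q^{-1}(t)
    return float((np.abs(qx - qy) ** p).mean() ** (1 / p))

rng = np.random.default_rng(0)
# Approximately 2.0, the true W_2 between N(0, 1) and N(2, 1).
print(wasserstein_p_by_quantiles(rng.normal(0, 1, 800), rng.normal(2, 1, 500)))
```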
Properties:

- Non-negative: :math:`W_p \geq 0`.
- Symmetric: :math:`W_p(P, Q) = W_p(Q, P)`.
- Satisfies the triangle inequality (it is a true metric).
Examples:
>>> import numpy as np
>>> from divergence.ipms import wasserstein_distance
>>> rng = np.random.default_rng(42)
>>> p_samples = rng.normal(0, 1, 1000)
>>> q_samples = rng.normal(2, 1, 1000)
>>> wasserstein_distance(p_samples, q_samples, p=1) > 0
True
References
.. [1] C. Villani, "Optimal Transport: Old and New," Springer, 2008.
.. [2] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein Generative Adversarial Networks," ICML, 2017.
maximum_mean_discrepancy(samples_p, samples_q, *, kernel='rbf', bandwidth=None)
Compute the squared maximum mean discrepancy (MMD\ :sup:`2`) via the
unbiased U-statistic estimator.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples_p` | ndarray | Samples from distribution P, shape `(m,)` or `(m, d)`. | required |
| `samples_q` | ndarray | Samples from distribution Q, shape `(n,)` or `(n, d)`. | required |
| `kernel` | str | Kernel function. Currently only `'rbf'` is supported. | `'rbf'` |
| `bandwidth` | float or None | Bandwidth :math:`\sigma` of the RBF kernel. If None, the median heuristic is used. | None |
Returns:

| Type | Description |
|---|---|
| float | The squared MMD (MMD\ :sup:`2`), non-negative in expectation but possibly slightly negative for finite samples. |
Notes
The squared MMD with RBF kernel :math:`k(x, y) = \exp(-\|x-y\|^2 /
(2\sigma^2))` is estimated via the unbiased U-statistic:

.. math::

   \widehat{\mathrm{MMD}}^2_u =
   \frac{1}{m(m-1)}\sum_{i \neq j} k(x_i, x_j)
   - \frac{2}{mn}\sum_{i,j} k(x_i, y_j)
   + \frac{1}{n(n-1)}\sum_{i \neq j} k(y_i, y_j)

The median heuristic sets :math:`\sigma` to the median of all pairwise
distances in :math:`\{x_1, \ldots, x_m\} \cup \{y_1, \ldots, y_n\}`.
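A minimal NumPy/SciPy sketch of this estimator and the median heuristic (a reference implementation for illustration, not the library's JIT kernel):

```python
import numpy as np
from scipy.spatial.distance import cdist

def mmd2_unbiased(x, y, sigma=None):
    """Unbiased U-statistic MMD^2 with an RBF kernel; illustrative only."""
    x = x.reshape(len(x), -1)
    y = y.reshape(len(y), -1)
    if sigma is None:
        # Median heuristic: median pairwise distance in the pooled sample.
        d = cdist(np.vstack([x, y]), np.vstack([x, y]))
        sigma = np.median(d[np.triu_indices_from(d, k=1)])
    k = lambda a, b: np.exp(-cdist(a, b) ** 2 / (2 * sigma ** 2))
    m, n = len(x), len(y)
    term_pp = (k(x, x).sum() - m) / (m * (m - 1))  # drop diagonal (k = 1)
    term_qq = (k(y, y).sum() - n) / (n * (n - 1))
    term_pq = k(x, y).mean()
    return term_pp + term_qq - 2 * term_pq

rng = np.random.default_rng(42)
print(mmd2_unbiased(rng.normal(0, 1, 300), rng.normal(1, 1, 300)))
```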
Properties:

- :math:`\mathrm{MMD}^2(P, Q) \geq 0` in expectation for characteristic kernels (the unbiased estimator can be slightly negative).
- Symmetric: :math:`\mathrm{MMD}(P, Q) = \mathrm{MMD}(Q, P)`.
Examples:
>>> import numpy as np
>>> from divergence.ipms import maximum_mean_discrepancy
>>> rng = np.random.default_rng(42)
>>> p = rng.normal(0, 1, 500)
>>> q = rng.normal(1, 1, 500)
>>> maximum_mean_discrepancy(p, q) > 0
True
References
.. [1] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, "A kernel two-sample test," JMLR, 2012.
sliced_wasserstein_distance(samples_p, samples_q, *, n_projections=100, p=2, seed=None)
Compute the sliced Wasserstein distance between two sample sets.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples_p` | ndarray | Samples from distribution P, shape `(n,)` or `(n, d)`. | required |
| `samples_q` | ndarray | Samples from distribution Q, shape `(m,)` or `(m, d)`. | required |
| `n_projections` | int | Number of random projections to average over. Ignored for 1D inputs. Default is 100. | 100 |
| `p` | int | Order of the Wasserstein distance for each 1D projection. Default is 2. | 2 |
| `seed` | int or None | Random seed for reproducible projection directions. Default is None. | None |
Returns:

| Type | Description |
|---|---|
| float | The sliced Wasserstein distance, a non-negative value. |
Notes
For d-dimensional samples, the sliced Wasserstein distance is defined as
.. math::
SW_p(P, Q) = \left(\mathbb{E}_{\theta \sim \mathrm{Uniform}(S^{d-1})}
\left[W_p^p(\theta^\top_\# P,\,
\theta^\top_\# Q)\right]\right)^{1/p}
where :math:`\theta^\top_\# P` denotes the pushforward (projection) of
:math:`P` onto direction :math:`\theta`, and the expectation is
approximated by averaging over `n_projections` random unit vectors.

For 1D input, the sliced Wasserstein distance reduces to the ordinary
Wasserstein distance and projections are skipped.
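A minimal Monte Carlo sketch for equal-size sample sets (a hypothetical helper mirroring the signature above; the library routes each projection through its 1D `wasserstein_distance`, which also handles unequal sizes):

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=100, p=2, seed=None):
    """Monte Carlo estimate of SW_p for equal-size (n, d) sample sets.
    Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # ~ Uniform(S^{d-1})
    sw_pp = 0.0
    for t in theta:
        xp, yp = np.sort(x @ t), np.sort(y @ t)   # 1D pushforwards
        sw_pp += np.mean(np.abs(xp - yp) ** p)    # W_p^p of this slice
    return (sw_pp / n_projections) ** (1 / p)

rng = np.random.default_rng(42)
print(sliced_wasserstein(rng.normal(0, 1, (500, 3)),
                         rng.normal(1, 1, (500, 3)), seed=0))
```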
Properties:

- Non-negative: :math:`SW_p \geq 0`.
- Symmetric: :math:`SW_p(P, Q) = SW_p(Q, P)`.
- :math:`SW_p(P, Q) = 0` if and only if :math:`P = Q`.
Examples:
>>> import numpy as np
>>> from divergence.ipms import sliced_wasserstein_distance
>>> rng = np.random.default_rng(42)
>>> p_samples = rng.normal(0, 1, (500, 3))
>>> q_samples = rng.normal(1, 1, (500, 3))
>>> sliced_wasserstein_distance(p_samples, q_samples, seed=0) > 0
True
References
.. [1] J. Rabin, J. Delon, and Y. Gousseau, "Wasserstein Barycenter and Its Application to Texture Mixing," SSVM, 2011.
.. [2] S. Kolouri, K. Nadjahi, U. Simsekli, R. Badeau, and G. Rohde, "Generalized Sliced Wasserstein Distances," NeurIPS, 2019.