
Integral Probability Metrics

Sample-based distance measures that do not require density estimation. These operate directly on point clouds and are true metrics (or pseudo-metrics) on the space of probability distributions.

Performance

energy_distance and maximum_mean_discrepancy dispatch to Numba JIT kernels once sample sizes cross modest thresholds. For 1D inputs, energy_distance uses a sort-based O(n log n) closed form once n ≥ 200; the multi-dimensional streaming kernel takes over at n ≥ 5,000. maximum_mean_discrepancy switches to the JIT path earlier, at n ≥ 500, because its vectorized fallback materializes three full kernel matrices.
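
To make the 1D fast path concrete: both the within-sample mean E||X - X'|| and the cross-sample mean E||X - Y|| reduce to prefix sums over sorted arrays, which is where the O(n log n) bound comes from. Below is a minimal plain-NumPy sketch of that closed form, using the V-statistic convention (i = j pairs included); it is an illustration, not the library's Numba kernel, and all helper names are hypothetical.

import numpy as np

def _mean_abs_pairwise(x):
    """Mean |x_i - x_j| over all n^2 ordered pairs, via one sort."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # For sorted x: sum_{i<j} (x_j - x_i) = sum_i x_i * (2i - n + 1).
    coeffs = 2.0 * np.arange(n) - n + 1.0
    return 2.0 * np.dot(coeffs, x) / (n * n)

def _mean_abs_cross(x, y):
    """Mean |x_i - y_j| over all n*m pairs, via sort + prefix sums."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.asarray(y, dtype=float)
    prefix = np.concatenate(([0.0], np.cumsum(x)))
    n, total = x.size, prefix[-1]
    k = np.searchsorted(x, y, side="right")      # how many x_i <= y_j
    below = y * k - prefix[k]                    # sum of (y_j - x_i) over x_i <= y_j
    above = (total - prefix[k]) - y * (n - k)    # sum of (x_i - y_j) over x_i > y_j
    return float(np.sum(below + above)) / (n * y.size)

def energy_distance_1d(x, y):
    return (2.0 * _mean_abs_cross(x, y)
            - _mean_abs_pairwise(x)
            - _mean_abs_pairwise(y))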

two_sample_test(method="mmd") precomputes the RBF kernel matrix once outside the permutation loop and uses the symmetry identity S_PQ = (S_total - S_PP - S_QQ) / 2, where each S is a block sum of the pooled kernel matrix, to avoid computing one block sum per permutation. The 1D energy permutation test calls the sort-based kernel under each permutation and completes n = 3000 per group with 200 permutations in well under a second.
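
A hedged sketch of that permutation-test structure: the pooled kernel matrix (and its total sum) is formed once, and each permutation only re-sums the two within-group blocks, recovering the cross-block sum from the identity above. The helper names below are illustrative, and the statistic shown is the biased V-statistic variant, which is enough to demonstrate the trick.

import numpy as np

def rbf_kernel_matrix(z, sigma):
    """Full RBF kernel matrix on the pooled sample z of shape (n+m, d)."""
    sq = np.sum(z * z, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2_from_blocks(K, idx_p, idx_q, K_total):
    n, m = idx_p.size, idx_q.size
    s_pp = K[np.ix_(idx_p, idx_p)].sum()
    s_qq = K[np.ix_(idx_q, idx_q)].sum()
    s_pq = (K_total - s_pp - s_qq) / 2.0   # symmetry identity: no cross-block sum
    return s_pp / n ** 2 + s_qq / m ** 2 - 2.0 * s_pq / (n * m)

def mmd_permutation_pvalue(x, y, sigma, n_permutations=200, seed=0):
    z = np.vstack([np.asarray(x, float).reshape(len(x), -1),
                   np.asarray(y, float).reshape(len(y), -1)])
    K = rbf_kernel_matrix(z, sigma)        # built once, outside the loop
    K_total = K.sum()                      # also reusable across permutations
    n, N = len(x), len(z)
    observed = mmd2_from_blocks(K, np.arange(n), np.arange(n, N), K_total)
    rng = np.random.default_rng(seed)
    hits = sum(
        mmd2_from_blocks(K, perm[:n], perm[n:], K_total) >= observed
        for perm in (rng.permutation(N) for _ in range(n_permutations))
    )
    return (hits + 1) / (n_permutations + 1)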

energy_distance(samples_p, samples_q)

Compute the energy distance between two sample sets.

Parameters:

samples_p : ndarray, required
    Samples from distribution P. Shape (n,) for 1D or (n, d) for d-dimensional data.
samples_q : ndarray, required
    Samples from distribution Q. Shape (m,) for 1D or (m, d) for d-dimensional data.

Returns:

float
    The energy distance, a non-negative value.

Notes

The energy distance is defined as

.. math::

E(P, Q) = 2\,\mathbb{E}\|X - Y\|
           - \mathbb{E}\|X - X'\|
           - \mathbb{E}\|Y - Y'\|

where :math:`X, X' \sim P` and :math:`Y, Y' \sim Q` are independent.

Properties:

  • Non-negative: :math:`E(P, Q) \geq 0`.
  • Symmetric: :math:`E(P, Q) = E(Q, P)`.
  • :math:`E(P, Q) = 0` if and only if :math:`P = Q`.

For univariate data, Euclidean distances reduce to absolute differences.
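
The naive plug-in estimator maps one-to-one onto this definition. A short sketch for illustration only (it forms O(n^2) pairwise distances and is not the dispatch path described under Performance; the function name is hypothetical):

import numpy as np
from scipy.spatial.distance import cdist

def energy_distance_naive(x, y):
    x = np.asarray(x, dtype=float).reshape(len(x), -1)  # (n,) -> (n, 1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    e_xy = cdist(x, y).mean()   # estimates E||X - Y||
    e_xx = cdist(x, x).mean()   # estimates E||X - X'|| (i = j pairs included)
    e_yy = cdist(y, y).mean()
    return 2.0 * e_xy - e_xx - e_yy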

Examples:

>>> import numpy as np
>>> from divergence.ipms import energy_distance
>>> rng = np.random.default_rng(42)
>>> p = rng.normal(0, 1, 500)
>>> q = rng.normal(1, 1, 500)
>>> energy_distance(p, q) > 0
True

References

.. [1] G. J. Székely and M. L. Rizzo, "Testing for equal distributions in high dimension," InterStat, 2004.
.. [2] G. J. Székely and M. L. Rizzo, "Energy statistics: A class of statistics based on distances," Journal of Statistical Planning and Inference, 2013.

wasserstein_distance(samples_p, samples_q, *, p=1)

Compute the p-Wasserstein distance between two 1D sample sets.

Parameters:

samples_p : ndarray, required
    1D samples from distribution P. Shape (n,).
samples_q : ndarray, required
    1D samples from distribution Q. Shape (m,).
p : int, default=1
    Order of the Wasserstein distance. Must be >= 1.

Returns:

float
    The p-Wasserstein distance, a non-negative value.

Notes

For p=1, this delegates to :func:`scipy.stats.wasserstein_distance`.

For p >= 2 the computation proceeds as follows:

  • Sort both sample arrays.
  • If the sample sizes are equal, pair sorted elements directly:

    .. math::

       W_p = \left(\frac{1}{n}\sum_{i=1}^{n}
             |x_{(i)} - y_{(i)}|^p\right)^{1/p}

  • If the sample sizes differ, interpolate both empirical quantile functions onto a common grid of quantile levels and integrate:

    .. math::

       W_p = \left(\int_0^1 |F_P^{-1}(t) - F_Q^{-1}(t)|^p\,dt
             \right)^{1/p}

Properties:

  • Non-negative: :math:`W_p \geq 0`.
  • Symmetric: :math:`W_p(P, Q) = W_p(Q, P)`.
  • Satisfies the triangle inequality (it is a true metric).
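
A minimal sketch of the p >= 2 branch described above; the quantile-grid size is an assumption made for illustration, not necessarily the library's choice:

import numpy as np

def wasserstein_p(x, y, p=2):
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.size == y.size:
        # Equal sizes: the optimal 1D coupling pairs order statistics directly.
        return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))
    # Unequal sizes: evaluate both empirical quantile functions on a shared grid.
    t = (np.arange(1000) + 0.5) / 1000.0
    qx, qy = np.quantile(x, t), np.quantile(y, t)
    return float(np.mean(np.abs(qx - qy) ** p) ** (1.0 / p))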

Examples:

>>> import numpy as np
>>> from divergence.ipms import wasserstein_distance
>>> rng = np.random.default_rng(42)
>>> p_samples = rng.normal(0, 1, 1000)
>>> q_samples = rng.normal(2, 1, 1000)
>>> wasserstein_distance(p_samples, q_samples, p=1) > 0
True

References

.. [1] C. Villani, "Optimal Transport: Old and New," Springer, 2008.
.. [2] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein Generative Adversarial Networks," ICML, 2017.

maximum_mean_discrepancy(samples_p, samples_q, *, kernel='rbf', bandwidth=None)

Compute the squared maximum mean discrepancy (MMD²) via the unbiased U-statistic estimator.

Parameters:

samples_p : ndarray, required
    Samples from distribution P. Shape (n,) for 1D or (n, d) for d-dimensional data.
samples_q : ndarray, required
    Samples from distribution Q. Shape (m,) for 1D or (m, d) for d-dimensional data.
kernel : str, default="rbf"
    Kernel function. Currently only "rbf" (radial basis function / Gaussian kernel) is supported.
bandwidth : float or None, default=None
    Bandwidth parameter :math:`\sigma` for the RBF kernel. If None, the median heuristic is used: :math:`\sigma` is set to the median of all pairwise Euclidean distances in the pooled sample.

Returns:

float
    The squared MMD (MMD²). The unbiased U-statistic estimator can produce slightly negative values.

Notes

The squared MMD with RBF kernel :math:`k(x, y) = \exp(-\|x-y\|^2 / (2\sigma^2))` is estimated via the unbiased U-statistic:

.. math::

\widehat{\mathrm{MMD}}^2_u =
  \frac{1}{n(n-1)}\sum_{i \neq j} k(x_i, x_j)
- \frac{2}{nm}\sum_{i,j} k(x_i, y_j)
+ \frac{1}{m(m-1)}\sum_{i \neq j} k(y_i, y_j)

The median heuristic sets :math:`\sigma` to the median of all pairwise distances in the pooled sample :math:`\{x_1, \ldots, x_n\} \cup \{y_1, \ldots, y_m\}`.

Properties:

  • :math:`\mathrm{MMD}^2(P, Q) \geq 0` in expectation for characteristic kernels (the unbiased estimator can be slightly negative).
  • Symmetric: :math:`\mathrm{MMD}(P, Q) = \mathrm{MMD}(Q, P)`.
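
A hedged sketch of this estimator, including the median heuristic; the function name is illustrative, and the library's JIT kernel will differ in implementation:

import numpy as np
from scipy.spatial.distance import cdist, pdist

def mmd2_unbiased(x, y, sigma=None):
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    if sigma is None:
        # Median heuristic: median pairwise distance in the pooled sample.
        sigma = np.median(pdist(np.vstack([x, y])))
    gamma = 1.0 / (2.0 * sigma ** 2)
    k_xx = np.exp(-gamma * cdist(x, x, "sqeuclidean"))
    k_yy = np.exp(-gamma * cdist(y, y, "sqeuclidean"))
    k_xy = np.exp(-gamma * cdist(x, y, "sqeuclidean"))
    n, m = len(x), len(y)
    # U-statistic: exclude the i = j diagonal of the within-sample blocks.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return float(term_xx + term_yy - 2.0 * k_xy.mean())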

Examples:

>>> import numpy as np
>>> from divergence.ipms import maximum_mean_discrepancy
>>> rng = np.random.default_rng(42)
>>> p = rng.normal(0, 1, 500)
>>> q = rng.normal(1, 1, 500)
>>> maximum_mean_discrepancy(p, q) > 0
True

References

.. [1] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, "A kernel two-sample test," JMLR, 2012.

sliced_wasserstein_distance(samples_p, samples_q, *, n_projections=100, p=2, seed=None)

Compute the sliced Wasserstein distance between two sample sets.

Parameters:

samples_p : ndarray, required
    Samples from distribution P. Shape (n,) for 1D or (n, d) for d-dimensional data.
samples_q : ndarray, required
    Samples from distribution Q. Shape (m,) for 1D or (m, d) for d-dimensional data.
n_projections : int, default=100
    Number of random projections to average over. Ignored for 1D inputs.
p : int, default=2
    Order of the Wasserstein distance for each 1D projection.
seed : int or None, default=None
    Random seed for reproducible projection directions.

Returns:

float
    The sliced Wasserstein distance, a non-negative value.

Notes

For d-dimensional samples, the sliced Wasserstein distance is defined as

.. math::

SW_p(P, Q) = \left(\mathbb{E}_{\theta \sim \mathrm{Uniform}(S^{d-1})}
              \left[W_p^p(\theta^\top_\# P,\,
              \theta^\top_\# Q)\right]\right)^{1/p}

where :math:`\theta^\top_\# P` denotes the pushforward (projection) of :math:`P` onto direction :math:`\theta`, and the expectation is approximated by averaging over n_projections random unit vectors.

For 1D input, the sliced Wasserstein distance reduces to the ordinary Wasserstein distance and projections are skipped.

Properties:

  • Non-negative: :math:`SW_p \geq 0`.
  • Symmetric: :math:`SW_p(P, Q) = SW_p(Q, P)`.
  • :math:`SW_p(P, Q) = 0` if and only if :math:`P = Q`.
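
A minimal sketch of this Monte Carlo approximation, reusing the hypothetical wasserstein_p helper sketched under wasserstein_distance above:

import numpy as np

def sliced_wasserstein(x, y, n_projections=100, p=2, seed=None):
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    if x.shape[1] == 1:
        return wasserstein_p(x.ravel(), y.ravel(), p)  # 1D: no projections needed
    rng = np.random.default_rng(seed)
    # Directions uniform on the unit sphere S^{d-1}: normalized Gaussian draws.
    theta = rng.standard_normal((n_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Average W_p^p over the projected 1D problems, then take the 1/p root.
    wpp = [wasserstein_p(x @ t, y @ t, p) ** p for t in theta]
    return float(np.mean(wpp) ** (1.0 / p))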

Examples:

>>> import numpy as np
>>> from divergence.ipms import sliced_wasserstein_distance
>>> rng = np.random.default_rng(42)
>>> p_samples = rng.normal(0, 1, (500, 3))
>>> q_samples = rng.normal(1, 1, (500, 3))
>>> sliced_wasserstein_distance(p_samples, q_samples, seed=0) > 0
True

References

.. [1] J. Rabin, G. Peyré, J. Delon, and M. Bernot, "Wasserstein Barycenter and Its Application to Texture Mixing," SSVM, 2011.
.. [2] S. Kolouri, K. Nadjahi, U. Simsekli, R. Badeau, and G. Rohde, "Generalized Sliced Wasserstein Distances," NeurIPS, 2019.