
kNN Estimators

Non-parametric estimators for information-theoretic quantities based on k-nearest neighbor distances. These avoid explicit density estimation and remain practical in moderate dimensions where histogram- and kernel-based estimators break down.

knn_entropy(samples, *, k=5, base=np.e)

Estimate the differential entropy using the Kozachenko-Leonenko estimator.

Uses k-nearest neighbor distances to estimate the entropy of the distribution from which samples are drawn, without explicit density estimation.

Parameters:

    samples : ndarray
        Sample array of shape (n,) or (n, d), where n is the number of observations and d is the dimensionality. Required.
    k : int, optional
        Number of nearest neighbors to use. Default is 5.
    base : float, optional
        Base of the logarithm, controlling the unit of measurement. Use np.e for nats (the default), 2 for bits, or 10 for hartleys.

Returns:

    float
        Estimated differential entropy.

Raises:

    ValueError
        If samples has more than 2 dimensions.

Notes

The estimator is given by:

.. math::

\hat{H} = \frac{d}{N} \sum_{i=1}^{N} \ln(2\,\varepsilon_k(i))
          + \ln(N - 1) - \psi(k) + \ln(c_d)

where :math:`\varepsilon_k(i)` is the Euclidean distance from point :math:`i` to its k-th nearest neighbor (excluding itself), :math:`c_d` is the volume of the d-dimensional ball of unit diameter (so that :math:`(2\varepsilon)^d c_d` is the volume of the ball of radius :math:`\varepsilon`), and :math:`\psi` is the digamma function.

The estimator is consistent and asymptotically unbiased.
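The formula above translates almost directly into code. The following is a minimal sketch, assuming the Euclidean metric; `kl_entropy_sketch` is an illustrative name, not the library function:

```python
# Minimal sketch of the Kozachenko-Leonenko estimator (Euclidean metric).
# `kl_entropy_sketch` is an illustrative name, not the library API.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy_sketch(samples, k=5):
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    # Distance from each point to its k-th nearest neighbor, excluding itself.
    eps = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # Log-volume of the d-dimensional Euclidean ball of unit diameter.
    log_c_d = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1) - d * np.log(2)
    return d * np.mean(np.log(2 * eps)) + np.log(n - 1) - digamma(k) + log_c_d

rng = np.random.default_rng(42)
h = kl_entropy_sketch(rng.standard_normal(5000), k=5)  # close to 1.4189 nats
```

The `cKDTree.query` call asks for k + 1 neighbors because each point is its own nearest neighbor at distance zero.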

Examples:

>>> rng = np.random.default_rng(42)
>>> samples = rng.standard_normal(5000)
>>> h = knn_entropy(samples, k=5)
>>> analytical = 0.5 * (1 + np.log(2 * np.pi))  # ~1.4189 nats
>>> abs(h - analytical) < 0.15
True
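Since `base` only rescales the logarithm, results convert between units by a constant factor. A quick check of the conversion for the analytic value used above:

```python
import numpy as np

h_nats = 0.5 * (1 + np.log(2 * np.pi))  # analytic N(0, 1) entropy, ~1.4189 nats
h_bits = h_nats / np.log(2)             # same quantity in bits, ~2.0471
```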
References

.. [1] Kozachenko, L. F., & Leonenko, N. N. (1987). "Sample estimate of the entropy of a random vector." Problems of Information Transmission, 23(2), 9-16.
.. [2] Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). "Estimating mutual information." Physical Review E, 69(6), 066138.

knn_kl_divergence(samples_p, samples_q, *, k=5, base=np.e)

Estimate the KL divergence D_KL(P || Q) using kNN distances.

Uses the Wang et al. (2009) estimator based on k-nearest neighbor distances from two independent samples.

Parameters:

    samples_p : ndarray
        Samples from distribution P, shape (n,) or (n, d). Required.
    samples_q : ndarray
        Samples from distribution Q, shape (m,) or (m, d). Required.
    k : int, optional
        Number of nearest neighbors. Default is 5.
    base : float, optional
        Base of the logarithm. Default is np.e (nats).

Returns:

    float
        Estimated KL divergence D_KL(P || Q).

Notes

The estimator is given by:

.. math::

\hat{D}_{KL}(P \| Q) = \frac{d}{n} \sum_{i=1}^{n}
    \ln\!\left(\frac{\nu_k(i)}{\rho_k(i)}\right)
    + \ln\!\left(\frac{m}{n - 1}\right)

where :math:`\rho_k(i)` is the distance from the i-th P-sample point :math:`x_i` to its k-th nearest neighbor in the P-sample (excluding itself), and :math:`\nu_k(i)` is the distance from :math:`x_i` to its k-th nearest neighbor in the Q-sample.

The estimator is consistent and converges to the true KL divergence as both sample sizes grow.
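A compact sketch of this estimator, assuming the Euclidean metric; `knn_kl_sketch` is an illustrative name, not the library function:

```python
# Minimal sketch of the Wang et al. (2009) kNN KL-divergence estimator.
# `knn_kl_sketch` is an illustrative name, not the library API.
import numpy as np
from scipy.spatial import cKDTree

def knn_kl_sketch(samples_p, samples_q, k=5):
    p = np.asarray(samples_p, dtype=float)
    q = np.asarray(samples_q, dtype=float)
    if p.ndim == 1:
        p = p[:, None]
    if q.ndim == 1:
        q = q[:, None]
    n, d = p.shape
    m = q.shape[0]
    # rho_k: distance from each P-point to its k-th neighbor in P (skip self).
    rho = cKDTree(p).query(p, k=[k + 1])[0][:, 0]
    # nu_k: distance from each P-point to its k-th neighbor in Q.
    nu = cKDTree(q).query(p, k=[k])[0][:, 0]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

rng = np.random.default_rng(42)
p = rng.normal(0, 1, size=3000)
q = rng.normal(1, 1, size=3000)
dkl = knn_kl_sketch(p, q, k=5)  # true value is 0.5 nats for these Gaussians
```

Passing `k=[k]` to `cKDTree.query` returns only the k-th neighbor distance, which sidesteps the shape difference between `k=1` and `k>1`.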

Examples:

>>> rng = np.random.default_rng(42)
>>> p = rng.normal(0, 1, size=3000)
>>> q = rng.normal(0, 1, size=3000)
>>> abs(knn_kl_divergence(p, q, k=5)) < 0.5  # same distribution, so D_KL should be near 0
True
References

.. [1] Wang, Q., Kulkarni, S. R., & Verdú, S. (2009). "Divergence estimation for multidimensional densities via k-nearest-neighbor distances." IEEE Transactions on Information Theory, 55(5), 2392-2405.

ksg_mutual_information(samples_x, samples_y, *, k=5, base=np.e, algorithm=1)

Estimate mutual information using the Kraskov-Stogbauer-Grassberger estimator.

Uses k-nearest neighbor distances in the joint space with Chebyshev (max-norm) metric to estimate mutual information without explicit density estimation.

Parameters:

    samples_x : ndarray
        Samples of the X variable, shape (n,) or (n, d_x). Required.
    samples_y : ndarray
        Samples of the Y variable, shape (n,) or (n, d_y). Must have the same number of observations as samples_x. Required.
    k : int, optional
        Number of nearest neighbors. Default is 5.
    base : float, optional
        Base of the logarithm. Default is np.e (nats).
    algorithm : int, optional
        Which KSG algorithm to use; must be 1 or 2. Default is 1.

Returns:

    float
        Estimated mutual information I(X; Y).

Notes

Algorithm 1:

.. math::

\hat{I}(X; Y) = \psi(k) - \frac{1}{N} \sum_{i=1}^{N}
    \left[\psi(n_x(i) + 1) + \psi(n_y(i) + 1)\right] + \psi(N)

where :math:`n_x(i)` is the number of points :math:`j \neq i` with :math:`\|X_j - X_i\|_\infty < \varepsilon(i)`, and :math:`\varepsilon(i)` is the Chebyshev distance from point :math:`i` to its k-th nearest neighbor in the joint space. The inequality is strict, since :math:`\varepsilon(i)` is itself a realized neighbor distance.

Algorithm 2:

.. math::

\hat{I}(X; Y) = \psi(k) - \frac{1}{k}
    - \frac{1}{N} \sum_{i=1}^{N}
    \left[\psi(m_x(i)) + \psi(m_y(i))\right] + \psi(N)

where :math:`m_x(i)` and :math:`m_y(i)` count points within the marginal projections :math:`\varepsilon_x(i)` and :math:`\varepsilon_y(i)` of the k-th-neighbor hyper-rectangle, using :math:`\leq` rather than the strict inequality of Algorithm 1.

Both algorithms use the Chebyshev (max-norm, :math:L_\infty) metric for neighbor searches in the joint and marginal spaces.

Mutual information is always non-negative; small negative values may occur due to estimation noise.
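Algorithm 1 can be sketched directly from the formula above; `ksg_mi_sketch` is an illustrative name, not the library function:

```python
# Minimal sketch of KSG algorithm 1 (Chebyshev metric throughout).
# `ksg_mi_sketch` is an illustrative name, not the library API.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi_sketch(samples_x, samples_y, k=5):
    x = np.asarray(samples_x, dtype=float).reshape(len(samples_x), -1)
    y = np.asarray(samples_y, dtype=float).reshape(len(samples_y), -1)
    n = len(x)
    joint = np.hstack([x, y])
    # eps: Chebyshev distance to the k-th neighbor in the joint space.
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    # Shrink the radius by one ulp so the marginal counts use a strict inequality.
    r = np.nextafter(eps, 0)
    n_x = np.array([len(c) - 1 for c in cKDTree(x).query_ball_point(x, r, p=np.inf)])
    n_y = np.array([len(c) - 1 for c in cKDTree(y).query_ball_point(y, r, p=np.inf)])
    return digamma(k) + digamma(n) - np.mean(digamma(n_x + 1) + digamma(n_y + 1))

rng = np.random.default_rng(42)
x = rng.standard_normal(2000)
y = 0.8 * x + 0.6 * rng.standard_normal(2000)  # correlation 0.8
mi = ksg_mi_sketch(x, y, k=5)  # true MI is -0.5 * ln(1 - 0.8**2), ~0.511 nats
```

The `- 1` in the marginal counts removes the query point itself, which `query_ball_point` always includes at distance zero.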

Examples:

>>> rng = np.random.default_rng(42)
>>> x = rng.standard_normal(3000)
>>> y = rng.standard_normal(3000)
>>> mi = ksg_mutual_information(x, y, k=5)
>>> abs(mi) < 0.1  # Independent variables, MI should be near 0
True
References

.. [1] Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). "Estimating mutual information." Physical Review E, 69(6), 066138.