kNN Estimators
Non-parametric estimators for information-theoretic quantities based on k-nearest neighbor distances. These avoid explicit density estimation and scale gracefully to moderate and high dimensions.
knn_entropy(samples, *, k=5, base=np.e)
Estimate the differential entropy using the Kozachenko-Leonenko estimator.
Uses k-nearest neighbor distances to estimate the entropy of the distribution
from which samples are drawn, without explicit density estimation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples` | ndarray | Sample array of shape `(n, d)`, or `(n,)` for one-dimensional data. | required |
| `k` | int | Number of nearest neighbors to use. Default is 5. | 5 |
| `base` | float | Base of the logarithm, controlling the unit of measurement. Use `np.e` for nats or `2` for bits. | `np.e` |
Returns:

| Type | Description |
|---|---|
| float | Estimated differential entropy. |
Raises:

| Type | Description |
|---|---|
| ValueError | If `k` is not valid for the given number of samples. |
Notes
The estimator is given by:
.. math::
\hat{H} = \frac{d}{N} \sum_{i=1}^{N} \ln(2\,\varepsilon_k(i))
+ \ln(N - 1) - \psi(k) + \ln(c_d)
where :math:`\varepsilon_k(i)` is the Euclidean distance from point i to its
k-th nearest neighbor (excluding itself), :math:`c_d` is the volume of the
d-dimensional unit ball, and :math:`\psi` is the digamma function.
The estimator is consistent and asymptotically unbiased.
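To make the formula concrete, here is a minimal NumPy/SciPy sketch of the estimator (not the library's implementation). The helper name `kl_entropy_sketch` is hypothetical, it reports nats only, and it follows the convention of reference [2], in which :math:`2\varepsilon_k(i)` plays the role of a diameter and :math:`c_d` is therefore taken as the volume of a d-ball of radius 1/2.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy_sketch(samples, k=5):
    """Kozachenko-Leonenko entropy estimate in nats (illustrative sketch)."""
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                      # treat 1-D input as shape (n, 1)
    n, d = x.shape
    # Euclidean distance from each point to its k-th nearest neighbor,
    # excluding the point itself (hence k + 1 in the query).
    eps = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # ln(c_d): volume of a d-ball of radius 1/2 (Kraskov convention, since
    # 2*eps is treated as a diameter): c_d = pi^(d/2) / (2^d * Gamma(d/2 + 1)).
    log_c_d = (d / 2) * np.log(np.pi) - d * np.log(2) - gammaln(d / 2 + 1)
    return d * np.mean(np.log(2 * eps)) + np.log(n - 1) - digamma(k) + log_c_d
```

On data like the 5000-point standard-normal example below, this sketch should come out close to the analytical value of roughly 1.4189 nats.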
Examples:
>>> rng = np.random.default_rng(42)
>>> samples = rng.standard_normal(5000)
>>> h = knn_entropy(samples, k=5)
>>> analytical = 0.5 * (1 + np.log(2 * np.pi)) # ~1.4189 nats
>>> abs(h - analytical) < 0.15
True
References
.. [1] Kozachenko, L. F., & Leonenko, N. N. (1987). "Sample estimate of the entropy of a random vector." Problems of Information Transmission, 23(2), 9-16.
.. [2] Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). "Estimating mutual information." Physical Review E, 69(6), 066138.
knn_kl_divergence(samples_p, samples_q, *, k=5, base=np.e)
Estimate the KL divergence D_KL(P || Q) using kNN distances.
Uses the Wang et al. (2009) estimator based on k-nearest neighbor distances from two independent samples.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples_p` | ndarray | Samples from distribution P, shape `(n, d)` or `(n,)`. | required |
| `samples_q` | ndarray | Samples from distribution Q, shape `(m, d)` or `(m,)`. | required |
| `k` | int | Number of nearest neighbors. Default is 5. | 5 |
| `base` | float | Base of the logarithm. Default is `np.e` (nats). | `np.e` |
Returns:

| Type | Description |
|---|---|
| float | Estimated KL divergence D_KL(P \|\| Q). |
Notes
The estimator is given by:
.. math::
\hat{D}_{KL}(P \| Q) = \frac{d}{n} \sum_{i=1}^{n}
\ln\!\left(\frac{\nu_k(i)}{\rho_k(i)}\right)
+ \ln\!\left(\frac{m}{n - 1}\right)
where :math:`\rho_k(i)` is the distance from :math:`x_i` to its k-th nearest
neighbor in the P-sample (excluding itself), :math:`\nu_k(i)` is the distance
from :math:`x_i` to its k-th nearest neighbor in the Q-sample, and :math:`n`
and :math:`m` are the sizes of the P- and Q-samples, respectively.
The estimator is consistent and converges to the true KL divergence as both sample sizes grow.
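The following is a minimal SciPy sketch of the formula, assuming Euclidean distances; the name `kl_divergence_sketch` is hypothetical and it reports nats only. It is illustrative, not the library's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def kl_divergence_sketch(samples_p, samples_q, k=5):
    """Wang et al. (2009) kNN estimate of D_KL(P || Q) in nats (illustrative sketch)."""
    p = np.asarray(samples_p, dtype=float)
    q = np.asarray(samples_q, dtype=float)
    if p.ndim == 1:
        p, q = p[:, None], q[:, None]       # treat 1-D inputs as (n, 1) and (m, 1)
    n, d = p.shape
    m = q.shape[0]
    # rho_k(i): distance to the k-th nearest neighbor within the P-sample,
    # excluding the point itself (hence k + 1 in the query).
    rho = cKDTree(p).query(p, k=k + 1)[0][:, -1]
    # nu_k(i): distance to the k-th nearest neighbor in the Q-sample.
    nu = cKDTree(q).query(p, k=k)[0]
    if nu.ndim == 2:                        # query returns shape (n, k) when k > 1
        nu = nu[:, -1]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))
```

Duplicate points can make `rho` or `nu` zero, so a real implementation would typically guard against taking the log of zero.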
Examples:
>>> rng = np.random.default_rng(42)
>>> p = rng.normal(0, 1, size=3000)
>>> q = rng.normal(0, 1, size=3000)
>>> d = knn_kl_divergence(p, q, k=5)  # same distribution, so D_KL should be close to 0
>>> abs(d) < 0.5
True
References
.. [1] Wang, Q., Kulkarni, S. R., & Verdu, S. (2009). "Divergence estimation for multidimensional densities via k-nearest-neighbor distances." IEEE Transactions on Information Theory, 55(5), 2392-2405.
ksg_mutual_information(samples_x, samples_y, *, k=5, base=np.e, algorithm=1)
Estimate mutual information using the Kraskov-Stogbauer-Grassberger estimator.
Uses k-nearest neighbor distances in the joint space with Chebyshev (max-norm) metric to estimate mutual information without explicit density estimation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples_x` | ndarray | Samples of the X variable, shape `(n, d_x)` or `(n,)`. | required |
| `samples_y` | ndarray | Samples of the Y variable, shape `(n, d_y)` or `(n,)`. | required |
| `k` | int | Number of nearest neighbors. Default is 5. | 5 |
| `base` | float | Base of the logarithm. Default is `np.e` (nats). | `np.e` |
| `algorithm` | int | Which KSG algorithm to use. Must be 1 or 2. Default is 1. | 1 |
Returns:

| Type | Description |
|---|---|
| float | Estimated mutual information I(X; Y). |
Notes
Algorithm 1:
.. math::
\hat{I}(X; Y) = \psi(k) - \frac{1}{N} \sum_{i=1}^{N}
\left[\psi(n_x(i) + 1) + \psi(n_y(i) + 1)\right] + \psi(N)
where :math:`n_x(i)` is the number of points :math:`j \neq i` with
:math:`\|X_j - X_i\|_\infty \leq \varepsilon(i)`, and :math:`\varepsilon(i)`
is the Chebyshev distance to the k-th nearest neighbor in the joint space.
Algorithm 2:
.. math::
\hat{I}(X; Y) = \psi(k) - \frac{1}{k}
- \frac{1}{N} \sum_{i=1}^{N}
\left[\psi(m_x(i)) + \psi(m_y(i))\right] + \psi(N)
where :math:`m_x(i)` and :math:`m_y(i)` are the analogous neighbor counts in the
marginal spaces, computed with strict inequality (<) instead of <=.
Both algorithms use the Chebyshev (max-norm, :math:`L_\infty`) metric for
neighbor searches in the joint and marginal spaces.
Mutual information is always non-negative; small negative values may occur due to estimation noise.
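To make Algorithm 1 concrete, here is a minimal SciPy sketch that follows the counting rule stated above (marginal neighbors at Chebyshev distance less than or equal to :math:`\varepsilon(i)` are included). It is not the library's implementation; `ksg_mi_sketch` is a hypothetical helper and it reports nats only.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi_sketch(samples_x, samples_y, k=5):
    """KSG Algorithm 1 estimate of I(X; Y) in nats (illustrative sketch)."""
    x = np.asarray(samples_x, dtype=float).reshape(len(samples_x), -1)
    y = np.asarray(samples_y, dtype=float).reshape(len(samples_y), -1)
    n = x.shape[0]
    joint = np.hstack([x, y])
    # eps(i): Chebyshev distance to the k-th nearest neighbor in the joint
    # space (k + 1 because the query includes the point itself).
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    tree_x, tree_y = cKDTree(x), cKDTree(y)
    # n_x(i), n_y(i): marginal neighbors within eps(i), excluding the point itself.
    nx = np.array([len(tree_x.query_ball_point(x[i], r=eps[i], p=np.inf)) - 1
                   for i in range(n)])
    ny = np.array([len(tree_y.query_ball_point(y[i], r=eps[i], p=np.inf)) - 1
                   for i in range(n)])
    return digamma(k) - np.mean(digamma(nx + 1) + digamma(ny + 1)) + digamma(n)
```

On independent inputs such as the example below, the estimate should hover near zero.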
Examples:
>>> rng = np.random.default_rng(42)
>>> x = rng.standard_normal(3000)
>>> y = rng.standard_normal(3000)
>>> mi = ksg_mutual_information(x, y, k=5)
>>> abs(mi) < 0.1 # Independent variables, MI should be near 0
True
References
.. [1] Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). "Estimating mutual information." Physical Review E, 69(6), 066138.