kNN Estimators
Non-parametric estimators for information-theoretic quantities based on k-nearest neighbor distances. These avoid explicit density estimation and scale gracefully to moderate and high dimensions.
knn_entropy(samples, *, k=5, base=np.e)
Estimate the differential entropy using the Kozachenko-Leonenko estimator.
Uses k-nearest neighbor distances to estimate the entropy of the distribution
from which samples are drawn, without explicit density estimation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples` | ndarray | Sample array of shape `(n, d)`, or `(n,)` for one-dimensional data. | required |
| `k` | int | Number of nearest neighbors to use. Default is 5. | 5 |
| `base` | float | Base of the logarithm, controlling the unit of measurement. Use `np.e` for nats or `2` for bits. | `np.e` |
Returns:

| Type | Description |
|---|---|
| float | Estimated differential entropy. |
Raises:

| Type | Description |
|---|---|
| ValueError | If `k` is not valid for the given number of samples. |
Notes
The estimator is given by:
.. math::
\hat{H} = \frac{d}{N} \sum_{i=1}^{N} \ln(2\,\varepsilon_k(i))
+ \ln(N - 1) - \psi(k) + \ln(c_d)
where :math:`\varepsilon_k(i)` is the Euclidean distance from point i to its
k-th nearest neighbor (excluding itself), :math:`c_d` is the volume of the
d-dimensional unit ball, and :math:`\psi` is the digamma function.
The estimator is consistent and asymptotically unbiased.
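To make the formula concrete, here is a minimal NumPy/SciPy sketch of the estimator (not the library's implementation). The helper name `kl_entropy_sketch` is hypothetical, it reports nats only, and it follows the convention of reference [2], in which :math:`2\varepsilon_k(i)` plays the role of a diameter and :math:`c_d` is therefore taken as the volume of a d-ball of radius 1/2.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy_sketch(samples, k=5):
    """Kozachenko-Leonenko entropy estimate in nats (illustrative sketch)."""
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                      # treat 1-D input as shape (n, 1)
    n, d = x.shape
    # Euclidean distance from each point to its k-th nearest neighbor,
    # excluding the point itself (hence k + 1 in the query).
    eps = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # ln(c_d): volume of a d-ball of radius 1/2 (Kraskov convention, since
    # 2*eps is treated as a diameter): c_d = pi^(d/2) / (2^d * Gamma(d/2 + 1)).
    log_c_d = (d / 2) * np.log(np.pi) - d * np.log(2) - gammaln(d / 2 + 1)
    return d * np.mean(np.log(2 * eps)) + np.log(n - 1) - digamma(k) + log_c_d
```

On data like the 5000-point standard-normal example below, this sketch should come out close to the analytical value of roughly 1.4189 nats.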
Examples:
>>> rng = np.random.default_rng(42)
>>> samples = rng.standard_normal(5000)
>>> h = knn_entropy(samples, k=5)
>>> analytical = 0.5 * (1 + np.log(2 * np.pi)) # ~1.4189 nats
>>> abs(h - analytical) < 0.15
True
References
.. [1] Kozachenko, L. F., & Leonenko, N. N. (1987). "Sample estimate of the entropy of a random vector." Problems of Information Transmission, 23(2), 9-16.
.. [2] Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). "Estimating mutual information." Physical Review E, 69(6), 066138.
knn_kl_divergence(samples_p, samples_q, *, k=5, base=np.e)
Estimate the KL divergence D_KL(P || Q) using kNN distances.
Uses the Wang et al. (2009) estimator based on k-nearest neighbor distances from two independent samples.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples_p` | ndarray | Samples from distribution P, shape `(n, d)` or `(n,)`. | required |
| `samples_q` | ndarray | Samples from distribution Q, shape `(m, d)` or `(m,)`. | required |
| `k` | int | Number of nearest neighbors. Default is 5. | 5 |
| `base` | float | Base of the logarithm. Default is `np.e` (nats). | `np.e` |
Returns:

| Type | Description |
|---|---|
| float | Estimated KL divergence D_KL(P \|\| Q). |
Notes
The estimator is given by:
.. math::
\hat{D}_{KL}(P \| Q) = \frac{d}{n} \sum_{i=1}^{n}
\ln\!\left(\frac{\nu_k(i)}{\rho_k(i)}\right)
+ \ln\!\left(\frac{m}{n - 1}\right)
where :math:`\rho_k(i)` is the distance from :math:`x_i` to its k-th nearest
neighbor in the P-sample (excluding itself), :math:`\nu_k(i)` is the distance
from :math:`x_i` to its k-th nearest neighbor in the Q-sample, and :math:`n`
and :math:`m` are the sizes of the P- and Q-samples, respectively.
The estimator is consistent and converges to the true KL divergence as both sample sizes grow.
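The following is a minimal SciPy sketch of the formula, assuming Euclidean distances; the name `kl_divergence_sketch` is hypothetical and it reports nats only. It is illustrative, not the library's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def kl_divergence_sketch(samples_p, samples_q, k=5):
    """Wang et al. (2009) kNN estimate of D_KL(P || Q) in nats (illustrative sketch)."""
    p = np.asarray(samples_p, dtype=float)
    q = np.asarray(samples_q, dtype=float)
    if p.ndim == 1:
        p, q = p[:, None], q[:, None]       # treat 1-D inputs as (n, 1) and (m, 1)
    n, d = p.shape
    m = q.shape[0]
    # rho_k(i): distance to the k-th nearest neighbor within the P-sample,
    # excluding the point itself (hence k + 1 in the query).
    rho = cKDTree(p).query(p, k=k + 1)[0][:, -1]
    # nu_k(i): distance to the k-th nearest neighbor in the Q-sample.
    nu = cKDTree(q).query(p, k=k)[0]
    if nu.ndim == 2:                        # query returns shape (n, k) when k > 1
        nu = nu[:, -1]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))
```

Duplicate points can make `rho` or `nu` zero, so a real implementation would typically guard against taking the log of zero.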
Examples:
>>> rng = np.random.default_rng(42)
>>> p = rng.normal(0, 1, size=3000)
>>> q = rng.normal(0, 1, size=3000)
>>> d = knn_kl_divergence(p, q, k=5)  # same distribution, so D_KL should be close to 0
>>> abs(d) < 0.5
True
References
.. [1] Wang, Q., Kulkarni, S. R., & Verdu, S. (2009). "Divergence estimation for multidimensional densities via k-nearest-neighbor distances." IEEE Transactions on Information Theory, 55(5), 2392-2405.
ksg_mutual_information(samples_x, samples_y, *, k=5, base=np.e, algorithm=1)
Estimate mutual information using the Kraskov-Stogbauer-Grassberger estimator.
Uses k-nearest neighbor distances in the joint space with Chebyshev (max-norm) metric to estimate mutual information without explicit density estimation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `samples_x` | ndarray | Samples of the X variable, shape `(n, d_x)` or `(n,)`. | required |
| `samples_y` | ndarray | Samples of the Y variable, shape `(n, d_y)` or `(n,)`. | required |
| `k` | int | Number of nearest neighbors. Default is 5. | 5 |
| `base` | float | Base of the logarithm. Default is `np.e` (nats). | `np.e` |
| `algorithm` | int | Which KSG algorithm to use. Must be 1 or 2. Default is 1. | 1 |
Returns:

| Type | Description |
|---|---|
| float | Estimated mutual information I(X; Y). |
Notes
Algorithm 1:
.. math::
\hat{I}(X; Y) = \psi(k) - \frac{1}{N} \sum_{i=1}^{N}
\left[\psi(n_x(i) + 1) + \psi(n_y(i) + 1)\right] + \psi(N)
where :math:`n_x(i)` is the number of points :math:`j \neq i` with
:math:`\|X_j - X_i\|_\infty \leq \varepsilon(i)`, and :math:`\varepsilon(i)`
is the Chebyshev distance to the k-th nearest neighbor in the joint space.
Algorithm 2:
.. math::
\hat{I}(X; Y) = \psi(k) - \frac{1}{k}
- \frac{1}{N} \sum_{i=1}^{N}
\left[\psi(m_x(i)) + \psi(m_y(i))\right] + \psi(N)
where :math:`m_x(i)` and :math:`m_y(i)` are the analogous neighbor counts in the
marginal spaces, computed with strict inequality (<) instead of <=.
Both algorithms use the Chebyshev (max-norm, :math:`L_\infty`) metric for
neighbor searches in the joint and marginal spaces.
Mutual information is always non-negative; small negative values may occur due to estimation noise.
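To make Algorithm 1 concrete, here is a minimal SciPy sketch that follows the counting rule stated above (marginal neighbors at Chebyshev distance less than or equal to :math:`\varepsilon(i)` are included). It is not the library's implementation; `ksg_mi_sketch` is a hypothetical helper and it reports nats only.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi_sketch(samples_x, samples_y, k=5):
    """KSG Algorithm 1 estimate of I(X; Y) in nats (illustrative sketch)."""
    x = np.asarray(samples_x, dtype=float).reshape(len(samples_x), -1)
    y = np.asarray(samples_y, dtype=float).reshape(len(samples_y), -1)
    n = x.shape[0]
    joint = np.hstack([x, y])
    # eps(i): Chebyshev distance to the k-th nearest neighbor in the joint
    # space (k + 1 because the query includes the point itself).
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    tree_x, tree_y = cKDTree(x), cKDTree(y)
    # n_x(i), n_y(i): marginal neighbors within eps(i), excluding the point itself.
    nx = np.array([len(tree_x.query_ball_point(x[i], r=eps[i], p=np.inf)) - 1
                   for i in range(n)])
    ny = np.array([len(tree_y.query_ball_point(y[i], r=eps[i], p=np.inf)) - 1
                   for i in range(n)])
    return digamma(k) - np.mean(digamma(nx + 1) + digamma(ny + 1)) + digamma(n)
```

On independent inputs such as the example below, the estimate should hover near zero.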
Examples:
>>> rng = np.random.default_rng(42)
>>> x = rng.standard_normal(3000)
>>> y = rng.standard_normal(3000)
>>> mi = ksg_mutual_information(x, y, k=5)
>>> abs(mi) < 0.1 # Independent variables, MI should be near 0
True
References
.. [1] Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). "Estimating mutual information." Physical Review E, 69(6), 066138.