
Multivariate Dependence

Measures of statistical dependence among multiple variables, generalizing pairwise mutual information.

total_correlation(samples, *, base=np.e, discrete=False, estimator='knn')

Compute the total correlation (multi-information) of a multivariate sample.

.. math::

   \mathrm{TC}(X_1, \ldots, X_d) = \sum_{i=1}^{d} H(X_i) - H(X_1, \ldots, X_d)

Parameters:

.. list-table::
   :header-rows: 1

   * - Name
     - Type
     - Description
     - Default
   * - ``samples``
     - ndarray
     - Sample array of shape ``(n, d)`` with ``d >= 2`` variables.
     - required
   * - ``base``
     - float
     - Logarithm base. The default ``np.e`` gives the result in nats.
     - ``np.e``
   * - ``discrete``
     - bool
     - If ``True``, use discrete estimators.
     - ``False``
   * - ``estimator``
     - str
     - Estimator for continuous data: ``"knn"`` or ``"kde"``. Ignored when ``discrete=True``.
     - ``'knn'``

Returns:

.. list-table::
   :header-rows: 1

   * - Type
     - Description
   * - float
     - Total correlation, non-negative.

Raises:

.. list-table::
   :header-rows: 1

   * - Type
     - Description
   * - ValueError
     - If ``samples`` does not have at least 2 columns.
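To make the definition concrete, the following is a minimal sketch of a plug-in (frequency-count) estimate of total correlation for discrete data. It is an illustration only and is not this library's implementation: the helper names ``entropy_discrete`` and ``total_correlation_plugin`` are hypothetical, and the continuous ``"knn"`` and ``"kde"`` estimators are not reproduced here.

```python
import numpy as np


def entropy_discrete(columns, base=np.e):
    """Plug-in entropy of the rows of a 2-D array of discrete values."""
    # Count how often each distinct row occurs.
    _, counts = np.unique(columns, axis=0, return_counts=True)
    p = counts / counts.sum()
    # Shannon entropy, converted from nats to the requested base.
    return float(-np.sum(p * np.log(p)) / np.log(base))


def total_correlation_plugin(samples, base=np.e):
    """Hypothetical helper: sum of marginal entropies minus joint entropy."""
    samples = np.asarray(samples)
    if samples.ndim != 2 or samples.shape[1] < 2:
        raise ValueError("samples must have shape (n, d) with d >= 2")
    marginal_sum = sum(
        entropy_discrete(samples[:, [i]], base) for i in range(samples.shape[1])
    )
    return marginal_sum - entropy_discrete(samples, base)


# Three binary variables: the third column copies the first,
# and the second is independent of both in this small sample.
x = np.array([[0, 0, 0],
              [0, 1, 0],
              [1, 0, 1],
              [1, 1, 1]])
tc_bits = total_correlation_plugin(x, base=2)
print(tc_bits)  # approximately 1.0 bit: the redundancy of the copied column
```

Each marginal entropy is 1 bit and the joint entropy is 2 bits, so the estimate is 3 - 2 = 1 bit. Plug-in estimates are biased for small samples, so this sketch is best read as a check on the formula rather than a practical estimator.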

normalized_mutual_information(samples_x, samples_y, *, normalization='geometric', base=np.e, discrete=False)

Compute normalized mutual information between two variables.

.. math::

   \mathrm{NMI}(X, Y) = \frac{I(X; Y)}{\mathrm{norm}(H(X), H(Y))}
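
The normalizer is not spelled out in this section. Assuming the conventional definitions for the option names listed under Parameters (an assumption, not confirmed by the source), it would read:

.. math::

   \mathrm{norm}(H(X), H(Y)) =
   \begin{cases}
   \sqrt{H(X)\,H(Y)} & \text{geometric} \\
   \tfrac{1}{2}\bigl(H(X) + H(Y)\bigr) & \text{arithmetic} \\
   \max\bigl(H(X), H(Y)\bigr) & \text{max} \\
   \min\bigl(H(X), H(Y)\bigr) & \text{min} \\
   H(X, Y) & \text{joint}
   \end{cases}

Under these definitions the "joint" variant depends on the joint entropy :math:`H(X, Y) = H(X) + H(Y) - I(X; Y)` rather than on the two marginal entropies alone.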

Parameters:

.. list-table::
   :header-rows: 1

   * - Name
     - Type
     - Description
     - Default
   * - ``samples_x``
     - ndarray
     - Samples of variable X, shape ``(n,)``.
     - required
   * - ``samples_y``
     - ndarray
     - Samples of variable Y, shape ``(n,)``.
     - required
   * - ``normalization``
     - str or list of str
     - Normalization method: ``"geometric"``, ``"arithmetic"``, ``"max"``, ``"min"``, or ``"joint"``. If a list is supplied, the underlying mutual information and entropies are computed once and the function returns a dict mapping each requested normalization to its NMI value. This avoids re-estimating the same quantities, so it is much faster than calling the function once per normalization for the same ``(samples_x, samples_y)``.
     - ``'geometric'``
   * - ``base``
     - float
     - Logarithm base.
     - ``np.e``
   * - ``discrete``
     - bool
     - If ``True``, use discrete estimators.
     - ``False``

Returns:

.. list-table::
   :header-rows: 1

   * - Type
     - Description
   * - float or dict[str, float]
     - Normalized mutual information. Returns a float when ``normalization`` is a string, or a dict mapping each requested normalization name to its NMI when a list is given.

Raises:

.. list-table::
   :header-rows: 1

   * - Type
     - Description
   * - ValueError
     - If any requested normalization is unknown.
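The compute-once behaviour for a list of normalizations can be sketched as follows for discrete labels. This is a hedged illustration, not the library's code: ``nmi_plugin`` and ``_NORMALIZERS`` are hypothetical names, the normalizer formulas are the conventional ones (assumed, since the source does not define them), and constant inputs with zero entropy are not handled.

```python
import numpy as np


def _entropy(values, base):
    """Plug-in entropy of a 1-D label array or of the rows of a 2-D array."""
    _, counts = np.unique(values, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)) / np.log(base))


# Assumed conventional normalizers; each receives H(X), H(Y), H(X, Y).
_NORMALIZERS = {
    "geometric": lambda hx, hy, hxy: (hx * hy) ** 0.5,
    "arithmetic": lambda hx, hy, hxy: 0.5 * (hx + hy),
    "max": lambda hx, hy, hxy: max(hx, hy),
    "min": lambda hx, hy, hxy: min(hx, hy),
    "joint": lambda hx, hy, hxy: hxy,
}


def nmi_plugin(x, y, normalization="geometric", base=np.e):
    """Hypothetical helper mirroring the str-or-list contract described above."""
    single = isinstance(normalization, str)
    names = [normalization] if single else list(normalization)
    unknown = [name for name in names if name not in _NORMALIZERS]
    if unknown:
        raise ValueError(f"unknown normalization(s): {unknown}")
    x, y = np.asarray(x), np.asarray(y)
    # The entropies and mutual information are estimated exactly once...
    hx, hy = _entropy(x, base), _entropy(y, base)
    hxy = _entropy(np.column_stack([x, y]), base)
    mi = hx + hy - hxy
    # ...and every requested normalization reuses them.
    values = {name: mi / _NORMALIZERS[name](hx, hy, hxy) for name in names}
    return values[normalization] if single else values


x = np.array([0, 0, 1, 1, 2, 2])
y = np.array([0, 0, 1, 1, 1, 1])  # y is a coarsening of x
result = nmi_plugin(x, y, normalization=["min", "max", "joint"], base=2)
print(result)
```

Because ``y`` is fully determined by ``x`` in this sample, the mutual information equals H(Y), so the "min" normalization gives a value of about 1, while "max" and "joint" coincide because H(X) = H(X, Y).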

variation_of_information(samples_x, samples_y, *, base=np.e, discrete=False)

Compute the variation of information between two variables.

.. math::

   \mathrm{VI}(X, Y) = H(X) + H(Y) - 2\,I(X; Y)

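For discrete variables such as cluster labelings, VI is a true metric on the space of partitions: it is non-negative, symmetric, zero only when the two partitions are identical, and it satisfies the triangle inequality.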

Parameters:

.. list-table::
   :header-rows: 1

   * - Name
     - Type
     - Description
     - Default
   * - ``samples_x``
     - ndarray
     - Samples of variable X, shape ``(n,)``.
     - required
   * - ``samples_y``
     - ndarray
     - Samples of variable Y, shape ``(n,)``.
     - required
   * - ``base``
     - float
     - Logarithm base.
     - ``np.e``
   * - ``discrete``
     - bool
     - If ``True``, use discrete estimators.
     - ``False``

Returns:

.. list-table::
   :header-rows: 1

   * - Type
     - Description
   * - float
     - Variation of information, non-negative.
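
As a rough check of the formula on cluster labelings, the sketch below computes a plug-in variation of information from label counts. It is illustrative only and does not reproduce this library's estimators; ``vi_plugin`` is a hypothetical helper name.

```python
import numpy as np


def _entropy(values, base=np.e):
    """Plug-in entropy of a 1-D label array or of the rows of a 2-D array."""
    _, counts = np.unique(values, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)) / np.log(base))


def vi_plugin(x, y, base=np.e):
    """Hypothetical helper: VI = H(X) + H(Y) - 2 I(X; Y)."""
    x, y = np.asarray(x), np.asarray(y)
    hx, hy = _entropy(x, base), _entropy(y, base)
    hxy = _entropy(np.column_stack([x, y]), base)
    mi = hx + hy - hxy
    return hx + hy - 2.0 * mi


a = np.array([0, 0, 1, 1, 2, 2])   # three clusters of two points
b = np.array([0, 0, 0, 1, 1, 1])   # two clusters of three points
c = np.array([5, 5, 7, 7, 9, 9])   # same partition as a, relabelled

vi_ab = vi_plugin(a, b)
vi_ac = vi_plugin(a, c)
print(vi_ab, vi_ac)  # vi_ac is approximately 0: relabelling does not change the partition
```

In this sample ``vi_plugin(a, b)`` and ``vi_plugin(b, a)`` agree, and the distance between ``a`` and its relabelled copy ``c`` is zero up to floating-point error, as expected of a distance between partitions.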