Estimators
The package implements a number of estimators sharing a common API. For general usage instructions and examples, consult the general instructions. Although this API reference is organized alphabetically, the list of estimators groups the estimators by type, together with the relevant literature.
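As a quick orientation before the alphabetical listing: every estimator is constructed with its hyperparameters and then queried with `estimate(x, y)` on paired i.i.d. samples. A minimal sketch of this common API, using the `KSGEnsembleFirstEstimator` documented below; the synthetic Gaussian data and the closed-form reference value are illustrative, not part of the package:

```python
import numpy as np

from bmi.estimators.ksg import KSGEnsembleFirstEstimator

# Draw an i.i.d. sample from a correlated bivariate Gaussian P(X, Y).
rng = np.random.default_rng(0)
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=5_000)
x, y = xy[:, [0]], xy[:, [1]]  # each of shape (n_points, 1)

estimator = KSGEnsembleFirstEstimator(neighborhoods=(5, 10))
mi_estimate = estimator.estimate(x, y)

# For a bivariate Gaussian the ground truth is I(X; Y) = -0.5 * log(1 - rho^2).
print(mi_estimate, -0.5 * np.log(1 - rho**2))
```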
bmi.estimators.correlation.CCAMutualInformationEstimator (IMutualInformationPointEstimator)
__init__(self, scale=True)
special
Initialize self. See help(type(self)) for accurate signature.
estimate(self, x, y)
A point estimate of MI(X; Y) from an i.i.d. sample from the \(P(X, Y)\) distribution.
Parameters:

Name | Type | Description | Default
---|---|---|---
x | ArrayLike | array of shape (n_points, dim_x) | required
y | ArrayLike | array of shape (n_points, dim_y) | required

Returns:

Type | Description
---|---
float | mutual information estimate
parameters(self)
Returns the parameters of the estimator.
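Since CCA captures only linear dependence, this estimator is best suited to (approximately) jointly Gaussian data. A minimal usage sketch, assuming the import path shown in the heading above; the data-generating process is illustrative:

```python
import numpy as np

from bmi.estimators.correlation import CCAMutualInformationEstimator

# Jointly Gaussian X and Y, so the linear (CCA) assumption holds.
rng = np.random.default_rng(42)
x = rng.normal(size=(10_000, 2))
y = x @ np.array([[1.0, 0.5], [0.0, 1.0]]) + 0.5 * rng.normal(size=(10_000, 2))

estimator = CCAMutualInformationEstimator(scale=True)
print(estimator.estimate(x, y))  # point estimate of I(X; Y)
print(estimator.parameters())    # the estimator's hyperparameters
```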
bmi.estimators.neural._estimators.DonskerVaradhanEstimator (NeuralEstimatorBase)
bmi.estimators._histogram.HistogramEstimator (IMutualInformationPointEstimator)
__init__(self, n_bins_x=5, n_bins_y=None, standardize=True)
special
Parameters:

Name | Type | Description | Default
---|---|---|---
n_bins_x | int | number of bins per each X dimension | 5
n_bins_y | Optional[int] | number of bins per each Y dimension; leave as None to use n_bins_x | None
standardize | bool | whether to standardize the data set | True
estimate(self, x, y)
MI estimate.
parameters(self)
Returns the parameters of the estimator.
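A minimal usage sketch, assuming the module path shown in the heading above; the synthetic data and the bin count are illustrative:

```python
import numpy as np

from bmi.estimators._histogram import HistogramEstimator

rng = np.random.default_rng(0)
x = rng.normal(size=(5_000, 1))
y = x + 0.3 * rng.normal(size=(5_000, 1))  # Y is a noisy copy of X.

# 10 bins per X dimension; n_bins_y=None reuses the X setting.
estimator = HistogramEstimator(n_bins_x=10, n_bins_y=None, standardize=True)
print(estimator.estimate(x, y))
```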
bmi.estimators.neural._estimators.InfoNCEEstimator (NeuralEstimatorBase)
bmi.estimators._kde.KDEMutualInformationEstimator (IMutualInformationPointEstimator)
The kernel density mutual information estimator based on
\(I(X; Y) = h(X) + h(Y) - h(X, Y)\),
where \(h(X)\) is the differential entropy \(h(X) = -\mathbb{E}[ \log p(X) ]\).
The logarithm of the probability density function, \(\log p(X)\), is estimated via a kernel density estimator (KDE) using SciKit-Learn.
Note
This estimator is very sensitive to the choice of the bandwidth and the kernel. We suggest treating it with caution.
__init__(self, kernel_xy='tophat', kernel_x=None, kernel_y=None, bandwidth_xy='scott', bandwidth_x=None, bandwidth_y=None, standardize=True)
special
Parameters:

Name | Type | Description | Default
---|---|---|---
kernel_xy | Literal['gaussian', 'tophat', 'epanechnikov', 'exponential', 'linear', 'cosine'] | kernel used for the joint PDF \(p_{XY}\) estimation; see SciKit-Learn's KernelDensity for the available kernels | 'tophat'
kernel_x | Optional[Literal['gaussian', 'tophat', 'epanechnikov', 'exponential', 'linear', 'cosine']] | kernel used for the \(p_X\) estimation; if None (default), kernel_xy is used | None
kernel_y | Optional[Literal['gaussian', 'tophat', 'epanechnikov', 'exponential', 'linear', 'cosine']] | similar to kernel_x, but for the \(p_Y\) estimation | None
bandwidth_xy | Union[float, Literal['scott', 'silverman']] | kernel bandwidth used for the joint distribution estimation | 'scott'
bandwidth_x | Optional[Union[float, Literal['scott', 'silverman']]] | kernel bandwidth used for the \(p_X\) estimation; if None (default), bandwidth_xy is used | None
bandwidth_y | Optional[Union[float, Literal['scott', 'silverman']]] | similar to bandwidth_x, but for the \(p_Y\) estimation | None
standardize | bool | whether to standardize the data points | True
estimate(self, x, y)
A point estimate of MI(X; Y) from an i.i.d. sample from the \(P(X, Y)\) distribution.
Parameters:

Name | Type | Description | Default
---|---|---|---
x | ArrayLike | array of shape (n_points, dim_x) | required
y | ArrayLike | array of shape (n_points, dim_y) | required

Returns:

Type | Description
---|---
float | mutual information estimate
estimate_entropies(self, x, y)
Calculates differential entropies.

Note
Differential entropy is not invariant to standardization. In particular, if you want to estimate the differential entropy of the original variables, you should use standardize=False.
parameters(self)
Returns the parameters of the estimator.
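A minimal usage sketch, assuming the module path shown in the heading above; the synthetic data and the kernel/bandwidth choices are illustrative, not recommendations (see the note on bandwidth sensitivity above):

```python
import numpy as np

from bmi.estimators._kde import KDEMutualInformationEstimator

rng = np.random.default_rng(0)
x = rng.normal(size=(3_000, 1))
y = x + 0.5 * rng.normal(size=(3_000, 1))

# Gaussian kernels with Scott's rule; results depend strongly on these choices.
estimator = KDEMutualInformationEstimator(kernel_xy="gaussian", bandwidth_xy="scott")
print(estimator.estimate(x, y))

# Differential entropies of the *standardized* variables (standardize=True is
# the default); pass standardize=False for the original variables.
print(estimator.estimate_entropies(x, y))
```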
bmi.estimators.ksg.KSGEnsembleFirstEstimator (IMutualInformationPointEstimator)
An implementation of the neighborhood-based KSG estimator.
We use the first approximation (i.e., equation (8) in the Kraskov et al. (2004) paper) and allow for using different neighborhood sizes. The final estimate is the average of the estimates obtained with the different neighborhood sizes.
__init__(self, neighborhoods=(5, 10), standardize=True, metric_x='euclidean', metric_y=None, n_jobs=1, chunk_size=10)
special
Parameters:

Name | Type | Description | Default
---|---|---|---
neighborhoods | Sequence[int] | sequence of positive integers, specifying the neighborhood sizes used for the MI calculation | (5, 10)
standardize | bool | whether to standardize the data before the MI calculation | True
metric_x | Literal['euclidean', 'manhattan', 'chebyshev'] | metric on the X space | 'euclidean'
metric_y | Optional[Literal['euclidean', 'manhattan', 'chebyshev']] | metric on the Y space; if None, metric_x is used | None
n_jobs | int | number of jobs launched to compute distances; use -1 to use all processors | 1
chunk_size | int | internal batch size, used to speed up the computations while fitting into memory | 10
Note
If you use the Chebyshev (\(\ell_\infty\)) distance for both the \(X\) and \(Y\) spaces, KSGChebyshevEstimator may be faster.
estimate(self, x, y)
A point estimate of MI(X; Y) from an i.i.d. sample from the \(P(X, Y)\) distribution.
Parameters:

Name | Type | Description | Default
---|---|---|---
x | ArrayLike | array of shape (n_points, dim_x) | required
y | ArrayLike | array of shape (n_points, dim_y) | required

Returns:

Type | Description
---|---
float | mutual information estimate
parameters(self)
Returns the parameters of the estimator.
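A minimal usage sketch, assuming the import path shown in the heading above; the synthetic data and the neighborhood sizes are illustrative:

```python
import numpy as np

from bmi.estimators.ksg import KSGEnsembleFirstEstimator

rng = np.random.default_rng(1)
x = rng.normal(size=(2_000, 3))
y = x[:, :2] + 0.2 * rng.normal(size=(2_000, 2))

# Average the KSG estimates over several neighborhood sizes,
# computing distances on all available processors.
estimator = KSGEnsembleFirstEstimator(
    neighborhoods=(5, 10, 20),
    metric_x="euclidean",
    n_jobs=-1,
)
print(estimator.estimate(x, y))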
bmi.estimators.neural._mine_estimator.MINEEstimator (IMutualInformationPointEstimator)
trained_critic: Optional[equinox._module.Module]
property
readonly
Returns the critic function obtained at the end of training.
Note:
1. The model has to be trained first, by estimating mutual information; otherwise, None is returned.
2. The critic can have a different meaning depending on the function used.
__init__(self, batch_size=256, max_n_steps=10000, train_test_split=0.5, test_every_n_steps=250, learning_rate=0.1, hidden_layers=(16, 8), smoothing_alpha=0.9, standardize=True, verbose=True, seed=42)
special
Initialize self. See help(type(self)) for accurate signature.
estimate(self, x, y)
A point estimate of MI(X; Y) from an i.i.d. sample from the \(P(X, Y)\) distribution.
Parameters:

Name | Type | Description | Default
---|---|---|---
x | ArrayLike | array of shape (n_points, dim_x) | required
y | ArrayLike | array of shape (n_points, dim_y) | required

Returns:

Type | Description
---|---
float | mutual information estimate
estimate_with_info(self, x, y)
Estimates the mutual information and additionally reports information about the training run.
parameters(self)
Returns the parameters of the estimator.
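A minimal usage sketch, assuming the module path shown in the heading above; the hyperparameters are illustrative and will likely need tuning for real data:

```python
import numpy as np

from bmi.estimators.neural._mine_estimator import MINEEstimator

rng = np.random.default_rng(0)
x = rng.normal(size=(5_000, 2))
y = x + 0.4 * rng.normal(size=(5_000, 2))

estimator = MINEEstimator(max_n_steps=2_000, batch_size=256, seed=0)
print(estimator.estimate(x, y))

# The critic is populated only after training; before it, trained_critic is None.
critic = estimator.trained_critic
```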