Estimators
The package implements a number of estimators sharing a common API. For general usage instructions and examples, consult the general instructions. Although this API reference is organized alphabetically, the list of estimators groups the estimators by type, together with the relevant literature.
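As a quick orientation before the alphabetical listing: every estimator is constructed with its hyperparameters and then queried with `estimate(x, y)` on paired i.i.d. samples. A minimal sketch of this common API, using the `KSGEnsembleFirstEstimator` documented below; the synthetic Gaussian data and the closed-form reference value are illustrative, not part of the package:

```python
import numpy as np

from bmi.estimators.ksg import KSGEnsembleFirstEstimator

# Draw an i.i.d. sample from a correlated bivariate Gaussian P(X, Y).
rng = np.random.default_rng(0)
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=5_000)
x, y = xy[:, [0]], xy[:, [1]]  # each of shape (n_points, 1)

estimator = KSGEnsembleFirstEstimator(neighborhoods=(5, 10))
mi_estimate = estimator.estimate(x, y)

# For a bivariate Gaussian the ground truth is I(X; Y) = -0.5 * log(1 - rho^2).
print(mi_estimate, -0.5 * np.log(1 - rho**2))
```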
bmi.estimators.correlation.CCAMutualInformationEstimator (IMutualInformationPointEstimator)
__init__(self, scale=True)
special
Initialize self. See help(type(self)) for accurate signature.
estimate(self, x, y)
A point estimate of MI(X; Y) from an i.i.d. sample from the \(P(X, Y)\) distribution.
Parameters:

Name | Type | Description | Default
---|---|---|---
x | ArrayLike | array of shape (n_points, dim_x) | required
y | ArrayLike | array of shape (n_points, dim_y) | required

Returns:

Type | Description
---|---
float | mutual information estimate
parameters(self)
Returns the parameters of the estimator.
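Since CCA captures only linear dependence, this estimator is best suited to (approximately) jointly Gaussian data. A minimal usage sketch, assuming the import path shown in the heading above; the data-generating process is illustrative:

```python
import numpy as np

from bmi.estimators.correlation import CCAMutualInformationEstimator

# Jointly Gaussian X and Y, so the linear (CCA) assumption holds.
rng = np.random.default_rng(42)
x = rng.normal(size=(10_000, 2))
y = x @ np.array([[1.0, 0.5], [0.0, 1.0]]) + 0.5 * rng.normal(size=(10_000, 2))

estimator = CCAMutualInformationEstimator(scale=True)
print(estimator.estimate(x, y))  # point estimate of I(X; Y)
print(estimator.parameters())    # the estimator's hyperparameters
```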
bmi.estimators.neural._estimators.DonskerVaradhanEstimator (NeuralEstimatorBase)
bmi.estimators._histogram.HistogramEstimator (IMutualInformationPointEstimator)
__init__(self, n_bins_x=5, n_bins_y=None, standardize=True)
special
Parameters:

Name | Type | Description | Default
---|---|---|---
n_bins_x | int | number of bins per each X dimension | 5
n_bins_y | Optional[int] | number of bins per each Y dimension; leave as None to use n_bins_x | None
standardize | bool | whether to standardize the data set | True
estimate(self, x, y)
MI estimate.
parameters(self)
Returns the parameters of the estimator.
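A minimal usage sketch, assuming the module path shown in the heading above; the synthetic data and the bin count are illustrative:

```python
import numpy as np

from bmi.estimators._histogram import HistogramEstimator

rng = np.random.default_rng(0)
x = rng.normal(size=(5_000, 1))
y = x + 0.3 * rng.normal(size=(5_000, 1))  # Y is a noisy copy of X.

# 10 bins per X dimension; n_bins_y=None reuses the X setting.
estimator = HistogramEstimator(n_bins_x=10, n_bins_y=None, standardize=True)
print(estimator.estimate(x, y))
```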
bmi.estimators.neural._estimators.InfoNCEEstimator (NeuralEstimatorBase)
bmi.estimators._kde.KDEMutualInformationEstimator (IMutualInformationPointEstimator)
The kernel density mutual information estimator based on
\(I(X; Y) = h(X) + h(Y) - h(X, Y)\),
where \(h(X)\) is the differential entropy \(h(X) = -\mathbb{E}[ \log p(X) ]\).
The logarithm of the probability density function, \(\log p(X)\), is estimated via a kernel density estimator (KDE) using SciKit-Learn.
Note
This estimator is very sensitive to the choice of the bandwidth and the kernel. We suggest treating it with caution.
__init__(self, kernel_xy='tophat', kernel_x=None, kernel_y=None, bandwidth_xy='scott', bandwidth_x=None, bandwidth_y=None, standardize=True)
special
Parameters:

Name | Type | Description | Default
---|---|---|---
kernel_xy | Literal['gaussian', 'tophat', 'epanechnikov', 'exponential', 'linear', 'cosine'] | kernel used for the joint PDF \(p_{XY}\) estimation; see SciKit-Learn's KernelDensity for the available kernels | 'tophat'
kernel_x | Optional[Literal['gaussian', 'tophat', 'epanechnikov', 'exponential', 'linear', 'cosine']] | kernel used for the \(p_X\) estimation; if None (default), kernel_xy is used | None
kernel_y | Optional[Literal['gaussian', 'tophat', 'epanechnikov', 'exponential', 'linear', 'cosine']] | similar to kernel_x, but for the \(p_Y\) estimation | None
bandwidth_xy | Union[float, Literal['scott', 'silverman']] | kernel bandwidth used for the joint distribution estimation | 'scott'
bandwidth_x | Optional[Union[float, Literal['scott', 'silverman']]] | kernel bandwidth used for the \(p_X\) estimation; if None (default), bandwidth_xy is used | None
bandwidth_y | Optional[Union[float, Literal['scott', 'silverman']]] | similar to bandwidth_x, but for the \(p_Y\) estimation | None
standardize | bool | whether to standardize the data points | True
estimate(self, x, y)
A point estimate of MI(X; Y) from an i.i.d. sample from the \(P(X, Y)\) distribution.
Parameters:

Name | Type | Description | Default
---|---|---|---
x | ArrayLike | array of shape (n_points, dim_x) | required
y | ArrayLike | array of shape (n_points, dim_y) | required

Returns:

Type | Description
---|---
float | mutual information estimate
estimate_entropies(self, x, y)
Calculates differential entropies.

Note
Differential entropy is not invariant to standardization. In particular, if you want to estimate the differential entropy of the original variables, you should use standardize=False.
parameters(self)
Returns the parameters of the estimator.
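A minimal usage sketch, assuming the module path shown in the heading above; the synthetic data and the kernel/bandwidth choices are illustrative, not recommendations (see the note on bandwidth sensitivity above):

```python
import numpy as np

from bmi.estimators._kde import KDEMutualInformationEstimator

rng = np.random.default_rng(0)
x = rng.normal(size=(3_000, 1))
y = x + 0.5 * rng.normal(size=(3_000, 1))

# Gaussian kernels with Scott's rule; results depend strongly on these choices.
estimator = KDEMutualInformationEstimator(kernel_xy="gaussian", bandwidth_xy="scott")
print(estimator.estimate(x, y))

# Differential entropies of the *standardized* variables (standardize=True is
# the default); pass standardize=False for the original variables.
print(estimator.estimate_entropies(x, y))
```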
bmi.estimators.ksg.KSGEnsembleFirstEstimator (IMutualInformationPointEstimator)
An implementation of the neighborhood-based KSG estimator.
We use the first approximation (i.e., equation (8) in the Kraskov et al. (2004) paper) and allow for using different neighborhood sizes. The final estimate is the average of the estimates obtained with the different neighborhood sizes.
__init__(self, neighborhoods=(5, 10), standardize=True, metric_x='euclidean', metric_y=None, n_jobs=1, chunk_size=10)
special
Parameters:

Name | Type | Description | Default
---|---|---|---
neighborhoods | Sequence[int] | sequence of positive integers, specifying the neighborhood sizes used for the MI calculation | (5, 10)
standardize | bool | whether to standardize the data before the MI calculation | True
metric_x | Literal['euclidean', 'manhattan', 'chebyshev'] | metric on the X space | 'euclidean'
metric_y | Optional[Literal['euclidean', 'manhattan', 'chebyshev']] | metric on the Y space; if None, metric_x is used | None
n_jobs | int | number of jobs launched to compute distances; use -1 to use all processors | 1
chunk_size | int | internal batch size, used to speed up the computations while fitting into memory | 10
Note
If you use the Chebyshev (\(\ell_\infty\)) distance for both the \(X\) and \(Y\) spaces, KSGChebyshevEstimator may be faster.
estimate(self, x, y)
A point estimate of MI(X; Y) from an i.i.d. sample from the \(P(X, Y)\) distribution.
Parameters:

Name | Type | Description | Default
---|---|---|---
x | ArrayLike | array of shape (n_points, dim_x) | required
y | ArrayLike | array of shape (n_points, dim_y) | required

Returns:

Type | Description
---|---
float | mutual information estimate
parameters(self)
Returns the parameters of the estimator.
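A minimal usage sketch, assuming the import path shown in the heading above; the synthetic data and the neighborhood sizes are illustrative:

```python
import numpy as np

from bmi.estimators.ksg import KSGEnsembleFirstEstimator

rng = np.random.default_rng(1)
x = rng.normal(size=(2_000, 3))
y = x[:, :2] + 0.2 * rng.normal(size=(2_000, 2))

# Average the KSG estimates over several neighborhood sizes,
# computing distances on all available processors.
estimator = KSGEnsembleFirstEstimator(
    neighborhoods=(5, 10, 20),
    metric_x="euclidean",
    n_jobs=-1,
)
print(estimator.estimate(x, y))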
bmi.estimators.neural._mine_estimator.MINEEstimator (IMutualInformationPointEstimator)
trained_critic: Optional[equinox._module.Module]
property
readonly
Returns the critic function obtained at the end of training.
Note:
1. The model has to be trained first, by estimating mutual information; otherwise, None is returned.
2. The critic can have a different meaning depending on the function used.
__init__(self, batch_size=256, max_n_steps=10000, train_test_split=0.5, test_every_n_steps=250, learning_rate=0.1, hidden_layers=(16, 8), smoothing_alpha=0.9, standardize=True, verbose=True, seed=42)
special
Initialize self. See help(type(self)) for accurate signature.
estimate(self, x, y)
A point estimate of MI(X; Y) from an i.i.d. sample from the \(P(X, Y)\) distribution.
Parameters:

Name | Type | Description | Default
---|---|---|---
x | ArrayLike | array of shape (n_points, dim_x) | required
y | ArrayLike | array of shape (n_points, dim_y) | required

Returns:

Type | Description
---|---
float | mutual information estimate
estimate_with_info(self, x, y)
Estimates the mutual information and additionally reports information about the training run.
parameters(self)
Returns the parameters of the estimator.
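A minimal usage sketch, assuming the module path shown in the heading above; the hyperparameters are illustrative and will likely need tuning for real data:

```python
import numpy as np

from bmi.estimators.neural._mine_estimator import MINEEstimator

rng = np.random.default_rng(0)
x = rng.normal(size=(5_000, 2))
y = x + 0.4 * rng.normal(size=(5_000, 2))

estimator = MINEEstimator(max_n_steps=2_000, batch_size=256, seed=0)
print(estimator.estimate(x, y))

# The critic is populated only after training; before it, trained_critic is None.
critic = estimator.trained_critic
```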