Estimators

The package supports a range of existing mutual information estimators. For the full list, see below.

Example

The design of the estimators was motivated by the scikit-learn API [1]. All estimators are classes. Once a class is initialized, one can use its estimate method, which maps arrays of data points (of shape (n_points, n_dim)) to mutual information estimates:

import bmi

# Generate a sample with 1000 data points
task = bmi.benchmark.BENCHMARK_TASKS['1v1-normal-0.75']
X, Y = task.sample(1000, seed=42)
print(f"X shape: {X.shape}")  # Shape (1000, 1)
print(f"Y shape: {Y.shape}")  # Shape (1000, 1)

# Once an estimator is instantiated, it can be used to estimate mutual information
# by using the `estimate` method.
cca = bmi.estimators.CCAMutualInformationEstimator()
print(f"Estimate by CCA: {cca.estimate(X, Y):.2f}")

ksg = bmi.estimators.KSGEnsembleFirstEstimator(neighborhoods=(5,))
print(f"Estimate by KSG: {ksg.estimate(X, Y):.2f}")

Additionally, the estimators can be queried for their hyperparameters:

print(cca.parameters())  # CCA does not have tunable hyperparameters
# _EmptyParams()

print(ksg.parameters())  # KSG has tunable hyperparameters
# KSGEnsembleParameters(neighborhoods=[5], standardize=True, metric_x='euclidean', metric_y='euclidean')

The returned objects are structured using Pydantic.
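
Since the returned objects are Pydantic models, they can be converted to plain dictionaries or JSON, e.g. for logging. A minimal sketch, assuming Pydantic v2 (Pydantic v1 exposes .dict() instead of model_dump()):

params = ksg.parameters()
print(params.model_dump())  # on Pydantic v1, use params.dict()
# e.g. {'neighborhoods': [5], 'standardize': True, 'metric_x': 'euclidean', 'metric_y': 'euclidean'}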

List of estimators

Neural estimators

We support several standard neural estimators implemented in JAX, based on the PyTorch implementations [2] (a usage sketch follows the list):

  • The Donsker-Varadhan estimator [3] is implemented in DonskerVaradhanEstimator.
  • The MINE estimator [3], a Donsker-Varadhan estimator with a correction that debiases the gradient during the fitting phase, is implemented in MINEEstimator.
  • InfoNCE [4], also known as Contrastive Predictive Coding, is implemented in InfoNCEEstimator.
  • The NWJ estimator [5] is implemented in NWJEstimator.
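
A minimal usage sketch for the neural estimators. The no-argument constructor below is an assumption; in practice these estimators benefit from larger samples and tuned training hyperparameters:

import bmi

task = bmi.benchmark.BENCHMARK_TASKS['1v1-normal-0.75']
X, Y = task.sample(5000, seed=0)

# Neural estimators follow the same interface as the other estimators:
# instantiate, then call `estimate`. Default hyperparameters are assumed here.
infonce = bmi.estimators.InfoNCEEstimator()
print(f"Estimate by InfoNCE: {infonce.estimate(X, Y):.2f}")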

Model-based estimators

  • Canonical correlation analysis [6][7] is suitable when \(P(X, Y)\) is multivariate normal and does not require hyperparameter tuning. It's implemented in CCAMutualInformationEstimator.
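
For reference, when \((X, Y)\) is jointly multivariate normal with canonical correlations \(\rho_1, \dots, \rho_k\), the mutual information has the closed form \(I(X; Y) = -\frac{1}{2} \sum_{i=1}^{k} \log\left(1 - \rho_i^2\right)\), which a CCA-based estimator can evaluate using the sample canonical correlations.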

Histogram-based estimators

  • We implement a histogram-based estimator [8] in HistogramEstimator. However, note that we do not support adaptive binning schemes.
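
For a fixed binning with joint bin frequencies \(p_{ij}\) and marginals \(p_{i\cdot}\) and \(p_{\cdot j}\), a histogram-based estimator of this kind typically computes the plug-in estimate \(\hat I = \sum_{i, j} p_{ij} \log \frac{p_{ij}}{p_{i\cdot}\, p_{\cdot j}}\).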

Kernel density estimators

Neighborhood-based estimators

  • The Kraskov-Stögbauer-Grassberger (KSG) estimator [9], used in the example above, is implemented in KSGEnsembleFirstEstimator.

FAQ

Do these estimators work for discrete variables?

When both variables \(X\) and \(Y\) are discrete, we recommend the dit package. When one variable is discrete and the other is continuous, one can approximate mutual information by adding small noise to the discrete variable.

Todo

Add a Python example showing how to add the noise.
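
A minimal sketch of the jittering approach, reusing the KSG estimator from the example above; the synthetic data and the noise scale (0.01) are illustrative choices rather than package recommendations:

import numpy as np
import bmi

rng = np.random.default_rng(42)

# Hypothetical discrete-continuous pair: X is binary, Y is continuous.
n_points = 1000
x_discrete = rng.integers(0, 2, size=(n_points, 1))
y = x_discrete + rng.normal(size=(n_points, 1))

# Add small uniform noise ("jitter") to the discrete variable so that
# continuous estimators can be applied. The scale 0.01 is illustrative.
x_jittered = x_discrete + rng.uniform(-0.01, 0.01, size=x_discrete.shape)

ksg = bmi.estimators.KSGEnsembleFirstEstimator(neighborhoods=(5,))
print(f"Estimate with jittered X: {ksg.estimate(x_jittered, y):.2f}")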

Where is the API showing how to use the estimators?

The API is here.

How can I add a new estimator?

Thank you for considering contributing to this project! Please consult the contributing guidelines and reach out to us on GitHub, so that we can discuss the best way of adding the estimator to the package.

Generally, the following steps are required:

  1. Implement the interface IMutualInformationPointEstimator in a new file inside the src/bmi/estimators directory (a minimal sketch follows this list). The unit tests should be added in the tests/estimators directory.
  2. Export the new estimator to the public API by adding an entry in src/bmi/estimators/__init__.py.
  3. Export the docstring of the new estimator to docs/api/estimators.md.
  4. Add the estimator to the list of estimators on this page and to the README.
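
A minimal sketch of step 1, assuming the interface requires (at least) an estimate method mapping two arrays of shape (n_points, n_dim) to a float. The file name, the import path of the interface, and the toy estimator itself are illustrative assumptions, and the exact set of required methods may differ:

# src/bmi/estimators/_correlation.py  (hypothetical file name)
import numpy as np

from bmi.interface import IMutualInformationPointEstimator  # import path assumed


class GaussianCorrelationEstimator(IMutualInformationPointEstimator):
    """Toy estimator assuming (X, Y) is bivariate normal with one-dimensional X and Y."""

    def estimate(self, x, y) -> float:
        x = np.asarray(x).ravel()
        y = np.asarray(y).ravel()
        rho = np.corrcoef(x, y)[0, 1]
        # For a bivariate normal, I(X; Y) = -0.5 * log(1 - rho^2).
        return float(-0.5 * np.log1p(-rho ** 2))

# Step 2: export the class in src/bmi/estimators/__init__.py, e.g.
# from bmi.estimators._correlation import GaussianCorrelationEstimator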

  1. Lars Buitinck and others. API design for machine learning software: experiences from the scikit-learn project. arXiv, September 2013. arXiv:1309.0238

  2. Jiaming Song and Stefano Ermon. Understanding the limitations of variational mutual information estimators. CoRR, 2019. URL: http://arxiv.org/abs/1910.06222, arXiv:1910.06222

  3. Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, and Devon Hjelm. Mutual information neural estimation. In International conference on machine learning, 531–540. PMLR, 2018. 

  4. Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018. 

  5. XuanLong Nguyen, Martin J Wainwright, and Michael Jordan. Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems, volume 20. Curran Associates, Inc., 2007. URL: https://proceedings.neurips.cc/paper_files/paper/2007/file/72da7fd6d1302c0a159f6436d01e9eb0-Paper.pdf

  6. David R. Brillinger. Some data analyses using mutual information. Brazilian Journal of Probability and Statistics, 18(2):163–182, 2004. URL: http://www.jstor.org/stable/43601047 (visited on 2023-09-24). 

  7. J. Kay. Feature discovery under contextual supervision using mutual information. In IJCNN International Joint Conference on Neural Networks, volume 4, 79–84. 1992. doi:10.1109/IJCNN.1992.227286

  8. Christopher J Cellucci, Alfonso M Albano, and Paul E Rapp. Statistical validation of mutual information calculations: comparison of alternative numerical algorithms. Physical review E, 71(6):066208, 2005. 

  9. Alexander Kraskov, Harald Stögbauer, and Peter Grassberger. Estimating mutual information. Physical Review E, 69(6):066138, 2004.