Estimators
The package supports a range of existing mutual information estimators. For the full list, see below.
Example
The design of the estimators was motivated by the scikit-learn API [1].
All estimators are classes. Once a class is initialized, one can use the `estimate` method, which maps arrays containing data points (of shape `(n_points, n_dim)`) to mutual information estimates:
```python
import bmi

# Generate a sample with 1000 data points
task = bmi.benchmark.BENCHMARK_TASKS['1v1-normal-0.75']
X, Y = task.sample(1000, seed=42)

print(f"X shape: {X.shape}")  # Shape (1000, 1)
print(f"Y shape: {Y.shape}")  # Shape (1000, 1)

# Once an estimator is instantiated, it can be used to estimate mutual information
# by using the `estimate` method.
cca = bmi.estimators.CCAMutualInformationEstimator()
print(f"Estimate by CCA: {cca.estimate(X, Y):.2f}")

ksg = bmi.estimators.KSGEnsembleFirstEstimator(neighborhoods=(5,))
print(f"Estimate by KSG: {ksg.estimate(X, Y):.2f}")
```
Additionally, the estimators can be queried for their hyperparameters:
```python
print(cca.parameters())  # CCA does not have tunable hyperparameters
# _EmptyParams()

print(ksg.parameters())  # KSG has tunable hyperparameters
# KSGEnsembleParameters(neighborhoods=[5], standardize=True, metric_x='euclidean', metric_y='euclidean')
```
The returned objects are structured using Pydantic.
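Because the returned parameter objects are Pydantic models, they can be converted to plain dictionaries or JSON, e.g. for logging benchmark configurations. A minimal sketch, continuing the example above and assuming Pydantic v2 is installed (on Pydantic v1 the analogous calls are `.dict()` and `.json()`):

```python
params = ksg.parameters()

# Serialize the hyperparameters, e.g. to store them alongside benchmark results.
print(params.model_dump())       # plain dictionary
print(params.model_dump_json())  # JSON string
```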
List of estimators
Neural estimators
We support several standard neural estimators, implemented in JAX and based on the PyTorch implementations accompanying [2]. A usage sketch follows the list below.
- The Donsker-Varadhan estimator [3] is implemented in `DonskerVaradhanEstimator`.
- MINE [3], a Donsker-Varadhan estimator with a bias-corrected gradient used during the fitting phase, is implemented in `MINEEstimator`.
- InfoNCE [4], also known as Contrastive Predictive Coding, is implemented in `InfoNCEEstimator`.
- The NWJ estimator [5] is implemented as `NWJEstimator`.
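A minimal sketch of how the neural estimators can be used, assuming the default constructor settings are acceptable (the exact training hyperparameters accepted by each constructor are listed in the API reference):

```python
import bmi

task = bmi.benchmark.BENCHMARK_TASKS['1v1-normal-0.75']
X, Y = task.sample(1000, seed=42)

# Default constructors are assumed here; each estimator also accepts
# training hyperparameters (see the API reference for the exact signatures).
neural_estimators = {
    "Donsker-Varadhan": bmi.estimators.DonskerVaradhanEstimator(),
    "MINE": bmi.estimators.MINEEstimator(),
    "InfoNCE": bmi.estimators.InfoNCEEstimator(),
    "NWJ": bmi.estimators.NWJEstimator(),
}

for name, estimator in neural_estimators.items():
    print(f"{name}: {estimator.estimate(X, Y):.2f}")
```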
Model-based estimators
- Canonical correlation analysis [6][7] is suitable when \(P(X, Y)\) is multivariate normal and does not require hyperparameter tuning. It is implemented in `CCAMutualInformationEstimator`; a quick sanity check against a closed-form ground truth is sketched below.
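The `1v1-normal-0.75` task used in the example above is bivariate normal; assuming the trailing number in the task name is the correlation coefficient \(\rho = 0.75\), the ground-truth mutual information is \(-\tfrac{1}{2}\log(1-\rho^2)\) nats, which lets us sanity-check the CCA estimate:

```python
import numpy as np
import bmi

task = bmi.benchmark.BENCHMARK_TASKS['1v1-normal-0.75']
X, Y = task.sample(1000, seed=42)

# For a bivariate normal with correlation rho, MI = -0.5 * log(1 - rho^2) nats.
rho = 0.75
ground_truth = -0.5 * np.log(1 - rho**2)  # approx. 0.41 nats

cca = bmi.estimators.CCAMutualInformationEstimator()
print(f"Ground truth:  {ground_truth:.2f}")
print(f"CCA estimate:  {cca.estimate(X, Y):.2f}")
```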
Histogram-based estimators
- We implement a histogram-based estimator [8] in `HistogramEstimator`. However, note that we do not support adaptive binning schemes.
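The binning behaviour is controlled through constructor arguments; rather than guessing their names here, the sketch below (assuming the default constructor is acceptable) simply prints the hyperparameters, which lists the available binning options:

```python
import bmi

task = bmi.benchmark.BENCHMARK_TASKS['1v1-normal-0.75']
X, Y = task.sample(1000, seed=42)

# Default construction; inspect the hyperparameters to see the binning options.
hist = bmi.estimators.HistogramEstimator()
print(hist.parameters())
print(f"Histogram estimate: {hist.estimate(X, Y):.2f}")
```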
Kernel density estimators
- We implement a simple kernel density estimator in `KDEMutualInformationEstimator`.
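A similarly minimal sketch for the kernel density estimator, again assuming the defaults are acceptable (bandwidth and kernel settings can be inspected via `parameters()` and tuned through the constructor; see the API reference):

```python
import bmi

task = bmi.benchmark.BENCHMARK_TASKS['1v1-normal-0.75']
X, Y = task.sample(1000, seed=42)

# Default construction is assumed here.
kde = bmi.estimators.KDEMutualInformationEstimator()
print(kde.parameters())
print(f"KDE estimate: {kde.estimate(X, Y):.2f}")
```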
Neighborhood-based estimators
- An ensemble of Kraskov-Stögbauer-Grassberger estimators [9] is implemented as `KSGEnsembleFirstEstimator`.
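The `neighborhoods` argument accepts several neighbourhood sizes at once, so the ensemble can combine estimates over multiple values of \(k\); the particular values below are illustrative choices, not recommendations:

```python
import bmi

task = bmi.benchmark.BENCHMARK_TASKS['1v1-normal-0.75']
X, Y = task.sample(1000, seed=42)

# Illustrative neighborhood sizes: smaller k typically gives lower bias but
# higher variance, larger k the opposite.
ksg = bmi.estimators.KSGEnsembleFirstEstimator(neighborhoods=(5, 10, 15))
print(f"KSG ensemble estimate: {ksg.estimate(X, Y):.2f}")
```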
FAQ
Do these estimators work for discrete variables?
When both variables \(X\) and \(Y\) are discrete, we recommend the `dit` package. When one variable is discrete and the other is continuous, mutual information can be approximated by adding small random noise to the discrete variable and then applying a continuous estimator, as in the sketch below.
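A minimal sketch of the noise trick, using a toy dataset in which \(Y\) holds integer labels; the noise scale and the estimator settings are illustrative choices:

```python
import numpy as np
import bmi

rng = np.random.default_rng(42)

# Toy data: Y is a binary label and X is a continuous signal that depends on Y.
n = 1000
Y = rng.integers(0, 2, size=(n, 1)).astype(float)
X = Y + 0.5 * rng.normal(size=(n, 1))

# Add small uniform noise to the discrete variable so that a continuous
# estimator can be applied; the noise scale here is a heuristic choice.
Y_noisy = Y + rng.uniform(-1e-3, 1e-3, size=Y.shape)

ksg = bmi.estimators.KSGEnsembleFirstEstimator(neighborhoods=(10,))
print(f"Estimated MI: {ksg.estimate(X, Y_noisy):.2f}")
```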
Where is the API showing how to use the estimators?
The API reference for the estimators is available at docs/api/estimators.md.
How can I add a new estimator?
Thank you for considering contributing to this project! Please consult the contributing guidelines and reach out to us on GitHub, so we can discuss the best way of adding the estimator to the package.
Generally, the following steps are required:
- Implement the interface `IMutualInformationPointEstimator` in a new file inside the `src/bmi/estimators` directory (a minimal skeleton is sketched after this list). The unit tests should be added in the `tests/estimators` directory.
- Export the new estimator to the public API by adding an entry in `src/bmi/estimators/__init__.py`.
- Export the docstring of the new estimator to `docs/api/estimators.md`.
- Add the estimator to the list of estimators and to the README.
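A minimal skeleton of such a file, assuming that the interface is importable from `bmi.interface` and that `estimate` is its main abstract method (the exact import path and the full set of abstract methods should be checked against the interface definition):

```python
# src/bmi/estimators/my_estimator.py -- hypothetical file name
import numpy as np

from bmi.interface import IMutualInformationPointEstimator  # assumed import path


class MyEstimator(IMutualInformationPointEstimator):
    """Skeleton of a new mutual information estimator."""

    def __init__(self, some_hyperparameter: float = 1.0) -> None:
        self._some_hyperparameter = some_hyperparameter

    def estimate(self, x, y) -> float:
        x, y = np.asarray(x), np.asarray(y)
        # ... compute and return the mutual information estimate here ...
        raise NotImplementedError
```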
References

1. Lars Buitinck and others. API design for machine learning software: experiences from the scikit-learn project. arXiv:1309.0238, 2013.
2. Jiaming Song and Stefano Ermon. Understanding the limitations of variational mutual information estimators. CoRR, 2019. arXiv:1910.06222.
3. Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, and Devon Hjelm. Mutual information neural estimation. In International Conference on Machine Learning, 531–540. PMLR, 2018.
4. Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
5. XuanLong Nguyen, Martin J. Wainwright, and Michael Jordan. Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization. In Advances in Neural Information Processing Systems, volume 20. Curran Associates, Inc., 2007. URL: https://proceedings.neurips.cc/paper_files/paper/2007/file/72da7fd6d1302c0a159f6436d01e9eb0-Paper.pdf.
6. David R. Brillinger. Some data analyses using mutual information. Brazilian Journal of Probability and Statistics, 18(2):163–182, 2004. URL: http://www.jstor.org/stable/43601047.
7. J. Kay. Feature discovery under contextual supervision using mutual information. In IJCNN International Joint Conference on Neural Networks, volume 4, 79–84. 1992. doi:10.1109/IJCNN.1992.227286.
8. Christopher J. Cellucci, Alfonso M. Albano, and Paul E. Rapp. Statistical validation of mutual information calculations: comparison of alternative numerical algorithms. Physical Review E, 71(6):066208, 2005.
9. Alexander Kraskov, Harald Stögbauer, and Peter Grassberger. Estimating mutual information. Physical Review E, 69(6):066138, 2004.