Low-rank matrix denoising for count data using unbiased Kullback-Leibler risk estimation

Author

Listed:

Bigot, Jérémie
Deledalle, Charles

Abstract

Many statistical studies are concerned with the analysis of observations organized in matrix form whose elements are count data. When these observations are assumed to follow a Poisson or a multinomial distribution, it is of interest to estimate either the intensity matrix (Poisson case) or the compositional matrix (multinomial case) when it is assumed to have a low-rank structure. In this setting, it is proposed to construct an estimator minimizing the negative log-likelihood regularized by a nuclear norm penalty. Such an approach easily yields a low-rank matrix-valued estimator with positive entries, which belongs to the set of row-stochastic matrices in the multinomial case. Then, as a main contribution, a data-driven procedure is constructed to select the regularization parameter of such estimators by minimizing (approximately) unbiased estimates of the Kullback-Leibler (KL) risk in these models, which generalize Stein's unbiased risk estimation originally proposed for Gaussian data. The evaluation of these quantities is a delicate problem, and novel methods are introduced to obtain accurate numerical approximations of such unbiased estimates. Simulated data are used to validate this way of selecting regularization parameters for low-rank matrix estimation from count data. For data following a multinomial distribution, the performance of this approach is also compared to K-fold cross-validation. Examples from a survey study and from metagenomics also illustrate the benefits of this methodology for real data analysis.
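The abstract describes the method only at a high level. Below is a minimal NumPy sketch of the general idea for the Poisson case only, under assumptions of our own; it is not the authors' algorithm. Assumptions: the nuclear norm penalty is applied directly to the intensity matrix; the penalized negative log-likelihood is minimized by proximal gradient with singular value soft-thresholding and a backtracking step that keeps every iterate positive (our choice of solver); the risk is KL(Poisson(X0) || Poisson(X_hat)) summed over entries, estimated up to an additive constant through Hudson's identity E[X0_ij g(Y)] = E[Y_ij g(Y - e_ij)]. This brute-force version refits once per nonzero count, so it is only practical for tiny matrices; the paper introduces its own numerical approximations, which are not reproduced here. The function names, the lambda grid and the synthetic rank-1 data are hypothetical.

```python
import numpy as np


def svt(Z, tau):
    """Singular value soft-thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def poisson_nll(X, Y):
    # Poisson negative log-likelihood, dropping the log(Y!) terms that do not depend on X.
    return float(np.sum(X - Y * np.log(X)))


def fit_poisson_lowrank(Y, lam, n_iter=100, t0=1.0):
    """Hypothetical solver for min_X sum(X - Y log X) + lam * ||X||_* with X > 0.

    Proximal gradient with backtracking: the step t is halved until the new
    iterate is entrywise positive and satisfies the sufficient-decrease test,
    so the penalized objective never increases.
    """
    X = np.full(Y.shape, Y.mean() + 1e-3)  # positive constant initialization (assumption)
    for _ in range(n_iter):
        grad = 1.0 - Y / X
        f = poisson_nll(X, Y)
        t = t0
        for _ in range(50):
            Xn = svt(X - t * grad, t * lam)
            D = Xn - X
            # The positivity check runs first, so log() is never evaluated on X <= 0.
            if np.all(Xn > 0) and poisson_nll(Xn, Y) <= f + np.sum(grad * D) + np.sum(D * D) / (2 * t):
                X = Xn
                break
            t *= 0.5
        # If no step is accepted after 50 halvings, X is simply left unchanged.
    return X


def kl_risk_estimate(Y, lam, **kw):
    """Unbiased estimate of E[sum(X_hat - X0 * log X_hat)].

    This equals the summed Poisson KL risk sum(X0 log(X0 / X_hat) - X0 + X_hat)
    minus a term that does not depend on X_hat. The X0 * log X_hat term is
    handled with Hudson's identity, costing one refit per nonzero count.
    """
    X_hat = fit_poisson_lowrank(Y, lam, **kw)
    risk = X_hat.sum()
    for i, j in zip(*np.nonzero(Y)):
        Y_minus = Y.copy()
        Y_minus[i, j] -= 1  # Y - e_ij
        risk -= Y[i, j] * np.log(fit_poisson_lowrank(Y_minus, lam, **kw)[i, j])
    return float(risk)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical rank-1 intensity matrix, for illustration only.
    X0 = np.outer(rng.uniform(1, 5, 6), rng.uniform(1, 5, 5))
    Y = rng.poisson(X0)
    grid = [0.5, 2.0, 8.0]  # hypothetical candidate regularization parameters
    risks = [kl_risk_estimate(Y, lam) for lam in grid]
    best = grid[int(np.argmin(risks))]
    print("selected lambda:", best)
    print("rank of estimate:", np.linalg.matrix_rank(fit_poisson_lowrank(Y, best)))
```

Selecting the parameter then amounts to evaluating the estimate on a grid and keeping the minimizer. Because Hudson's identity holds for any function of Y, the estimate stays unbiased even though the solver stops after a fixed number of iterations. The multinomial (row-stochastic) case and the direction of the KL divergence used in the paper may differ from this sketch.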

Suggested Citation

Bigot, Jérémie & Deledalle, Charles, 2022. "Low-rank matrix denoising for count data using unbiased Kullback-Leibler risk estimation," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).

Handle: RePEc:eee:csdana:v:169:y:2022:i:c:s0167947322000032
DOI: 10.1016/j.csda.2022.107423

References listed on IDEAS

A. S. Lewis, 1996. "Derivatives of Spectral Functions," Mathematics of Operations Research, INFORMS, vol. 21(3), pages 576-588, August.
Yuanpei Cao & Anru Zhang & Hongzhe Li, 2020. "Multisample estimation of bacterial composition matrices in metagenomics data," Biometrika, Biometrika Trust, vol. 107(1), pages 75-92.
Robin, Geneviève & Josse, Julie & Moulines, Éric & Sardy, Sylvain, 2019. "Low-rank model with covariates for count data with missing values," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 416-434.
Shabalin, Andrey A. & Nobel, Andrew B., 2013. "Reconstruction of a low-rank matrix in the presence of Gaussian noise," Journal of Multivariate Analysis, Elsevier, vol. 118(C), pages 67-76.

Citations

Citations are extracted by the CitEc Project.

Cited by:

Li, Xiao & Matsuda, Takeru & Komaki, Fumiyasu, 2024. "Empirical Bayes Poisson matrix completion," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
Kim, Kipoong & Park, Jaesung & Jung, Sungkyu, 2024. "Principal component analysis for zero-inflated compositional data," Computational Statistics & Data Analysis, Elsevier, vol. 198(C).
Bongiorno, Christian & Lamrani, Lamia, 2025. "Quantifying the information lost in optimal covariance matrix cleaning," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 657(C).

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Defeng Sun & Jie Sun, 2008. "Löwner's Operator and Spectral Functions in Euclidean Jordan Algebras," Mathematics of Operations Research, INFORMS, vol. 33(2), pages 421-445, May.
Li, Xiao & Matsuda, Takeru & Komaki, Fumiyasu, 2024. "Empirical Bayes Poisson matrix completion," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
Leeb, William, 2022. "Optimal singular value shrinkage for operator norm loss: Extending to non-square matrices," Statistics & Probability Letters, Elsevier, vol. 186(C).
Bo Yuan & Shulei Wang, 2025. "Microbiome data integration via shared dictionary learning," Nature Communications, Nature, vol. 16(1), pages 1-20, December.
Lewis, R.M. & Trosset, M.W., 2006. "Sensitivity analysis of the strain criterion for multidimensional scaling," Computational Statistics & Data Analysis, Elsevier, vol. 50(1), pages 135-153, January.
Yong-Jin Liu & Jing Yu, 2023. "A semismooth Newton based dual proximal point algorithm for maximum eigenvalue problem," Computational Optimization and Applications, Springer, vol. 85(2), pages 547-582, June.
Civitarese, Jamil, 2016. "Volatility and correlation-based systemic risk measures in the US market," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 459(C), pages 55-67.
Kim, Kipoong & Park, Jaesung & Jung, Sungkyu, 2024. "Principal component analysis for zero-inflated compositional data," Computational Statistics & Data Analysis, Elsevier, vol. 198(C).
Battey, H.S. & Cox, D.R., 2022. "Some aspects of non-standard multivariate analysis," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
Xin Chen & Houduo Qi & Liqun Qi & Kok-Lay Teo, 2004. "Smooth Convex Approximation to the Maximum Eigenvalue Function," Journal of Global Optimization, Springer, vol. 30(2), pages 253-270, November.
P. Chatelain & X. Milhaud, 2024. "Estimation and prediction with data quality indexes in linear regressions," Computational Statistics, Springer, vol. 39(6), pages 3373-3404, September.
Hyung-Il Kim & Seok Bong Yoo, 2020. "Trends in Super-High-Definition Imaging Techniques Based on Deep Neural Networks," Mathematics, MDPI, vol. 8(11), pages 1-19, October.
J. Sun & L. W. Zhang & Y. Wu, 2006. "Properties of the Augmented Lagrangian in Nonlinear Semidefinite Optimization," Journal of Optimization Theory and Applications, Springer, vol. 129(3), pages 437-456, June.
Gen Li & Sungkyu Jung, 2017. "Incorporating covariates into integrated factor analysis of multi‐view data," Biometrics, The International Biometric Society, vol. 73(4), pages 1433-1442, December.
Junhui Cai & Dan Yang & Ran Chen & Wu Zhu & Haipeng Shen & Linda Zhao, 2021. "Network regression and supervised centrality estimation," Papers 2111.12921, arXiv.org, revised Feb 2025.
Chen, Yunxiao & Li, Xiaoou, 2022. "Determining the number of factors in high-dimensional generalized latent factor models," LSE Research Online Documents on Economics 111574, London School of Economics and Political Science, LSE Library.
Yong-Jin Liu & Jing Yu, 2022. "A Semismooth Newton-based Augmented Lagrangian Algorithm for Density Matrix Least Squares Problems," Journal of Optimization Theory and Applications, Springer, vol. 195(3), pages 749-779, December.
Anna Bykhovskaya & Vadim Gorin & Sasha Sodin, 2025. "How weak are weak factors? Uniform inference for signal strength in signal plus noise models," Papers 2507.18554, arXiv.org.
Chao Kan & Wen Song, 2015. "Second-order conditions for existence of augmented Lagrange multipliers for eigenvalue composite optimization problems," Journal of Global Optimization, Springer, vol. 63(1), pages 77-97, September.
Shulei Wang, 2023. "Robust differential abundance test in compositional data," Biometrika, Biometrika Trust, vol. 110(1), pages 169-185.
