Printed from https://ideas.repec.org/a/eee/csdana/v169y2022ics0167947322000032.html

Low-rank matrix denoising for count data using unbiased Kullback-Leibler risk estimation

Authors

  • Bigot, Jérémie
  • Deledalle, Charles

Abstract

Many statistical studies are concerned with the analysis of observations organized in matrix form whose elements are count data. When these observations are assumed to follow a Poisson or a multinomial distribution, it is of interest to estimate either the intensity matrix (Poisson case) or the compositional matrix (multinomial case) when it is assumed to have a low-rank structure. In this setting, an estimator is proposed that minimizes the negative log-likelihood regularized by a nuclear norm penalty. This approach readily yields a low-rank matrix-valued estimator with positive entries which, in the multinomial case, belongs to the set of row-stochastic matrices. Then, as the main contribution, a data-driven procedure is constructed to select the regularization parameter of such estimators by minimizing (approximately) unbiased estimates of the Kullback-Leibler (KL) risk in these models, which generalize Stein's unbiased risk estimation originally proposed for Gaussian data. Evaluating these quantities is a delicate problem, and novel methods are introduced to obtain accurate numerical approximations of such unbiased estimates. Simulated data are used to validate this way of selecting regularization parameters for low-rank matrix estimation from count data. For data following a multinomial distribution, the performance of this approach is also compared with that of K-fold cross-validation. Examples from a survey study and from metagenomics also illustrate the benefits of this methodology for real data analysis.
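
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the risk-estimation idea in the Poisson case. It is not the authors' method and rests on several assumptions.

Assumptions:
- The loss is taken to be the KL divergence KL(Poisson(mu) || Poisson(muhat)), summed over entries.
- The nuclear-norm-penalized likelihood estimator is replaced by a hypothetical stand-in, `svt_estimator`. This stand-in soft-thresholds the singular values of the counts and clips entries so that they stay positive.
- The names `svt_estimator`, `kl_risk_estimate`, `select_lambda` and the `floor` parameter are all hypothetical.

The unbiased estimate uses Hudson's identity for a Poisson variable Y with mean mu, E[mu g(Y)] = E[Y g(Y - 1)], which plays the role that Stein's lemma plays for Gaussian data. Evaluating it exactly requires one refit per nonzero count, which hints at why the abstract calls the evaluation delicate. The paper's numerical approximations are not reproduced here.

```python
import numpy as np


def svt_estimator(y, lam, floor=1e-3):
    """Hypothetical stand-in (NOT the paper's penalized-likelihood estimator):
    soft-threshold the singular values of the count matrix by `lam`, then clip
    entries from below at `floor` so every estimated intensity is positive."""
    u, s, vt = np.linalg.svd(np.asarray(y, dtype=float), full_matrices=False)
    x = (u * np.maximum(s - lam, 0.0)) @ vt
    return np.maximum(x, floor)


def kl_risk_estimate(y, estimator, lam):
    """Unbiased estimate, up to an additive constant independent of the
    estimator, of E[sum_ij KL(Poisson(mu_ij) || Poisson(muhat_ij))].

    Since KL = mu*log(mu/muhat) - mu + muhat, the only unknown term involving
    muhat is E[mu_ij * log muhat_ij(Y)]. Hudson's identity replaces it with
    E[Y_ij * log muhat_ij(Y - e_ij)], at the cost of one refit per nonzero
    entry (zero counts contribute nothing).
    """
    risk = estimator(y, lam).sum()
    for i, j in zip(*np.nonzero(y)):
        y_minus = y.copy()
        y_minus[i, j] -= 1  # Y - e_ij: remove a single count at (i, j)
        risk -= y[i, j] * np.log(estimator(y_minus, lam)[i, j])
    return risk


def select_lambda(y, estimator, lambdas):
    """Return the grid value minimizing the estimated KL risk, plus all risks."""
    risks = [kl_risk_estimate(y, estimator, lam) for lam in lambdas]
    return lambdas[int(np.argmin(risks))], risks


# Usage on simulated data: counts drawn from a positive rank-2 intensity matrix.
rng = np.random.default_rng(0)
mu = rng.uniform(1, 4, size=(20, 2)) @ rng.uniform(1, 4, size=(2, 15))
y = rng.poisson(mu)
lambdas = np.linspace(0.0, 40.0, 9)
best_lam, risks = select_lambda(y, svt_estimator, lambdas)
print("selected lambda:", best_lam)
```

The risk estimate makes no assumption about how the estimator is computed, beyond its returning strictly positive intensities. A solver for the nuclear-norm-penalized negative log-likelihood could therefore be passed in as a different `estimator(y, lam)` callable.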

Suggested Citation

  • Bigot, Jérémie & Deledalle, Charles, 2022. "Low-rank matrix denoising for count data using unbiased Kullback-Leibler risk estimation," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
  • Handle: RePEc:eee:csdana:v:169:y:2022:i:c:s0167947322000032
    DOI: 10.1016/j.csda.2022.107423

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947322000032
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2022.107423?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Yuanpei Cao & Anru Zhang & Hongzhe Li, 2020. "Multisample estimation of bacterial composition matrices in metagenomics data," Biometrika, Biometrika Trust, vol. 107(1), pages 75-92.
    2. Robin, Geneviève & Josse, Julie & Moulines, Éric & Sardy, Sylvain, 2019. "Low-rank model with covariates for count data with missing values," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 416-434.
    3. Shabalin, Andrey A. & Nobel, Andrew B., 2013. "Reconstruction of a low-rank matrix in the presence of Gaussian noise," Journal of Multivariate Analysis, Elsevier, vol. 118(C), pages 67-76.
    4. A. S. Lewis, 1996. "Derivatives of Spectral Functions," Mathematics of Operations Research, INFORMS, vol. 21(3), pages 576-588, August.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Kim, Kipoong & Park, Jaesung & Jung, Sungkyu, 2024. "Principal component analysis for zero-inflated compositional data," Computational Statistics & Data Analysis, Elsevier, vol. 198(C).
    2. Bongiorno, Christian & Lamrani, Lamia, 2025. "Quantifying the information lost in optimal covariance matrix cleaning," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 657(C).
    3. Li, Xiao & Matsuda, Takeru & Komaki, Fumiyasu, 2024. "Empirical Bayes Poisson matrix completion," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Defeng Sun & Jie Sun, 2008. "Löwner's Operator and Spectral Functions in Euclidean Jordan Algebras," Mathematics of Operations Research, INFORMS, vol. 33(2), pages 421-445, May.
    2. Li, Xiao & Matsuda, Takeru & Komaki, Fumiyasu, 2024. "Empirical Bayes Poisson matrix completion," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
    3. Leeb, William, 2022. "Optimal singular value shrinkage for operator norm loss: Extending to non-square matrices," Statistics & Probability Letters, Elsevier, vol. 186(C).
    4. Lewis, R.M. & Trosset, M.W., 2006. "Sensitivity analysis of the strain criterion for multidimensional scaling," Computational Statistics & Data Analysis, Elsevier, vol. 50(1), pages 135-153, January.
    5. Yong-Jin Liu & Jing Yu, 2023. "A semismooth Newton based dual proximal point algorithm for maximum eigenvalue problem," Computational Optimization and Applications, Springer, vol. 85(2), pages 547-582, June.
    6. Civitarese, Jamil, 2016. "Volatility and correlation-based systemic risk measures in the US market," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 459(C), pages 55-67.
    7. Kim, Kipoong & Park, Jaesung & Jung, Sungkyu, 2024. "Principal component analysis for zero-inflated compositional data," Computational Statistics & Data Analysis, Elsevier, vol. 198(C).
    8. Battey, H.S. & Cox, D.R., 2022. "Some aspects of non-standard multivariate analysis," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    9. Xin Chen & Houduo Qi & Liqun Qi & Kok-Lay Teo, 2004. "Smooth Convex Approximation to the Maximum Eigenvalue Function," Journal of Global Optimization, Springer, vol. 30(2), pages 253-270, November.
    10. P. Chatelain & X. Milhaud, 2024. "Estimation and prediction with data quality indexes in linear regressions," Computational Statistics, Springer, vol. 39(6), pages 3373-3404, September.
    11. Hyung-Il Kim & Seok Bong Yoo, 2020. "Trends in Super-High-Definition Imaging Techniques Based on Deep Neural Networks," Mathematics, MDPI, vol. 8(11), pages 1-19, October.
    12. J. Sun & L. W. Zhang & Y. Wu, 2006. "Properties of the Augmented Lagrangian in Nonlinear Semidefinite Optimization," Journal of Optimization Theory and Applications, Springer, vol. 129(3), pages 437-456, June.
    13. Gen Li & Sungkyu Jung, 2017. "Incorporating covariates into integrated factor analysis of multi‐view data," Biometrics, The International Biometric Society, vol. 73(4), pages 1433-1442, December.
    14. Junhui Cai & Dan Yang & Ran Chen & Wu Zhu & Haipeng Shen & Linda Zhao, 2021. "Network regression and supervised centrality estimation," Papers 2111.12921, arXiv.org, revised Feb 2025.
    15. Chen, Yunxiao & Li, Xiaoou, 2022. "Determining the number of factors in high-dimensional generalized latent factor models," LSE Research Online Documents on Economics 111574, London School of Economics and Political Science, LSE Library.
    16. Yong-Jin Liu & Jing Yu, 2022. "A Semismooth Newton-based Augmented Lagrangian Algorithm for Density Matrix Least Squares Problems," Journal of Optimization Theory and Applications, Springer, vol. 195(3), pages 749-779, December.
    17. Chao Kan & Wen Song, 2015. "Second-order conditions for existence of augmented Lagrange multipliers for eigenvalue composite optimization problems," Journal of Global Optimization, Springer, vol. 63(1), pages 77-97, September.
    18. Shulei Wang, 2023. "Robust differential abundance test in compositional data," Biometrika, Biometrika Trust, vol. 110(1), pages 169-185.
    19. Li, Gen & Yang, Dan & Nobel, Andrew B. & Shen, Haipeng, 2016. "Supervised singular value decomposition and its asymptotic properties," Journal of Multivariate Analysis, Elsevier, vol. 146(C), pages 7-17.
    20. Y Chen & X Li, 2022. "Determining the number of factors in high-dimensional generalized latent factor models [Eigenvalue ratio test for the number of factors]," Biometrika, Biometrika Trust, vol. 109(3), pages 769-782.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:169:y:2022:i:c:s0167947322000032. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/csda.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.