
Assessing Methods for Evaluating the Number of Components in Non-Negative Matrix Factorization

Authors
  • José M. Maisog

    (Blue Health Intelligence, Chicago, IL 60601, USA)

  • Andrew T. DeMarco

    (Department of Rehabilitation Medicine, Georgetown University Medical Center, Washington, DC 20057, USA)

  • Karthik Devarajan

    (Department of Biostatistics and Bioinformatics, Fox Chase Cancer Center, Temple University Health System, Philadelphia, PA 19111, USA)

  • Stanley Young

    (GCStat, 3401 Caldwell Drive, Raleigh, NC 27607, USA)

  • Paul Fogel

    (Advestis, 69 Boulevard Haussmann, 75008 Paris, France)

  • George Luta

    (Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, DC 20057, USA
    Department of Clinical Epidemiology, Aarhus University, 8000 Aarhus, Denmark
    The Parker Institute, Copenhagen University Hospital, 2000 Frederiksberg, Denmark)

Abstract

Non-negative matrix factorization (NMF) is a relatively new method of matrix decomposition that factors an m × n data matrix X into an m × k matrix W and a k × n matrix H, so that X ≈ W × H. Importantly, all values in X, W, and H are constrained to be non-negative. NMF can be used for dimensionality reduction, since the k columns of W can be considered components into which X has been decomposed. This raises the question: how does one choose k? In this paper, we first assess methods for estimating k in the context of NMF using synthetic data. Second, we examine the effect of normalization on the accuracy of this estimate in empirical data. In synthetic data with orthogonal underlying components, methods based on principal component analysis (PCA) and Brunet’s Cophenetic Correlation Coefficient achieved the highest accuracy. When evaluated on a well-known real dataset, normalization had an unpredictable effect on the estimate. For any given normalization method, the methods for estimating k gave widely varying results. We conclude that when estimating k, it is best not to apply normalization. If the underlying components are known to be orthogonal, then Velicer’s MAP or Minka’s Laplace-PCA method might be best. However, when the orthogonality of the underlying components is unknown, none of the methods seemed preferable.
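As a rough illustration of one of the PCA-based estimators named above, the sketch below implements Velicer's minimum average partial (MAP) test in its original squared-partial-correlation form (Velicer, 1976) with NumPy. It is not the authors' code. The helper name `velicer_map`, the choice to treat the m rows of X as the variables (so that components are m-dimensional, like the columns of W), and the synthetic block-structured data are assumptions made for illustration only.

```python
import numpy as np


def velicer_map(X: np.ndarray) -> int:
    """Estimate the number of components with Velicer's MAP test (sketch).

    Assumption: rows of X are variables (features) and columns are
    observations, matching X (m x n) ~ W (m x k) @ H (k x n).
    """
    R = np.corrcoef(X)                     # m x m correlation matrix of the rows
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Principal-component loadings: each eigenvector scaled by sqrt(eigenvalue)
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

    off_diag = ~np.eye(p, dtype=bool)
    avg_sq = []
    # m = 0 means "no components removed"; stop at p - 2 so the residual
    # covariance keeps a non-degenerate diagonal
    for m in range(p - 1):
        A = loadings[:, :m]
        C = R - A @ A.T                    # partial covariance after removing m components
        d = np.sqrt(np.diag(C))
        partial = C / np.outer(d, d)       # partial correlation matrix
        avg_sq.append(np.mean(partial[off_diag] ** 2))
    # MAP picks the m that minimizes the average squared partial correlation
    return int(np.argmin(avg_sq))


# Hypothetical synthetic data: 3 orthogonal (disjoint-support) non-negative
# components, 5 features each, 500 observations, small non-negative noise
rng = np.random.default_rng(0)
k_true, block, n = 3, 5, 500
W = np.zeros((k_true * block, k_true))
for j in range(k_true):
    W[j * block:(j + 1) * block, j] = rng.uniform(0.5, 1.5, block)
H = rng.uniform(0.0, 1.0, (k_true, n))
X = W @ H + np.abs(rng.normal(0.0, 0.05, (k_true * block, n)))
print(velicer_map(X))
```

Because the planted components here have disjoint supports (and are therefore orthogonal), this is the setting in which the abstract reports PCA-based methods doing well, and the estimate should recover the three planted blocks. The test is only one ingredient: it looks at the correlation structure of X and never fits an NMF model, whereas Brunet's cophenetic approach would require repeated NMF runs for each candidate k.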

Suggested Citation

  • José M. Maisog & Andrew T. DeMarco & Karthik Devarajan & Stanley Young & Paul Fogel & George Luta, 2021. "Assessing Methods for Evaluating the Number of Components in Non-Negative Matrix Factorization," Mathematics, MDPI, vol. 9(22), pages 1-13, November.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:22:p:2840-:d:675494

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/22/2840/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/22/2840/
    Download Restriction: no

    References listed on IDEAS

    1. Jushan Bai & Serena Ng, 2002. "Determining the Number of Factors in Approximate Factor Models," Econometrica, Econometric Society, vol. 70(1), pages 191-221, January.
    2. Daniel D. Lee & H. Sebastian Seung, 1999. "Learning the parts of objects by non-negative matrix factorization," Nature, Nature, vol. 401(6755), pages 788-791, October.
    3. [No author listed], 2000. "Problems And Solutions," Econometric Theory, Cambridge University Press, vol. 16(2), pages 287-299, April.
    4. Karthik Devarajan, 2008. "Nonnegative Matrix Factorization: An Analytical and Interpretive Tool in Computational Biology," PLOS Computational Biology, Public Library of Science, vol. 4(7), pages 1-12, July.
    5. Mu Zhu & Ali Ghodsi, 2006. "Automatic dimensionality selection from the scree plot via the use of profile likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 918-930, November.
    6. Wayne Velicer, 1976. "Determining the number of components from the matrix of partial correlations," Psychometrika, Springer;The Psychometric Society, vol. 41(3), pages 321-327, September.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Xiaoming Li & Wei Yu & Guangquan Xu & Fangyuan Liu, 2022. "MSDA-NMF: A Multilayer Complex System Model Integrating Deep Autoencoder and NMF," Mathematics, MDPI, vol. 10(15), pages 1-18, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stevanovic Dalibor, 2016. "Common time variation of parameters in reduced-form macroeconomic models," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 20(2), pages 159-183, April.
    2. Paul Fogel & Yann Gaston-Mathé & Douglas Hawkins & Fajwel Fogel & George Luta & S. Stanley Young, 2016. "Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health," IJERPH, MDPI, vol. 13(5), pages 1-14, May.
    3. Raveenajit Kaur A. P. & Kalvant Singh & Alberto Luis August, 2021. "Exploring the Factor Structure of the Constructs of Technological, Pedagogical, and Content Knowledge (TPACK): An Exploratory Factor Analysis Based on the Perceptions of TESOL Pre-Service Teachers at ," Research Journal of Education, Academic Research Publishing Group, vol. 7(2), pages 103-115, 06-2021.
    4. Flavia Esposito, 2021. "A Review on Initialization Methods for Nonnegative Matrix Factorization: Towards Omics Data Experiments," Mathematics, MDPI, vol. 9(9), pages 1-17, April.
    5. repec:jss:jstsof:46:i04 is not listed on IDEAS
    6. Georgios Chalamandaris & Andrianos Tsekrekos, 2013. "Explanatory Factors and Causality in the Dynamics of Volatility Surfaces Implied from OTC Asian–Pacific Currency Options," Computational Economics, Springer;Society for Computational Economics, vol. 41(3), pages 327-358, March.
    7. Timothy P Moss & Victoria Lawson & Paul White & The Appearance Research Collaboration, 2014. "Salience and Valence of Appearance in a Population with a Visible Difference of Appearance: Direct and Moderated Relationships with Self-Consciousness, Anxiety and Depression," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-8, February.
    8. Zura Kakushadze & Willie Yu, 2017. "*K-means and Cluster Models for Cancer Signatures," Papers 1703.00703, arXiv.org, revised Jul 2017.
    9. Hamed Taherdoost & Shamsul Sahibuddin & Neda Jalaliyoon, 2014. "Exploratory Factor Analysis; Concepts and Theory," Post-Print hal-02557344, HAL.
    10. Onintze Letona-Ibañez & Maria Carrasco & Silvia Martinez-Rodriguez & Alejandro Amillano & Nuria Ortiz-Marques, 2019. "Cognitive, relational and task crafting: Spanish adaptation and analysis of psychometric properties of the Job Crafting Questionnaire," PLOS ONE, Public Library of Science, vol. 14(10), pages 1-15, October.
    11. Jingu Kim & Yunlong He & Haesun Park, 2014. "Algorithms for nonnegative matrix and tensor factorizations: a unified view based on block coordinate descent framework," Journal of Global Optimization, Springer, vol. 58(2), pages 285-319, February.
    12. Agyeman, Stephen & Cheng, Lin, 2020. "Analysis of barriers to perceived service quality in Ghana: Students’ perspectives on bus mobility attributes," Transport Policy, Elsevier, vol. 99(C), pages 63-85.
    13. Hanley Chiang & Elias Walsh & Timothy Shanahan & Claudia Gentile & Alyssa Maccarone & Tiffany Waits & Barbara Carlson & Samuel Rikoon, "undated". "An Exploration of Instructional Practices that Foster Language Development and Comprehension: Evidence from Prekindergarten through Grade 3 in Title I Schools," Mathematica Policy Research Reports 4e488ad06b194729bcc88c8d2, Mathematica Policy Research.
    14. Lena Boneva (Körber) & Oliver Linton & Michael Vogt, 2013. "A semiparametric model for heterogeneous panel data with fixed effects," CeMMAP working papers 02/13, Institute for Fiscal Studies.
    15. Frestad, Dennis, 2008. "Common and unique factors influencing daily swap returns in the Nordic electricity market, 1997-2005," Energy Economics, Elsevier, vol. 30(3), pages 1081-1097, May.
    16. Hui-Min Wang & Ching-Lin Hsiao & Ai-Ru Hsieh & Ying-Chao Lin & Cathy S J Fann, 2012. "Constructing Endophenotypes of Complex Diseases Using Non-Negative Matrix Factorization and Adjusted Rand Index," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-12, July.
    17. Elizabeth Bucacos, 2015. "Impact of international monetary policy in Uruguay: a FAVAR approach," Documentos de trabajo 2015003, Banco Central del Uruguay.
    18. Boneva, Lena & Linton, Oliver & Vogt, Michael, 2015. "A semiparametric model for heterogeneous panel data with fixed effects," Journal of Econometrics, Elsevier, vol. 188(2), pages 327-345.
    19. Patil, Vivek H. & Singh, Surendra N. & Mishra, Sanjay & Todd Donavan, D., 2008. "Efficient theory development and factor retention criteria: Abandon the `eigenvalue greater than one' criterion," Journal of Business Research, Elsevier, vol. 61(2), pages 162-170, February.
    20. Golino, Hudson F. & Demetriou, Andreas, 2017. "Estimating the dimensionality of intelligence like data using Exploratory Graph Analysis," Intelligence, Elsevier, vol. 62(C), pages 54-70.
    21. Juan José Echavarría & Andrés González, 2012. "Choques internacionales reales y financieros y su impacto sobre la economía colombiana," Revista ESPE - Ensayos sobre Política Económica, Banco de la Republica de Colombia, vol. 30(69), pages 14-66, December.
