IDEAS home Printed from https://ideas.repec.org/p/ehl/lserod/102110.html

Determining the number of latent factors in statistical multi-relational learning

Author

Listed:
  • Shi, Chengchun
  • Lu, Wenbin
  • Song, Rui

Abstract

Statistical relational learning is primarily concerned with learning and inferring relationships between entities in large-scale knowledge graphs. Nickel et al. (2011) proposed the RESCAL tensor factorization model for statistical relational learning, which achieves better or at least comparable results on common benchmark data sets when compared to other state-of-the-art methods. Given a positive integer s, RESCAL computes an s-dimensional latent vector for each entity. The latent factors can then be used to solve relational learning tasks such as collective classification, collective entity resolution and link-based clustering. The focus of this paper is to determine the number of latent factors in the RESCAL model. Due to the structure of the RESCAL model, its log-likelihood function is not concave. As a result, the corresponding maximum likelihood estimators (MLEs) may not be consistent. Nonetheless, we design a specific pseudometric, prove the consistency of the MLEs under this pseudometric and establish their rate of convergence. Based on these results, we propose a general class of information criteria and prove their model selection consistency when the number of relations is either bounded or diverges at a suitable rate relative to the number of entities. Simulations and real data examples show that our proposed information criteria have good finite sample properties.
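To make the selection problem in the abstract concrete, the following is a minimal Python sketch, not the authors' procedure. It assumes a binary relation tensor Y of shape (K, n, n), a logistic link on the bilinear RESCAL score a_i^T R_k a_j, plain gradient ascent as the fitting routine, and a generic BIC-style penalty. The function names (`rescal_logits`, `fit_rescal`, `bic_style_ic`, `select_s`) and the penalty are illustrative assumptions; the class of criteria studied in the paper may differ.

```python
import numpy as np

def rescal_logits(A, R):
    """Bilinear RESCAL scores: logits[k, i, j] = a_i^T R_k a_j.
    A is (n, s): one s-dimensional latent vector per entity.
    R is (K, s, s): one interaction matrix per relation."""
    return np.einsum("is,kst,jt->kij", A, R, A)

def bernoulli_loglik(Y, logits):
    """Log-likelihood of a binary tensor under a logistic link (an assumption)."""
    return float(np.sum(Y * logits - np.logaddexp(0.0, logits)))

def fit_rescal(Y, s, steps=300, lr=0.5, seed=0):
    """Gradient ascent from a random start. The log-likelihood is not
    concave, so this returns a local optimum only."""
    K, n, _ = Y.shape
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((n, s))
    R = 0.1 * rng.standard_normal((K, s, s))
    for _ in range(steps):
        # Numerically stable sigmoid via tanh; G is d loglik / d logits.
        G = Y - 0.5 * (1.0 + np.tanh(0.5 * rescal_logits(A, R)))
        grad_R = np.einsum("is,kij,jt->kst", A, G, A)
        grad_A = (np.einsum("kij,jt,kst->is", G, A, R)
                  + np.einsum("kij,is,kst->jt", G, A, R))
        A += lr * grad_A / (K * n)
        R += lr * grad_R / (n * n)
    return A, R

def bic_style_ic(Y, A, R):
    """Hypothetical BIC-style criterion: -2 loglik + (#parameters) * log(#entries)."""
    K, n, _ = Y.shape
    s = A.shape[1]
    n_params = n * s + K * s * s
    loglik = bernoulli_loglik(Y, rescal_logits(A, R))
    return -2.0 * loglik + n_params * np.log(K * n * n)

def select_s(Y, candidates):
    """Fit each candidate s and return the minimizer of the criterion."""
    scores = {s: bic_style_ic(Y, *fit_rescal(Y, s)) for s in candidates}
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    # Simulated data with two latent factors (illustrative sizes only).
    rng = np.random.default_rng(1)
    n, K, s_true = 30, 4, 2
    A0 = rng.standard_normal((n, s_true))
    R0 = rng.standard_normal((K, s_true, s_true))
    probs = 0.5 * (1.0 + np.tanh(0.5 * rescal_logits(A0, R0)))
    Y = (rng.random((K, n, n)) < probs).astype(float)
    best, scores = select_s(Y, [1, 2, 3, 4])
    print("selected s:", best)
    print("criterion values:", scores)
```

Because the fit starts from a single random initialization, the selected s can depend on the seed and step size; running several restarts per candidate and keeping the best log-likelihood would be a natural refinement.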

Suggested Citation

  • Shi, Chengchun & Lu, Wenbin & Song, Rui, 2019. "Determining the number of latent factors in statistical multi-relational learning," LSE Research Online Documents on Economics 102110, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:102110

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/102110/
    File Function: Open access version.
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jean-Pierre Rossi & Maxime Nardin & Martin Godefroid & Manuela Ruiz-Diaz & Anne-Sophie Sergent & Alejandro Martinez-Meier & Luc Pâques & Philippe Rozenberg, 2014. "Dissecting the Space-Time Structure of Tree-Ring Datasets Using the Partial Triadic Analysis," PLOS ONE, Public Library of Science, vol. 9(9), pages 1-13, September.
    2. Richard Harshman & Margaret Lundy, 1996. "Uniqueness proof for a family of models sharing features of Tucker's three-mode factor analysis and PARAFAC/candecomp," Psychometrika, Springer;The Psychometric Society, vol. 61(1), pages 133-154, March.
    3. Veldscholte, Carla M. & Kroonenberg, Pieter M. & Antonides, Gerrit, 1998. "Three-mode analysis of perceptions of economic activities in Eastern and Western Europe," Journal of Economic Psychology, Elsevier, vol. 19(3), pages 321-351, June.
    4. Carlos Martin-Barreiro & John A. Ramirez-Figueroa & Ana B. Nieto-Librero & Víctor Leiva & Ana Martin-Casado & M. Purificación Galindo-Villardón, 2021. "A New Algorithm for Computing Disjoint Orthogonal Components in the Three-Way Tucker Model," Mathematics, MDPI, vol. 9(3), pages 1-22, January.
    5. Stegeman, Alwin, 2014. "Finding the limit of diverging components in three-way Candecomp/Parafac—A demonstration of its practical merits," Computational Statistics & Data Analysis, Elsevier, vol. 75(C), pages 203-216.
    6. Zhenghao Zeng & Yuqi Gu & Gongjun Xu, 2023. "A Tensor-EM Method for Large-Scale Latent Class Analysis with Binary Responses," Psychometrika, Springer;The Psychometric Society, vol. 88(2), pages 580-612, June.
    7. Dawn Iacobucci & Doug Grisaffe & Wayne DeSarbo, 2017. "Statistical perceptual maps: using confidence region ellipses to enhance the interpretations of brand positions in multidimensional scaling," Journal of Marketing Analytics, Palgrave Macmillan, vol. 5(3), pages 81-98, December.
    8. Susana Mendes & M. José Fernández-Gómez & Sónia Cotrim Marques & Miguel Ângelo Pardal & Ulisses Miranda Azeiteiro & M. Purificación Galindo-Villardón, 2017. "CO-tucker: a new method for the simultaneous analysis of a sequence of paired tables," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(15), pages 2729-2755, November.
    9. Mariela González-Narváez & María José Fernández-Gómez & Susana Mendes & José-Luis Molina & Omar Ruiz-Barzola & Purificación Galindo-Villardón, 2021. "Study of Temporal Variations in Species–Environment Association through an Innovative Multivariate Method: MixSTATICO," Sustainability, MDPI, vol. 13(11), pages 1-25, May.
    10. Meyners, Michael & Qannari, El Mostafa, 2001. "Relating principal component analysis on merged data sets to a regression approach," Technical Reports 2001,47, Technische Universität Dortmund, Sonderforschungsbereich 475: Komplexitätsreduktion in multivariaten Datenstrukturen.
    11. Yuefeng Han & Rong Chen & Dan Yang & Cun-Hui Zhang, 2020. "Tensor Factor Model Estimation by Iterative Projection," Papers 2006.02611, arXiv.org, revised May 2022.
    12. DELL'ANNO, Roberto & VILLA, Stefania, 2012. "Growth in Transition Countries: Big Bang versus Gradualism," CELPE Discussion Papers 122, CELPE - CEnter for Labor and Political Economics, University of Salerno, Italy.
    13. Henk Kiers, 1991. "Hierarchical relations among three-way methods," Psychometrika, Springer;The Psychometric Society, vol. 56(3), pages 449-470, September.
    14. Willem Kloot & Pieter Kroonenberg, 1985. "External analysis with three-mode principal component models," Psychometrika, Springer;The Psychometric Society, vol. 50(4), pages 479-494, December.
    15. Zemin Zheng & Jie Zhang & Yang Li, 2022. "L0-Regularized Learning for High-Dimensional Additive Hazards Regression," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2762-2775, September.
    16. Alexander Chudik & George Kapetanios & M. Hashem Pesaran, 2016. "Big Data Analytics: A New Perspective," CESifo Working Paper Series 5824, CESifo.
    17. Emre Demirkaya & Yang Feng & Pallavi Basu & Jinchi Lv, 2022. "Large-scale model selection in misspecified generalized linear models [Information theory and an extension of the maximum likelihood principle]," Biometrika, Biometrika Trust, vol. 109(1), pages 123-136.
    18. Pieter M. Kroonenberg & Cornelis J. Lammers & Ineke Stoop, 1985. "Three-Mode Principal Component Analysis of Multivariate Longitudinal Organizational Data," Sociological Methods & Research, vol. 14(2), pages 99-136, November.
    19. Elisa Frutos-Bernal & Ángel Martín del Rey & Irene Mariñas-Collado & María Teresa Santos-Martín, 2022. "An Analysis of Travel Patterns in Barcelona Metro Using Tucker3 Decomposition," Mathematics, MDPI, vol. 10(7), pages 1-17, March.
    20. Xinhai Liu & Wolfgang Glänzel & Bart De Moor, 2011. "Hybrid clustering of multi-view data via Tucker-2 model and its application," Scientometrics, Springer;Akadémiai Kiadó, vol. 88(3), pages 819-839, September.

    More about this item

    Keywords

    information criteria; knowledge graph; model selection consistency; RESCAL model; statistical relational learning; tensor factorization;

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics
