Printed from https://ideas.repec.org/a/plo/pone00/0217316.html

S3CMTF: Fast, accurate, and scalable method for incomplete coupled matrix-tensor factorization

Authors

  • Dongjin Choi
  • Jun-Gi Jang
  • U Kang

Abstract

How can we extract hidden relations from tensor and matrix data simultaneously in a fast, accurate, and scalable way? Coupled matrix-tensor factorization (CMTF) is an important tool for this purpose. Designing an accurate and efficient CMTF method has become increasingly crucial as the size and dimensionality of real-world data grow explosively. However, existing CMTF methods suffer from low accuracy, slow running time, and limited scalability. In this paper, we propose S3CMTF, a fast, accurate, and scalable CMTF method. Unlike previous methods, which do not handle large sparse tensors and are not parallelizable, S3CMTF provides parallel sparse CMTF by carefully deriving gradient update rules. S3CMTF updates partial gradients asynchronously, without expensive locking. We show, both theoretically and empirically, that our method is guaranteed to converge to a quality solution. S3CMTF further boosts performance by storing intermediate computations and reusing them. We show theoretically and empirically that S3CMTF is the fastest method, outperforming existing ones. Experimental results show that S3CMTF is up to 930× faster than existing methods while providing the best accuracy. S3CMTF scales linearly with both the number of data entries and the number of cores. In addition, we apply S3CMTF to Yelp rating tensor data coupled with 3 additional matrices to discover interesting patterns.
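To make "gradient update rules" for sparse CMTF concrete, the following is a minimal single-threaded sketch in Python/NumPy. It is not the authors' S3CMTF implementation and does not reproduce their model or update rules. It assumes a CP-style (rank-R) model in which a 3-way tensor X and a matrix Y share the factor A of the tensor's first mode, and it fits only the observed cells with plain stochastic gradient descent on squared error plus L2 regularization. The function names (cmtf_sgd, rmse, make_synthetic) and all hyperparameters are hypothetical; the lock-free parallel updates and the reuse of intermediate results described in the abstract are not shown.

```python
import numpy as np


def cmtf_sgd(tensor_entries, matrix_entries, shape, matrix_cols,
             rank=3, lr=0.05, reg=0.001, epochs=100, seed=0):
    """Fit a CP-style coupled matrix-tensor model with plain SGD (assumed model).

    tensor_entries: list of observed cells (i, j, k, value) of X
    matrix_entries: list of observed cells (i, col, value) of Y; Y's row
                    index i is shared with the first mode of X
    Model: X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r]
           Y[i,col] ~ sum_r A[i,r] V[col,r]
    """
    rng = np.random.default_rng(seed)
    I, J, K = shape
    A = rng.normal(scale=0.3, size=(I, rank))  # shared (coupled) factor
    B = rng.normal(scale=0.3, size=(J, rank))
    C = rng.normal(scale=0.3, size=(K, rank))
    V = rng.normal(scale=0.3, size=(matrix_cols, rank))  # matrix-only factor

    # Tag each observed cell with its source so one shuffled pass mixes both.
    samples = [("t", e) for e in tensor_entries]
    samples += [("m", e) for e in matrix_entries]
    for _ in range(epochs):
        for s in rng.permutation(len(samples)):
            kind, e = samples[s]
            if kind == "t":
                i, j, k, x = e
                err = x - np.sum(A[i] * B[j] * C[k])
                # Descent directions (negative gradients of
                # 0.5*err^2 + 0.5*reg*||row||^2), all computed from the
                # current rows before any of them is modified.
                dA = err * B[j] * C[k] - reg * A[i]
                dB = err * A[i] * C[k] - reg * B[j]
                dC = err * A[i] * B[j] - reg * C[k]
                A[i] += lr * dA
                B[j] += lr * dB
                C[k] += lr * dC
            else:
                i, col, y = e
                err = y - A[i] @ V[col]
                dA = err * V[col] - reg * A[i]
                dV = err * A[i] - reg * V[col]
                A[i] += lr * dA
                V[col] += lr * dV
    return A, B, C, V


def rmse(tensor_entries, matrix_entries, A, B, C, V):
    """Root-mean-square error over all observed tensor and matrix cells."""
    errs = [x - np.sum(A[i] * B[j] * C[k]) for i, j, k, x in tensor_entries]
    errs += [y - A[i] @ V[col] for i, col, y in matrix_entries]
    return float(np.sqrt(np.mean(np.square(errs))))


def make_synthetic(shape=(10, 8, 6), matrix_cols=5, rank=2, frac=0.3, seed=1):
    """Sample a fraction of cells from a random nonnegative low-rank model."""
    rng = np.random.default_rng(seed)
    I, J, K = shape
    A, B, C = (rng.uniform(size=(n, rank)) for n in (I, J, K))
    V = rng.uniform(size=(matrix_cols, rank))
    t = [(i, j, k, float(np.sum(A[i] * B[j] * C[k])))
         for i in range(I) for j in range(J) for k in range(K)
         if rng.random() < frac]
    m = [(i, col, float(A[i] @ V[col]))
         for i in range(I) for col in range(matrix_cols)
         if rng.random() < frac]
    return t, m


if __name__ == "__main__":
    t, m = make_synthetic()
    shape, cols = (10, 8, 6), 5
    before = rmse(t, m, *cmtf_sgd(t, m, shape, cols, epochs=0))
    after = rmse(t, m, *cmtf_sgd(t, m, shape, cols, epochs=200))
    print(f"RMSE before: {before:.3f}, after: {after:.3f}")
```

Each step reads and writes only the factor rows indexed by one observed cell, so on a large sparse tensor two concurrent steps rarely touch the same rows. That is the usual motivation for lock-free (Hogwild-style) parallel SGD, which is the kind of asynchronous update the abstract refers to; this sketch stays sequential for clarity.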

Suggested Citation

  • Dongjin Choi & Jun-Gi Jang & U Kang, 2019. "S3CMTF: Fast, accurate, and scalable method for incomplete coupled matrix-tensor factorization," PLOS ONE, Public Library of Science, vol. 14(6), pages 1-20, June.
  • Handle: RePEc:plo:pone00:0217316
    DOI: 10.1371/journal.pone.0217316

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0217316
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0217316&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ma, Tinghuai & Suo, Xiafei & Zhou, Jinjuan & Tang, Meili & Guan, Donghai & Tian, Yuan & Al-Dhelaan, Abdullah & Al-Rodhaan, Mznah, 2016. "Augmenting matrix factorization technique with the combination of tags and genres," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 461(C), pages 101-116.
    2. Manini Madireddy & Ramasubramanian Sundararajan & Goda Doreswamy & Meisam Hejazi Nia & Amod Mital, 2017. "Constructing bundled offers for airline customers," Journal of Revenue and Pricing Management, Palgrave Macmillan, vol. 16(6), pages 532-552, December.
    3. Kyriaki Kalimeri & Matteo Delfino & Ciro Cattuto & Daniela Perrotta & Vittoria Colizza & Caroline Guerrisi & Clement Turbelin & Jim Duggan & John Edmunds & Chinelo Obi & Richard Pebody & Ana O Franco, 2019. "Unsupervised extraction of epidemic syndromes from participatory influenza surveillance self-reported symptoms," PLOS Computational Biology, Public Library of Science, vol. 15(4), pages 1-21, April.
    4. Shota Saito & Yoshito Hirata & Kazutoshi Sasahara & Hideyuki Suzuki, 2015. "Tracking Time Evolution of Collective Attention Clusters in Twitter: Time Evolving Nonnegative Matrix Factorisation," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-17, September.
    5. Nicolas Jouvin & Pierre Latouche & Charles Bouveyron & Guillaume Bataillon & Alain Livartowski, 2021. "Greedy clustering of count data through a mixture of multinomial PCA," Computational Statistics, Springer, vol. 36(1), pages 1-33, March.
    6. Bastian Schaefermeier & Gerd Stumme & Tom Hanika, 2021. "Topic space trajectories," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(7), pages 5759-5795, July.
    7. Travis R Meyer & Daniel Balagué & Miguel Camacho-Collados & Hao Li & Katie Khuu & P Jeffrey Brantingham & Andrea L Bertozzi, 2019. "A year in Madrid as described through the analysis of geotagged Twitter data," Environment and Planning B, vol. 46(9), pages 1724-1740, November.
    8. Triss Ashton & Nicholas Evangelopoulos & Victor Prybutok, 2014. "Extending monitoring methods to textual data: a research agenda," Quality & Quantity: International Journal of Methodology, Springer, vol. 48(4), pages 2277-2294, July.
    9. Zhang, Zhong-Yuan & Gai, Yujie & Wang, Yu-Fei & Cheng, Hui-Min & Liu, Xin, 2018. "On equivalence of likelihood maximization of stochastic block model and constrained nonnegative matrix factorization," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 503(C), pages 687-697.
    10. Sun, Lijun & Axhausen, Kay W., 2016. "Understanding urban mobility patterns with a probabilistic tensor factorization framework," Transportation Research Part B: Methodological, Elsevier, vol. 91(C), pages 511-524.
    11. Danushka Bollegala & Georgios Kontonatsios & Sophia Ananiadou, 2015. "A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-28, June.
    12. Ma, Xiaoke & Wang, Bingbo & Yu, Liang, 2018. "Semi-supervised spectral algorithms for community detection in complex networks based on equivalence of clustering methods," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 490(C), pages 786-802.
    13. Alexandre L. M. Levada, 2021. "PCA-KL: a parametric dimensionality reduction approach for unsupervised metric learning," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 829-868, December.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.