
Q3-D3-LSA

Author

Listed:
  • Lukas Borke
  • Wolfgang K. Härdle

Abstract

QuantNet is an integrated web-based environment consisting of different types of statistics-related documents and program codes. Its goal is to foster reproducibility and to offer a platform for sharing validated knowledge native to the social web. Increasing information retrieval (IR) efficiency requires incorporating semantic information. Three text mining models are examined: the vector space model (VSM), the generalized VSM (GVSM) and latent semantic analysis (LSA). LSA has been successfully used for IR purposes as a technique for capturing semantic relations between terms and inserting them into the similarity measure between documents. Our results show that different model configurations allow adapted similarity-based document clustering and knowledge discovery. In particular, different LSA configurations together with hierarchical clustering yield good results under M3 evaluation. QuantNet and the corresponding Data-Driven Documents (D3) based visualization are available at http://quantlet.de. The driving technology behind it is Q3-D3-LSA, a combination of the “GitHub API based QuantNet Mining infrastructure in R”, LSA and a D3 implementation.
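
To illustrate what "inserting semantic relations between terms into the similarity measure" can mean, here is a minimal sketch in plain Python. It is not the authors' implementation (which the abstract describes as R-based). The toy terms, the document vectors and the helper names (`kernel_cosine`, `matvec`, `dot`) are assumptions made for this example. It compares the plain VSM cosine similarity with a GVSM-style similarity that weights terms by their co-occurrence matrix G = D Dᵀ, where D is the term-by-document matrix. LSA would, under the same scheme, replace G with a matrix derived from a truncated singular value decomposition of D; that step is omitted here.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in m]


def kernel_cosine(x, y, g):
    """Cosine similarity under the bilinear form x^T G y."""
    num = dot(x, matvec(g, y))
    den = math.sqrt(dot(x, matvec(g, x)) * dot(y, matvec(g, y)))
    return num / den if den else 0.0


# Hypothetical toy corpus: each document is a count vector over TERMS.
TERMS = ["garch", "volatility", "regression"]
DOCS = [
    [1, 0, 0],  # doc 0 mentions only "garch"
    [0, 1, 0],  # doc 1 mentions only "volatility"
    [1, 1, 0],  # doc 2 mentions both, linking the two terms
    [0, 0, 1],  # doc 3 mentions only "regression"
]
n = len(TERMS)

# VSM: terms are treated as mutually orthogonal, so G is the identity.
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# GVSM-style: G = D D^T, the term-term co-occurrence matrix of the corpus.
gvsm = [[sum(doc[i] * doc[j] for doc in DOCS) for j in range(n)]
        for i in range(n)]

vsm_sim = kernel_cosine(DOCS[0], DOCS[1], identity)
gvsm_sim = kernel_cosine(DOCS[0], DOCS[1], gvsm)
unrelated_sim = kernel_cosine(DOCS[0], DOCS[3], gvsm)

print("VSM  sim(doc0, doc1):", vsm_sim)
print("GVSM sim(doc0, doc1):", gvsm_sim)
print("GVSM sim(doc0, doc3):", unrelated_sim)
```

In this toy corpus, documents 0 and 1 share no term, so the VSM similarity is zero. Because "garch" and "volatility" co-occur in document 2, the GVSM-style measure assigns the pair a positive similarity, while the pair of documents 0 and 3 stays at zero.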

Suggested Citation

  • Lukas Borke & Wolfgang K. Härdle, 2016. "Q3-D3-LSA," SFB 649 Discussion Papers SFB649DP2016-049, Sonderforschungsbereich 649, Humboldt University, Berlin, Germany.
  • Handle: RePEc:hum:wpaper:sfb649dp2016-049

    Download full text from publisher

    File URL: http://sfb649.wiwi.hu-berlin.de/papers/pdf/SFB649DP2016-049.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Theußl, Stefan & Feinerer, Ingo & Hornik, Kurt, 2012. "A tm Plug-In for Distributed Text Mining in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 51(i05).
    2. Brock, Guy & Pihur, Vasyl & Datta, Susmita & Datta, Somnath, 2008. "clValid: An R Package for Cluster Validation," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 25(i04).
    3. Golyandina, Nina & Korobeynikov, Anton, 2014. "Basic Singular Spectrum Analysis and forecasting with R," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 934-954.
    4. Feinerer, Ingo & Hornik, Kurt & Meyer, David, 2008. "Text Mining Infrastructure in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 25(i05).
    5. Scott Deerwester & Susan T. Dumais & George W. Furnas & Thomas K. Landauer & Richard Harshman, 1990. "Indexing by latent semantic analysis," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 41(6), pages 391-407, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cho, Yung-Jan & Fu, Pei-Wen & Wu, Chi-Cheng, 2017. "Popular Research Topics in Marketing Journals, 1995–2014," Journal of Interactive Marketing, Elsevier, vol. 40(C), pages 52-72.
    2. Romero-Silva, Rodrigo & de Leeuw, Sander, 2021. "Learning from the past to shape the future: A comprehensive text mining analysis of OR/MS reviews," Omega, Elsevier, vol. 100(C).
    3. Maksym Polyakov & Serhiy Polyakov & Md Sayed Iftekhar, 2017. "Does academic collaboration equally benefit impact of research across topics? The case of agricultural, resource, environmental and ecological economics," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1385-1405, December.
    4. Valter Martins Vairinhos & Luís Agonia Pereira & Florinda Matos & Helena Nunes & Carmen Patino & Purificación Galindo-Villardón, 2022. "Framework for Classroom Student Grading with Open-Ended Questions: A Text-Mining Approach," Mathematics, MDPI, vol. 10(21), pages 1-20, November.
    5. Irina Wedel & Michael Palk & Stefan Voß, 2022. "A Bilingual Comparison of Sentiment and Topics for a Product Event on Twitter," Information Systems Frontiers, Springer, vol. 24(5), pages 1635-1646, October.
    6. Patrick Zschech & Kai Heinrich & Raphael Bink & Janis S. Neufeld, 2019. "Prognostic Model Development with Missing Labels," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 61(3), pages 327-343, June.
    7. Yuyang Gao & Chao Qu & Kequan Zhang, 2016. "A Hybrid Method Based on Singular Spectrum Analysis, Firefly Algorithm, and BP Neural Network for Short-Term Wind Speed Forecasting," Energies, MDPI, vol. 9(10), pages 1-28, September.
    8. Mohammed Salem Binwahlan, 2023. "Polynomial Networks Model for Arabic Text Summarization," International Journal of Research and Scientific Innovation, International Journal of Research and Scientific Innovation (IJRSI), vol. 10(2), pages 74-84, February.
    9. Grinis, Inna, 2017. "The STEM requirements of "non-STEM" jobs: evidence from UK online vacancy postings and implications for skills & knowledge shortages," LSE Research Online Documents on Economics 85123, London School of Economics and Political Science, LSE Library.
    10. Curci, Ylenia & Mongeau Ospina, Christian A., 2016. "Investigating biofuels through network analysis," Energy Policy, Elsevier, vol. 97(C), pages 60-72.
    11. Chao Wei & Senlin Luo & Xincheng Ma & Hao Ren & Ji Zhang & Limin Pan, 2016. "Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation," PLOS ONE, Public Library of Science, vol. 11(1), pages 1-20, January.
    12. Julia Bachtrögler & Christoph Hammer & Wolf Heinrich Reuter & Florian Schwendinger, 2019. "Guide to the galaxy of EU regional funds recipients: evidence from new data," Empirica, Springer;Austrian Institute for Economic Research;Austrian Economic Association, vol. 46(1), pages 103-150, February.
    13. Shuyue Huang & Lena Jingen Liang & Hwansuk Chris Choi, 2022. "How We Failed in Context: A Text-Mining Approach to Understanding Hotel Service Failures," Sustainability, MDPI, vol. 14(5), pages 1-18, February.
    14. Laura Anderlucci & Cinzia Viroli, 2020. "Mixtures of Dirichlet-Multinomial distributions for supervised and unsupervised classification of short text data," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(4), pages 759-770, December.
    15. Gainbi Park & Zengwang Xu, 2022. "The constituent components and local indicator variables of social vulnerability index," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 110(1), pages 95-120, January.
    16. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    17. Stefano Sbalchiero & Maciej Eder, 2020. "Topic modeling, long texts and the best number of topics. Some Problems and solutions," Quality & Quantity: International Journal of Methodology, Springer, vol. 54(4), pages 1095-1108, August.
    18. Ding, Ying, 2011. "Community detection: Topological vs. topical," Journal of Informetrics, Elsevier, vol. 5(4), pages 498-514.
    19. Klaus Gugler & Florian Szücs & Ulrich Wohak, 2023. "Start-up Acquisitions, Venture Capital and Innovation: A Comparative Study of Google, Apple, Facebook, Amazon and Microsoft," Department of Economics Working Papers wuwp340, Vienna University of Economics and Business, Department of Economics.
    20. Aman Kalteh, 2016. "Improving Forecasting Accuracy of Streamflow Time Series Using Least Squares Support Vector Machine Coupled with Data-Preprocessing Techniques," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 30(2), pages 747-766, January.

    More about this item

    JEL classification:

    • C87 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Econometric Software
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
    • G17 - Financial Economics - - General Financial Markets - - - Financial Forecasting and Simulation
    • D3 - Microeconomics - - Distribution

