LDAShiny: An R Package for Exploratory Review of Scientific Literature Based on a Bayesian Probabilistic Model and Machine Learning Tools

Authors

  • Javier De la Hoz-M

    (Facultad de Ingeniería, Universidad del Magdalena, Santa Marta 470004, Colombia
    Department of Statistics, University of Salamanca, 37008 Salamanca, Spain)

  • Mª José Fernández-Gómez

    (Department of Statistics, University of Salamanca, 37008 Salamanca, Spain
    Institute of Biomedical Research of Salamanca, 37008 Salamanca, Spain)

  • Susana Mendes

    (MARE, School of Tourism and Maritime Technology, Polytechnic of Leiria, 2520-614 Peniche, Portugal)

Abstract

In this paper we propose an open-source application called LDAShiny, which provides a graphical user interface for reviewing scientific literature with the latent Dirichlet allocation algorithm and machine learning tools in an interactive, easy-to-use way. The implemented procedures follow the familiar stages of topic modeling: preprocessing, modeling, and postprocessing. The tool can be used by researchers or analysts who are not familiar with the R environment. We demonstrate the application by reviewing the literature published in the last three decades on the species Oreochromis niloticus. In total, we reviewed 6196 abstracts of articles recorded in Scopus. LDAShiny allowed us to create the document-term matrix. Preprocessing reduced the number of unique terms from 530,143 to 3268, and with it the computational requirements. The results showed that 14 topics were sufficient to describe the corpus used in the demonstration. We also found that the general research topics on this species were related to growth performance, body weight, heavy metals, genetics, and water quality, among others.
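
The preprocessing step described in the abstract (building a document-term matrix and shrinking its vocabulary) can be illustrated with a minimal sketch. This is not LDAShiny's implementation, which is written in R; the stopword list, the toy abstracts, the `build_dtm` helper, and the `min_doc_freq` threshold are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the list LDAShiny uses).
STOPWORDS = {"the", "of", "in", "and", "on", "a", "was", "were"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def build_dtm(docs, min_doc_freq=2):
    """Return (vocabulary, matrix), keeping only terms that occur in at
    least `min_doc_freq` documents. Rows are documents, columns are terms."""
    counts = [Counter(tokenize(d)) for d in docs]
    # Iterating a Counter yields each distinct term once per document,
    # so this counts document frequency rather than total occurrences.
    doc_freq = Counter(term for c in counts for term in c)
    vocab = sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix

# Hypothetical toy abstracts standing in for the Scopus corpus.
docs = [
    "Growth performance of Nile tilapia fed plant protein",
    "Heavy metals in tilapia tissue and water quality",
    "Water quality effects on growth of tilapia",
]

vocab_all, _ = build_dtm(docs, min_doc_freq=1)  # no pruning
vocab, dtm = build_dtm(docs, min_doc_freq=2)    # prune rare terms
print(len(vocab_all), "->", len(vocab))
print(vocab)
print(dtm)
```

Pruning terms that appear in only one document shrinks the matrix, which is the same kind of vocabulary reduction the abstract reports at a much larger scale. The pruned matrix would then be passed to a topic model such as latent Dirichlet allocation.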

Suggested Citation

  • Javier De la Hoz-M & Mª José Fernández-Gómez & Susana Mendes, 2021. "LDAShiny: An R Package for Exploratory Review of Scientific Literature Based on a Bayesian Probabilistic Model and Machine Learning Tools," Mathematics, MDPI, vol. 9(14), pages 1-21, July.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:14:p:1671-:d:595258

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/14/1671/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/14/1671/
    Download Restriction: no

    References listed on IDEAS

    1. Grimmer, Justin, 2010. "A Bayesian Hierarchical Topic Model for Political Texts: Measuring Expressed Agendas in Senate Press Releases," Political Analysis, Cambridge University Press, vol. 18(1), pages 1-35, January.
    2. Grün, Bettina & Hornik, Kurt, 2011. "topicmodels: An R Package for Fitting Topic Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 40(i13).
    3. Anne-Wil Harzing & Satu Alakangas, 2016. "Google Scholar, Scopus and the Web of Science: a longitudinal and cross-disciplinary comparison," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(2), pages 787-804, February.
    4. Stefano Sbalchiero & Maciej Eder, 2020. "Topic modeling, long texts and the best number of topics. Some Problems and solutions," Quality & Quantity: International Journal of Methodology, Springer, vol. 54(4), pages 1095-1108, August.
    5. Scott Deerwester & Susan T. Dumais & George W. Furnas & Thomas K. Landauer & Richard Harshman, 1990. "Indexing by latent semantic analysis," Journal of the American Society for Information Science, Association for Information Science & Technology, vol. 41(6), pages 391-407, September.
    6. Grimmer, Justin & Stewart, Brandon M., 2013. "Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts," Political Analysis, Cambridge University Press, vol. 21(3), pages 267-297, July.
    7. Denny, Matthew J. & Spirling, Arthur, 2018. "Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It," Political Analysis, Cambridge University Press, vol. 26(2), pages 168-189, April.

    Citations

    Cited by:

    1. Luis Pilacuan-Bonete & Purificación Galindo-Villardón & Francisco Delgado-Álvarez, 2022. "HJ-Biplot as a Tool to Give an Extra Analytical Boost for the Latent Dirichlet Assignment (LDA) Model: With an Application to Digital News Analysis about COVID-19," Mathematics, MDPI, vol. 10(14), pages 1-17, July.
    2. Alekh Gour & Shikha Aggarwal & Subodha Kumar, 2022. "Lending ears to unheard voices: An empirical analysis of user‐generated content on social media," Production and Operations Management, Production and Operations Management Society, vol. 31(6), pages 2457-2476, June.
    3. Max Weber & Taha Chaiechi & Rabiul Beg, 2022. "Inclusive Growth and Climate Change Mitigation Programs and Policies in the ASEAN: Fiscal Implications," Bulletin of Applied Economics, Risk Market Journals, vol. 9(2), pages 189-221.
    4. Max Weber & Taha Chaiechi & Rabiul Beg, 2022. "Inclusive Growth and Climate Change Mitigation Programs and Policies in the ASEAN: Fiscal Implications," Bulletin of Applied Economics, Risk Market Journals, vol. 9(2), pages 189-220.
    5. Karime Montes-Escobar & Javier De la Hoz-M & Mónica Daniela Barreiro-Linzán & Carolina Fonseca-Restrepo & Miguel Ángel Lapo-Palacios & Douglas Andrés Verduga-Alcívar & Carlos Alfredo Salas-Macias, 2023. "Trends in Agroforestry Research from 1993 to 2022: A Topic Model Using Latent Dirichlet Allocation and HJ-Biplot," Mathematics, MDPI, vol. 11(10), pages 1-15, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Maksym Polyakov & Morteza Chalak & Md. Sayed Iftekhar & Ram Pandit & Sorada Tapsuwan & Fan Zhang & Chunbo Ma, 2018. "Authorship, Collaboration, Topics, and Research Gaps in Environmental and Resource Economics 1991–2015," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 71(1), pages 217-239, September.
    2. Mohamed M. Mostafa, 2023. "A one-hundred-year structural topic modeling analysis of the knowledge structure of international management research," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(4), pages 3905-3935, August.
    3. Arina Wischnewsky & David‐Jan Jansen & Matthias Neuenkirch, 2021. "Financial stability and the Fed: Evidence from congressional hearings," Economic Inquiry, Western Economic Association International, vol. 59(3), pages 1192-1214, July.
    4. Lino Wehrheim, 2019. "Economic history goes digital: topic modeling the Journal of Economic History," Cliometrica, Springer;Cliometric Society (Association Francaise de Cliométrie), vol. 13(1), pages 83-125, January.
    5. Lehotský, Lukáš & Černoch, Filip & Osička, Jan & Ocelík, Petr, 2019. "When climate change is missing: Media discourse on coal mining in the Czech Republic," Energy Policy, Elsevier, vol. 129(C), pages 774-786.
    6. Sanders James & Lisi Giulio & Schonhardt-Bailey Cheryl, 2017. "Themes and Topics in Parliamentary Oversight Hearings: A New Direction in Textual Data Analysis," Statistics, Politics and Policy, De Gruyter, vol. 8(2), pages 153-194, December.
    7. Justyna Klejdysz & Robin L. Lumsdaine, 2023. "Shifts in ECB Communication: A Textual Analysis of the Press Conference," International Journal of Central Banking, International Journal of Central Banking, vol. 19(2), pages 473-542, June.
    8. Matthew Gentzkow & Bryan T. Kelly & Matt Taddy, 2017. "Text as Data," NBER Working Papers 23276, National Bureau of Economic Research, Inc.
    9. Michal Ovádek & Nicolas Lampach & Arthur Dyevre, 2020. "What’s the talk in Brussels? Leveraging daily news coverage to measure issue attention in the European Union," European Union Politics, , vol. 21(2), pages 204-232, June.
    10. Sanders, James & Lisi, Giulio & Schonhardt-Bailey, Cheryl, 2018. "Themes and topics in parliamentary oversight hearings: a new direction in textual data analysis," LSE Research Online Documents on Economics 87624, London School of Economics and Political Science, LSE Library.
    11. David Bholat & Stephen Hans & Pedro Santos & Cheryl Schonhardt-Bailey, 2015. "Text mining for central banks," Handbooks, Centre for Central Banking Studies, Bank of England, number 33, April.
    12. Shr-Wei Kao & Pin Luarn, 2020. "Topic Modeling Analysis of Social Enterprises: Twitter Evidence," Sustainability, MDPI, vol. 12(8), pages 1-20, April.
    13. Albina Latifi & Viktoriia Naboka-Krell & Peter Tillmann & Peter Winker, 2023. "Fiscal Policy in the Bundestag: Textual Analysis and Macroeconomic Effects," MAGKS Papers on Economics 202307, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
    14. Purwoko Haryadi Santoso & Edi Istiyono & Haryanto & Wahyu Hidayatulloh, 2022. "Thematic Analysis of Indonesian Physics Education Research Literature Using Machine Learning," Data, MDPI, vol. 7(11), pages 1-41, October.
    15. Damani K. White-Lewis & KerryAnn O’Meara & Kiernan Mathews & Nicholas Havey, 2023. "Leaving the Institution or Leaving the Academy? Analyzing the Factors that Faculty Weigh in Actual Departure Decisions," Research in Higher Education, Springer;Association for Institutional Research, vol. 64(3), pages 473-494, May.
    16. Camilla Salvatore & Silvia Biffignandi & Annamaria Bianchi, 2022. "Corporate Social Responsibility Activities Through Twitter: From Topic Model Analysis to Indexes Measuring Communication Characteristics," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 164(3), pages 1217-1248, December.
    17. Lüdering Jochen & Winker Peter, 2016. "Forward or Backward Looking? The Economic Discourse and the Observed Reality," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 236(4), pages 483-515, August.
    18. Jason Anastasopoulos & George J. Borjas & Gavin G. Cook & Michael Lachanski, 2018. "Job Vacancies, the Beveridge Curve, and Supply Shocks: The Frequency and Content of Help-Wanted Ads in Pre- and Post-Mariel Miami," NBER Working Papers 24580, National Bureau of Economic Research, Inc.
    19. Yang Bao & Anindya Datta, 2014. "Simultaneously Discovering and Quantifying Risk Types from Textual Risk Disclosures," Management Science, INFORMS, vol. 60(6), pages 1371-1391, June.
    20. Cho, Yung-Jan & Fu, Pei-Wen & Wu, Chi-Cheng, 2017. "Popular Research Topics in Marketing Journals, 1995–2014," Journal of Interactive Marketing, Elsevier, vol. 40(C), pages 52-72.
