IDEAS home Printed from https://ideas.repec.org/a/gam/jsusta/v13y2021i19p10856-d646836.html

Applying Text Mining, Clustering Analysis, and Latent Dirichlet Allocation Techniques for Topic Classification of Environmental Education Journals

Author

Listed:
  • I-Cheng Chang

    (Department of Environmental Engineering, National Ilan University, Yilan 260, Taiwan)

  • Tai-Kuei Yu

    (Department of Business Administration, National Quemoy University, Kinmen 892, Taiwan)

  • Yu-Jie Chang

    (Department of Earth and Life Science, University of Taipei, Taipei 100, Taiwan)

  • Tai-Yi Yu

    (Department of Risk Management and Insurance, Ming Chuan University, Taipei 111, Taiwan)

Abstract

Facing the big data wave, this study applied artificial intelligence to cite knowledge and to find a feasible process that can play a crucial role in supplying innovative value in environmental education. Intelligent agents and natural language processing (NLP) are two key areas leading the trend in artificial intelligence; this research adopted NLP to analyze the research topics of environmental education research journals in the Web of Science (WoS) database during 2011–2020 and to interpret the categories and characteristics of abstracts of environmental education papers. The corpus was drawn from the abstracts and keywords of research journal papers, which were analyzed with text mining, cluster analysis, latent Dirichlet allocation (LDA), and co-word analysis methods. The classification of feature words was determined and reviewed by domain experts, and the associated TF-IDF weights were calculated for the subsequent cluster analysis, which combined hierarchical clustering and K-means analysis. Hierarchical clustering and LDA set the number of required categories at seven, and the K-means cluster analysis classified all documents into seven categories. This study used co-word analysis to check the suitability of the K-means classification, analyzed the terms with high TF-IDF weights for distinct K-means groups, and examined the terms for different topics with the LDA technique. A comparison of the results demonstrated that most categories recognized by the K-means and LDA methods were the same and shared similar words; however, two categories differed slightly. The involvement of field experts helped ensure the consistency and correctness of the classified topics and documents.
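
The TF-IDF weighting and K-means step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the four-document toy corpus, the whitespace tokenizer, the particular TF-IDF variant (relative term frequency times log(N/df), L2-normalized), the seeding of centroids from the first k documents, and the function names `tfidf`, `sqdist`, and `kmeans` are all assumptions made for illustration. The study itself used expert-reviewed feature words and seven clusters, which are not reproduced here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) where rows hold L2-normalized TF-IDF weights."""
    tokenized = [d.lower().split() for d in docs]  # naive tokenizer (assumption)
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        row = [tf[t] / len(doc) * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        rows.append([w / norm for w in row])
    return vocab, rows


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=20):
    """Lloyd's algorithm; centroids are seeded with the first k rows (toy choice)."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each document goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(r, centroids[c])) for r in rows]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy abstracts, invented for this sketch.
docs = [
    "climate change education policy",
    "outdoor learning nature school",
    "climate policy change awareness",
    "school nature outdoor program",
]
vocab, rows = tfidf(docs)
labels = kmeans(rows, k=2)
print(labels)  # [0, 1, 0, 1]
```

In this toy run the two climate-policy documents share one cluster and the two outdoor-learning documents the other. In a setting like the one the abstract describes, the number of clusters would come from a separate step (there, hierarchical clustering and LDA) rather than being fixed by hand.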

Suggested Citation

  • I-Cheng Chang & Tai-Kuei Yu & Yu-Jie Chang & Tai-Yi Yu, 2021. "Applying Text Mining, Clustering Analysis, and Latent Dirichlet Allocation Techniques for Topic Classification of Environmental Education Journals," Sustainability, MDPI, vol. 13(19), pages 1-20, September.
  • Handle: RePEc:gam:jsusta:v:13:y:2021:i:19:p:10856-:d:646836

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/13/19/10856/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/13/19/10856/
    Download Restriction: no

    References listed on IDEAS

    1. Peter van den Besselaar & Gaston Heimeriks, 2006. "Mapping research topics using word-reference co-occurrences: A method and an exploratory case study," Scientometrics, Springer;Akadémiai Kiadó, vol. 68(3), pages 377-393, September.
    2. Seungsu Paek & Namhyoung Kim, 2021. "Analysis of Worldwide Research Trends on the Impact of Artificial Intelligence in Education," Sustainability, MDPI, vol. 13(14), pages 1-20, July.
    3. Huiyun Zhu & Kecheng Liu, 2021. "Temporal, Spatial, and Socioeconomic Dynamics in Social Media Thematic Emphases during Typhoon Mangkhut," Sustainability, MDPI, vol. 13(13), pages 1-17, July.
    4. Hansu Hwang & SeJin An & Eunchang Lee & Suhyeon Han & Cheon-hwan Lee, 2021. "Cross-Societal Analysis of Climate Change Awareness and Its Relation to SDG 13: A Knowledge Synthesis from Text Mining," Sustainability, MDPI, vol. 13(10), pages 1-21, May.
    5. Sunghae Jun & Sangsung Park & Dongsik Jang, 2015. "A Technology Valuation Model Using Quantitative Patent Analysis: A Case Study of Technology Transfer in Big Data Marketing," Emerging Markets Finance and Trade, Taylor & Francis Journals, vol. 51(5), pages 963-974, September.
    6. Gabjo Kim & Joonhyuck Lee & Dongsik Jang & Sangsung Park, 2016. "Technology Clusters Exploration for Patent Portfolio through Patent Abstract Analysis," Sustainability, MDPI, vol. 8(12), pages 1-13, December.
    7. Diego Corrales-Garay & Eva-María Mora-Valentín & Marta Ortiz-de-Urbina-Criado, 2020. "Entrepreneurship Through Open Data: An Opportunity for Sustainable Development," Sustainability, MDPI, vol. 12(12), pages 1-25, June.
    8. Yen‐Liang Chen & Yi‐Hung Liu & Wu‐Liang Ho, 2013. "A text mining approach to assist the general public in the retrieval of legal documents," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 64(2), pages 280-290, February.
    9. A. Christy & G. Meera Gandhi & S. Vaithyasubramanian, 2019. "Clustering of text documents with keyword weighting function," International Journal of Intelligent Enterprise, Inderscience Enterprises Ltd, vol. 6(1), pages 19-31.
    10. Xin Ying An & Qing Qiang Wu, 2011. "Co-word analysis of the trends in stem cells field based on subject heading weighting," Scientometrics, Springer;Akadémiai Kiadó, vol. 88(1), pages 133-144, July.
    11. Ruomu Miao & Yuxia Wang & Shuang Li, 2021. "Analyzing Urban Spatial Patterns and Functional Zones Using Sina Weibo POI Data: A Case Study of Beijing," Sustainability, MDPI, vol. 13(2), pages 1-15, January.

    Citations



    Cited by:

    1. Vrdoljak Ivana, 2023. "Lifelong Education in Economics, Business and Management Research: Literature Review," Business Systems Research, Sciendo, vol. 14(1), pages 153-172, September.
    2. Mini Zhu & Gang Wang & Chaoping Li & Hongjun Wang & Bin Zhang, 2023. "Artificial Intelligence Classification Model for Modern Chinese Poetry in Education," Sustainability, MDPI, vol. 15(6), pages 1-19, March.
    3. Christoph Funk & Elena Tönjes & Ramona Teuber & Lutz Breuer, 2024. "Reading between the lines: The intersection of research attention and sustainable development goals," Sustainable Development, John Wiley & Sons, Ltd., vol. 32(5), pages 4545-4566, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sung Kim & Derek Hansen & Richard Helps, 2018. "Computing research in the academy: insights from theses and dissertations," Scientometrics, Springer;Akadémiai Kiadó, vol. 114(1), pages 135-158, January.
    2. So-Hui Park & Dong-Gu Lee & Jin-Sung Park & Jun-Woo Kim, 2021. "A Survey of Research on Data Analytics-Based Legal Tech," Sustainability, MDPI, vol. 13(14), pages 1-24, July.
    3. Shu Yan & Lizi Pan & Yan Lu & Juan Chen & Ting Zhang & Dongzi Xu & Zhaolian Ouyang, 2023. "Towards Sustainable Drug Supply in China: A Bibliometric Analysis of Drug Reform Policies," Sustainability, MDPI, vol. 15(13), pages 1-20, June.
    4. Raymundo das Neves Machado & Benjamín Vargas-Quesada & Jacqueline Leta, 2016. "Intellectual structure in stem cell research: exploring Brazilian scientific articles from 2001 to 2010," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(2), pages 525-537, February.
    5. Jan M. Gerken & Martin G. Moehrle, 2012. "A new instrument for technology monitoring: novelty in patents measured by semantic patent analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(3), pages 645-670, June.
    6. Shi Shen & Ke Shi & Junwang Huang & Changxiu Cheng & Min Zhao, 2023. "Global online social response to a natural disaster and its influencing factors: a case study of Typhoon Haiyan," Humanities and Social Sciences Communications, Palgrave Macmillan, vol. 10(1), pages 1-15, December.
    7. Michel Zitt, 2015. "Meso-level retrieval: IR-bibliometrics interplay and hybrid citation-words methods in scientific fields delineation," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2223-2245, March.
    8. Zhichao Ba & Yujie Cao & Jin Mao & Gang Li, 2019. "A hierarchical approach to analyzing knowledge integration between two fields—a case study on medical informatics and computer science," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(3), pages 1455-1486, June.
    9. Milojević, Staša & Sugimoto, Cassidy R. & Larivière, Vincent & Thelwall, Mike & Ding, Ying, 2014. "The role of handbooks in knowledge creation and diffusion: A case of science and technology studies," Journal of Informetrics, Elsevier, vol. 8(3), pages 693-709.
    10. Luciano Barcellos-Paula & Anna María Gil-Lafuente & Aline Castro-Rezende, 2023. "Algorithm Applied to SDG13: A Case Study of Ibero-American Countries," Mathematics, MDPI, vol. 11(2), pages 1-20, January.
    11. S. Ravikumar & Ashutosh Agrahari & S. N. Singh, 2015. "Mapping the intellectual structure of scientometrics: a co-word analysis of the journal Scientometrics (2005–2010)," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(1), pages 929-955, January.
    12. Tangfei Xiong & Jianjun Zhang & Huiyan Huang, 2023. "Entrepreneurship Education for Training the Talent in China: Exploring the Influencing Factors and Their Effects," Sustainability, MDPI, vol. 15(15), pages 1-23, July.
    13. Joana Costa & Luís Carvalho, 2022. "Is digital government facilitating entrepreneurship? A comparative statics analysis," GEE Papers 0164, Gabinete de Estratégia e Estudos, Ministério da Economia, revised Jun 2022.
    14. Hanane Rhomad & Karima Khalil & Khalid Elkalay, 2023. "Water Quality Modeling in Atlantic Region: Review, Science Mapping and Future Research Directions," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 37(1), pages 451-499, January.
    15. Zhenwei Wang & Xiaochun Wang & Zijin Dong & Lisan Li & Wangjun Li & Shicheng Li, 2023. "More Urban Elderly Care Facilities Should Be Placed in Densely Populated Areas for an Aging Wuhan of China," Land, MDPI, vol. 12(1), pages 1-13, January.
    16. Nezha Mejjad & Marzia Rovere, 2021. "Understanding the Impacts of Blue Economy Growth on Deep-Sea Ecosystem Services," Sustainability, MDPI, vol. 13(22), pages 1-26, November.
    17. Pranpreya Sriwannawit & Ulf Sandström, 2015. "Large-scale bibliometric review of diffusion research," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(2), pages 1615-1645, February.
    18. Jianhua Hou & Xiucai Yang & Chaomei Chen, 2018. "Emerging trends and new developments in information science: a document co-citation analysis (2009–2016)," Scientometrics, Springer;Akadémiai Kiadó, vol. 115(2), pages 869-892, May.
    19. Tinggui Chen & Lijuan Peng & Jianjun Yang & Guodong Cong & Guoping Li, 2021. "Evolutionary Game of Multi-Subjects in Live Streaming and Governance Strategies Based on Social Preference Theory during the COVID-19 Pandemic," Mathematics, MDPI, vol. 9(21), pages 1-41, October.
    20. Hyundong Nam & Taewoo Nam, 2021. "Exploring Strategic Directions of Pandemic Crisis Management: A Text Analysis of World Economic Forum COVID-19 Reports," Sustainability, MDPI, vol. 13(8), pages 1-19, April.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.