
An Improved Corpus-Based NLP Method for Facilitating Keyword Extraction: An Example of the COVID-19 Vaccine Hesitancy Corpus

Author

  • Liang-Ching Chen

    (Department of Foreign Languages, R.O.C. Military Academy, Kaohsiung 830, Taiwan
    Institute of Education, National Sun Yat-sen University, Kaohsiung 804, Taiwan)

Abstract

In the current post-pandemic era of COVID-19, vaccine hesitancy is hindering the herd immunity that widespread vaccination would otherwise generate. It is critical to identify the factors that may cause COVID-19 vaccine hesitancy so that the relevant authorities can propose appropriate interventions to mitigate it. Keyword extraction, a sub-field of natural language processing (NLP), plays a vital role in modern medical informatics. When traditional corpus-based NLP methods are used for keyword extraction, they consider only a word’s log-likelihood value when determining whether it is a keyword, which raises concerns about the efficiency and accuracy of the technique. Specifically, such methods cannot (1) optimize the keyword list with a machine-based approach, (2) effectively evaluate a keyword’s level of importance, or (3) integrate variables to conduct data clustering. To address these issues, this study integrated a machine-based word removal technique, the i10-index, and the importance–performance analysis (IPA) technique to develop an improved corpus-based NLP method for keyword extraction. The 200 most-cited Science Citation Index (SCI) research articles discussing COVID-19 vaccine hesitancy were adopted as the target corpus for verification. The results showed that the keywords in Quadrant I (n = 98) reached the highest lexical coverage (9.81%), indicating that the proposed method successfully identified and extracted the most important keywords from the target corpus and thus achieved more domain-oriented and accurate keyword extraction.
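The abstract names four quantities: a word's log-likelihood value, the i10-index, an IPA quadrant, and lexical coverage. The Python sketch below illustrates how each could be computed; it is not the author's implementation. The log-likelihood function is the standard two-corpus G2 keyness statistic. The i10-index rule (count of items with a value of at least 10) is applied here to per-document word frequencies, which is a hypothetical adaptation. The IPA crosshair is placed at the means of the two variables, with log-likelihood on the x-axis and Cartesian quadrant numbering; all three choices are assumptions. Lexical coverage is taken as keyword tokens divided by corpus tokens. The toy data are invented.

```python
import math
from collections import Counter
from statistics import mean


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood (G2) keyness score for one word.

    a, b: the word's frequency in the target and reference corpora.
    c, d: total token counts of the target and reference corpora.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    total = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # the 0 * ln(0) term is taken as 0
            total += observed * math.log(observed / expected)
    return 2 * total


def i10_index(values, threshold=10):
    """Number of items whose value is at least `threshold` (the i10 rule)."""
    return sum(1 for v in values if v >= threshold)


def ipa_quadrant(x, y, x_cut, y_cut):
    """Quadrant of point (x, y) relative to the crosshair (x_cut, y_cut).

    Assumed Cartesian numbering: I = high x and high y, II = low x and high y,
    III = low x and low y, IV = high x and low y.
    """
    if x >= x_cut:
        return "I" if y >= y_cut else "IV"
    return "II" if y >= y_cut else "III"


def lexical_coverage(keywords, tokens):
    """Percentage of corpus tokens accounted for by the keyword types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(counts[w] for w in set(keywords))
    return 100 * covered / len(tokens)


# Invented toy data: log-likelihood scores per word, and each word's
# frequency in each of three hypothetical target-corpus documents.
ll_scores = {"vaccine": 120.0, "hesitancy": 95.0, "trust": 20.0, "the": 1.5}
doc_freqs = {"vaccine": [14, 12, 3], "hesitancy": [11, 10, 2],
             "trust": [12, 11, 0], "the": [9, 8, 7]}

# Hypothetical adaptation: i10-style count = documents where the word occurs >= 10 times.
i10 = {w: i10_index(freqs) for w, freqs in doc_freqs.items()}

# Assumption: crosshair at the means; log-likelihood on x, i10-style count on y.
x_cut, y_cut = mean(ll_scores.values()), mean(i10.values())
quadrants = {w: ipa_quadrant(ll_scores[w], i10[w], x_cut, y_cut) for w in ll_scores}
print(quadrants)  # {'vaccine': 'I', 'hesitancy': 'I', 'trust': 'II', 'the': 'III'}
```

Mean-based crosshairs are a common IPA convention, but the full text may place them, number the quadrants, or assign the axes differently, so those choices should be checked against the article before reuse.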

Suggested Citation

  • Liang-Ching Chen, 2023. "An Improved Corpus-Based NLP Method for Facilitating Keyword Extraction: An Example of the COVID-19 Vaccine Hesitancy Corpus," Sustainability, MDPI, vol. 15(4), pages 1-19, February.
  • Handle: RePEc:gam:jsusta:v:15:y:2023:i:4:p:3402-:d:1066965

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/15/4/3402/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/15/4/3402/
    Download Restriction: no

    References listed on IDEAS

    1. Marcin Kozak & Lutz Bornmann, 2012. "A New Family of Cumulative Indexes for Measuring Scientific Performance," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-4, October.
    2. Boonyarat Phadermrod & Richard M. Crowder & Gary B. Wills, 2019. "Importance-Performance Analysis based SWOT analysis," International Journal of Information Management, Elsevier, vol. 44(C), pages 194-203.
    3. Ida Rašovská & Marketa Kubickova & Kateřina Ryglová, 2021. "Importance–performance analysis approach to destination management," Tourism Economics, vol. 27(4), pages 777-794, June.
    4. Jung-Fa Tsai & Chin-Po Wang & Kuei-Lun Chang & Yi-Chung Hu, 2021. "Selecting Bloggers for Hotels via an Innovative Mixed MCDM Model," Mathematics, MDPI, vol. 9(13), pages 1-15, July.
    5. Yi-Fang Luo & Heng-Yu Shen & Shu-Ching Yang & Liang-Ching Chen, 2021. "The Relationships among Anxiety, Subjective Well-Being, Media Consumption, and Safety-Seeking Behaviors during the COVID-19 Epidemic," IJERPH, MDPI, vol. 18(24), pages 1-12, December.
    6. Yaoping Zhong & Wenzhong Zhu & Yingying Zhou, 2020. "CSR Image Construction of Chinese Construction Enterprises in Africa Based on Data Mining and Corpus Analysis," Mathematical Problems in Engineering, Hindawi, vol. 2020, pages 1-14, July.
    7. Zhong-Lei Wang & Hou-Cai Shen & Jian Zuo, 2019. "Risks in Prefabricated Buildings in China: Importance-Performance Analysis Approach," Sustainability, MDPI, vol. 11(12), pages 1-13, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zheng Yuan & Baohua Wen & Cheng He & Jin Zhou & Zhonghua Zhou & Feng Xu, 2022. "Application of Multi-Criteria Decision-Making Analysis to Rural Spatial Sustainability Evaluation: A Systematic Review," IJERPH, MDPI, vol. 19(11), pages 1-31, May.
    2. Fei-Hsin Huang & Hann Nguyen, 2022. "Selecting Optimal Cultural Tourism for Indigenous Tribes by Fuzzy MCDM," Mathematics, MDPI, vol. 10(17), pages 1-12, August.
    3. Javad Esmailpour & Kayvan Aghabayk & Mohammad Abrari Vajari & Chris De Gruyter, 2020. "Importance – Performance Analysis (IPA) of bus service attributes: A case study in a developing country," Transportation Research Part A: Policy and Practice, Elsevier, vol. 142(C), pages 129-150.
    4. Uroš Kramar & Dejan Dragan & Darja Topolšek, 2019. "The Holistic Approach to Urban Mobility Planning with a Modified Focus Group, SWOT, and Fuzzy Analytical Hierarchical Process," Sustainability, MDPI, vol. 11(23), pages 1-29, November.
    5. Carla M. A. Pinto & Jorge Mendonça & Lurdes Babo & Francisco J. G. Silva & José L. R. Fernandes, 2022. "Analyzing the Implementation of Lean Methodologies and Practices in the Portuguese Industry: A Survey," Sustainability, MDPI, vol. 14(3), pages 1-24, February.
    6. A. Macias & M. Kandidayeni & L. Boulon & J.P. Trovão, 2021. "Fuel cell-supercapacitor topologies benchmark for a three-wheel electric vehicle powertrain," Energy, Elsevier, vol. 224(C).
    7. Prince Donkor Ameyaw & Walter Timo de Vries, 2021. "Toward Smart Land Management: Land Acquisition and the Associated Challenges in Ghana. A Look into a Blockchain Digital Land Registry for Prospects," Land, MDPI, vol. 10(3), pages 1-22, March.
    8. Stefano Poponi & Gianluca Piovesan & Irene Fulco & Federico Vessella, 2022. "Geolocation of mountain businesses: Identifying and characterizing clusters by altitude in the Central Apennines," Land Use Policy, Elsevier, vol. 120(C).
    9. Al-Hussein M. H. Al-Aidrous & Nasir Shafiq & Yasser Yahya Al-Ashmori & Al-Baraa Abdulrahman Al-Mekhlafi & Abdullah O. Baarimah, 2022. "Essential Factors Enhancing Industrialized Building Implementation in Malaysian Residential Projects," Sustainability, MDPI, vol. 14(18), pages 1-18, September.
    10. Lorena Bašan & Jelena Kapeš & Ana Kamenečki, 2021. "Tourist Satisfaction as a Driver of Destination Marketing Improvements: The Case of the Opatija Riviera," Tržište/Market, Faculty of Economics and Business, University of Zagreb, vol. 33(1), pages 93-112.
    11. Boonlert Jitmaneeroj, 2024. "Value relevance of multifaceted corporate social performance: how do country-specific factors matter?," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-20, December.
    12. Ayaz Ahmad Khan & Rongrong Yu & Tingting Liu & Ning Gu & James Walsh, 2023. "Volumetric Modular Construction Risks: A Comprehensive Review and Digital-Technology-Coupled Circular Mitigation Strategies," Sustainability, MDPI, vol. 15(8), pages 1-34, April.
    13. Pornanong Budsaratragoon & Boonlert Jitmaneeroj, 2020. "A critique on the Corruption Perceptions Index: An interdisciplinary approach," Socio-Economic Planning Sciences, Elsevier, vol. 70(C).
    14. Sandro Serpa & Carlos Miguel Ferreira & Maria José Sá, 2020. "The Potential of Organisations’ SWOT Diagnostic Assessment," Academic Journal of Interdisciplinary Studies, Richtmann Publishing Ltd, vol. 9, July.
    15. Thang Quyet Nguyen & Lan Thi Tuyet Ngo & Nguyen Tan Huynh & Thanh Le Quoc & Long Van Hoang, 2022. "Assessing port service quality: An application of the extension fuzzy AHP and importance-performance analysis," PLOS ONE, Public Library of Science, vol. 17(2), pages 1-24, February.
    16. Soumaya Lamrhari & Hamid El Ghazi & Mourad Oubrich & Abdellatif El Faker, 2022. "A social CRM analytic framework for improving customer retention, acquisition, and conversion," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    17. Qiuyu Wang & Zhiqi Gong & Chengkui Liu, 2022. "Risk Network Evaluation of Prefabricated Building Projects in Underdeveloped Areas: A Case Study in Qinghai," Sustainability, MDPI, vol. 14(10), pages 1-26, May.
    18. Yang Liu & Jianjun Dong & Ling Shen, 2020. "A Conceptual Development Framework for Prefabricated Construction Supply Chain Management: An Integrated Overview," Sustainability, MDPI, vol. 12(5), pages 1-29, March.
    19. Jingda Ding & Chao Liu & Goodluck Asobenie Kandonga, 2020. "Exploring the limitations of the h-index and h-type indexes in measuring the research performance of authors," Scientometrics, Springer; Akadémiai Kiadó, vol. 122(3), pages 1303-1322, March.
    20. Chengcai Tang & Qianqian Zheng & Pin Ng, 2019. "A Study on the Coordinative Green Development of Tourist Experience and Commercialization of Tourism at Cultural Heritage Sites," Sustainability, MDPI, vol. 11(17), pages 1-19, August.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.