
SocialTERM-Extractor: Identifying and Predicting Social-Problem-Specific Key Noun Terms from a Large Number of Online News Articles Using Text Mining and Machine Learning Techniques

Author

Listed:
  • Jong Hwan Suh

    (Department of Management Information Systems, BERI, Gyeongsang National University, 501 Jinjudae-ro, Jinju-si, Gyeongsangnam-do 52828, Korea)

Abstract

In the digital age, the abundant unstructured data on the Internet, particularly online news articles, provide opportunities for identifying social problems and understanding social systems for sustainability. However, previous work has not paid attention to the social-problem-specific perspectives of such big data, and it is currently unclear how information technologies can use these data to identify and manage ongoing social problems. In this context, this paper introduces and focuses on social-problem-specific key noun terms, called SocialTERMs, which can be used not only to search the Internet for social-problem-related data but also to monitor ongoing and future events related to social problems. Moreover, to reduce the time-consuming human effort of identifying SocialTERMs, this paper designs and examines the SocialTERM-Extractor, an automatic approach that identifies the key noun terms of social-problem-related topics (SPRTs) in a large number of online news articles and predicts which of the identified key noun terms are SocialTERMs. The paper is novel as the first attempt to identify and predict SocialTERMs from a large number of online news articles. It contributes to the literature by proposing three types of text-mining-based features (temporal weight, sentiment, and complex network structural features) and by comparing the performance of these features across various machine learning techniques, including deep learning. In particular, when the approach was applied to a large number of online news articles published in South Korea over a 12-month period and mostly written in Korean, the experimental results showed that Boosting Decision Tree performed best with the full feature sets. The results also showed that the proposed SocialTERM-Extractor can predict SocialTERMs with high performance.
Ultimately, this paper can benefit individuals or organizations who want to explore and use social-problem-related data in a systematic manner to understand and manage social problems, even if they are unfamiliar with ongoing social problems.
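
To make the feature types named in the abstract more concrete, the following is a minimal Python sketch of two of them computed for candidate noun terms. It is not the paper's implementation: the abstract does not define the features, so the exponential recency weighting (`decay`), the use of degree centrality on a term co-occurrence network, the function name `term_features`, and the toy articles are all assumptions made for illustration. The sentiment feature and the classifier are omitted.

```python
from itertools import combinations


def term_features(dated_articles, decay=0.5):
    """Compute two illustrative features per noun term.

    dated_articles: list of (month_index, [noun terms]) pairs.
    decay: hypothetical recency factor; an article that is k months older
           than the newest one contributes decay**k to a term's weight.

    Returns {term: {"temporal_weight": float, "degree_centrality": float}}.
    """
    if not dated_articles:
        return {}

    latest = max(month for month, _ in dated_articles)
    weights = {}      # term -> recency-weighted article count (assumed definition)
    neighbours = {}   # term -> set of terms co-occurring in at least one article

    for month, terms in dated_articles:
        unique = sorted(set(terms))
        for term in unique:
            weights[term] = weights.get(term, 0.0) + decay ** (latest - month)
            neighbours.setdefault(term, set())
        # Link every pair of terms that appear in the same article.
        for a, b in combinations(unique, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)

    n = len(neighbours)
    features = {}
    for term in neighbours:
        # Degree centrality: share of the other terms this term is linked to.
        centrality = len(neighbours[term]) / (n - 1) if n > 1 else 0.0
        features[term] = {
            "temporal_weight": weights[term],
            "degree_centrality": centrality,
        }
    return features


# Toy data (invented for illustration): month index and extracted noun terms.
articles = [
    (1, ["unemployment", "youth", "policy"]),
    (2, ["unemployment", "housing"]),
    (3, ["housing", "policy"]),
]

features = term_features(articles)
for term in sorted(features):
    print(term, features[term])
```

In this toy example, "housing" appears in the two most recent months and so receives a higher temporal weight than "unemployment", which appears in the two oldest. Feature vectors of this kind, together with a sentiment score, are the sort of input a classifier such as a boosted decision tree would use to predict whether a term is a SocialTERM.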

Suggested Citation

  • Jong Hwan Suh, 2019. "SocialTERM-Extractor: Identifying and Predicting Social-Problem-Specific Key Noun Terms from a Large Number of Online News Articles Using Text Mining and Machine Learning Techniques," Sustainability, MDPI, Open Access Journal, vol. 11(1), pages 1-44, January.
  • Handle: RePEc:gam:jsusta:v:11:y:2019:i:1:p:196-:d:194504

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/11/1/196/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/11/1/196/
    Download Restriction: no

    References listed on IDEAS

    1. Yongho Lee & So Young Kim & Inseok Song & Yongtae Park & Juneseuk Shin, 2014. "Technology opportunity identification customized to the technological capability of SMEs through two-stage patent analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 100(1), pages 227-244, July.
    2. Angel Conde & Mikel Larrañaga & Ana Arruarte & Jon A. Elorriaga & Dan Roth, 2016. "litewi: A combined term extraction and entity linking method for eliciting educational ontologies from textbooks," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(2), pages 380-399, February.
    3. Suh, Jong Hwan, 2015. "Forecasting the daily outbreak of topic-level political risk from social media using hidden Markov model-based techniques," Technological Forecasting and Social Change, Elsevier, vol. 94(C), pages 115-132.
    4. Benjamin Van Roy & Xiang Yan, 2010. "Manipulation Robustness of Collaborative Filtering," Management Science, INFORMS, vol. 56(11), pages 1911-1929, November.
    5. Zhang, Yi & Porter, Alan L. & Hu, Zhengyin & Guo, Ying & Newman, Nils C., 2014. "“Term clumping” for technical intelligence: A case study on dye-sensitized solar cells," Technological Forecasting and Social Change, Elsevier, vol. 85(C), pages 26-39.
    6. Erjia Yan & Ying Ding, 2009. "Applying centrality measures to impact analysis: A coauthorship network analysis," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 60(10), pages 2107-2118, October.
    7. Xianshu Zhu & Tim Oates, 2014. "Finding story chains in newswire articles using random walks," Information Systems Frontiers, Springer, vol. 16(5), pages 753-769, November.
    8. Yan Dang & Yulei Zhang & Hsinchun Chen & Paul Jen‐Hwa Hu & Susan A. Brown & Cathy Larson, 2009. "Arizona Literature Mapper: An integrated approach to monitor and analyze global bioterrorism research literature," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 60(7), pages 1466-1485, July.

    Citations



    Cited by:

    1. Boram Choi & Jong Hwan Suh, 2020. "Forecasting Spare Parts Demand of Military Aircraft: Comparisons of Data Mining Techniques and Managerial Features from the Case of South Korea," Sustainability, MDPI, Open Access Journal, vol. 12(15), pages 1-20, July.
    2. Samuel Zanferdini Oliva & Livia Oliveira-Ciabati & Denise Gazotto Dezembro & Mário Sérgio Adolfi Júnior & Maísa Carvalho Silva & Hugo Cesar Pessotti & Juliana Tarossi Pollettini, 2021. "Text structuring methods based on complex network: a systematic review," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1471-1493, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiao Zhou & Lu Huang & Yi Zhang & Miaomiao Yu, 2019. "A hybrid approach to detecting technological recombination based on text mining and patent network analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(2), pages 699-737, November.
    2. Elsa Alvaro & Angel Yanguas-Gil, 2018. "Characterizing the field of Atomic Layer Deposition: Authors, topics, and collaborations," PLOS ONE, Public Library of Science, vol. 13(1), pages 1-19, January.
    3. Kyuwoong Kim & Kyeongmin Park & Sungjoo Lee, 2019. "Investigating technology opportunities: the use of SAOx analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 118(1), pages 45-70, January.
    4. Ying Huang & Donghua Zhu & Yue Qian & Yi Zhang & Alan L. Porter & Yuqin Liu & Ying Guo, 2017. "A hybrid method to trace technology evolution pathways: a case study of 3D printing," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(1), pages 185-204, April.
    5. Yichi Zhang & Zhiliang Dong & Sen Liu & Peixiang Jiang & Cuizhi Zhang & Chao Ding, 2021. "Forecast of International Trade of Lithium Carbonate Products in Importing Countries and Small-Scale Exporting Countries," Sustainability, MDPI, Open Access Journal, vol. 13(3), pages 1-23, January.
    6. Chao Lu & Yingyi Zhang & Yong‐Yeol Ahn & Ying Ding & Chenwei Zhang & Dandan Ma, 2020. "Co‐contributorship network and division of labor in individual scientific collaborations," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 71(10), pages 1162-1178, October.
    7. Nasirian, Farzaneh & Mahdavi Pajouh, Foad & Balasundaram, Balabhaskar, 2020. "Detecting a most closeness-central clique in complex networks," European Journal of Operational Research, Elsevier, vol. 283(2), pages 461-475.
    8. Zhang, Yi & Robinson, Douglas K.R. & Porter, Alan L. & Zhu, Donghua & Zhang, Guangquan & Lu, Jie, 2016. "Technology roadmapping for competitive technical intelligence," Technological Forecasting and Social Change, Elsevier, vol. 110(C), pages 175-186.
    9. Farrukh, Clare & Holgado, Maria, 2020. "Integrating sustainable value thinking into technology forecasting: A configurable toolset for early stage technology assessment," Technological Forecasting and Social Change, Elsevier, vol. 158(C).
    10. Ma, Jing & Abrams, Natalie F. & Porter, Alan L. & Zhu, Donghua & Farrell, Dorothy, 2019. "Identifying translational indicators and technology opportunities for nanomedical research using tech mining: The case of gold nanostructures," Technological Forecasting and Social Change, Elsevier, vol. 146(C), pages 767-775.
    11. Chengcui Zhang & Elisa Bertino & Bhavani Thuraisingham & James Joshi, 2014. "Guest editorial: Information reuse, integration, and reusable systems," Information Systems Frontiers, Springer, vol. 16(5), pages 749-752, November.
    12. Zhang, Yi & Lu, Jie & Liu, Feng & Liu, Qian & Porter, Alan & Chen, Hongshu & Zhang, Guangquan, 2018. "Does deep learning help topic extraction? A kernel k-means clustering method with word embedding," Journal of Informetrics, Elsevier, vol. 12(4), pages 1099-1117.
    13. Okazaki, Shintaro & Plangger, Kirk & West, Douglas & Menéndez, Héctor D., 2020. "Exploring digital corporate social responsibility communications on Twitter," Journal of Business Research, Elsevier, vol. 117(C), pages 675-682.
    14. Claudio Biscaro & Carlo Giupponi, 2014. "Co-Authorship and Bibliographic Coupling Network Effects on Citations," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-12, June.
    15. Way-Ren Huang & Chia-Jen Hsieh & Ke-Chiun Chang & Yen-Jo Kiang & Chien-Chung Yuan & Woei-Chyn Chu, 2017. "Network characteristics and patent value—Evidence from the Light-Emitting Diode industry," PLOS ONE, Public Library of Science, vol. 12(8), pages 1-14, August.
    16. Caroline V Fry & Xiaojing Cai & Yi Zhang & Caroline S Wagner, 2020. "Consolidation in a crisis: Patterns of international collaboration in early COVID-19 research," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-15, July.
    17. Sun, Bixuan & Kolesnikov, Sergey & Goldstein, Anna & Chan, Gabriel, 2021. "A dynamic approach for identifying technological breakthroughs with an application in solar photovoltaics," Technological Forecasting and Social Change, Elsevier, vol. 165(C).
    18. Shahadat Uddin & Liaquat Hossain & Kim Rasmussen, 2013. "Network Effects on Scientific Collaborations," PLOS ONE, Public Library of Science, vol. 8(2), pages 1-12, February.
    19. Noh, Heeyong & Kim, Kyuwoong & Song, Young-Keun & Lee, Sungjoo, 2021. "Opportunity-driven technology roadmapping: The case of 5G mobile services," Technological Forecasting and Social Change, Elsevier, vol. 163(C).
    20. Yi Zhang & Yue Qian & Ying Huang & Ying Guo & Guangquan Zhang & Jie Lu, 2017. "An entropy-based indicator system for measuring the potential of patents in technological innovation: rejecting moderation," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(3), pages 1925-1946, June.
