A Novel Framework for Mining Social Media Data Based on Text Mining, Topic Modeling, Random Forest, and DANP Methods

Author

Listed:
  • Chi-Yo Huang

    (Department of Industrial Education, National Taiwan Normal University, Taipei 106, Taiwan)

  • Chia-Lee Yang

    (National Center for High-Performance Computing, Hsinchu 300, Taiwan)

  • Yi-Hao Hsiao

    (National Center for High-Performance Computing, Hsinchu 300, Taiwan)

Abstract

The huge volume of user-generated data on social media results from the aggregation of users’ personal backgrounds, past experiences, and daily activities. Data at this scale, the so-called “big data,” have been studied intensively during the past few years. In spite of the impression one may get from the media, a great deal of these data has not yet been explored by existing techniques of data engineering and processing. Moreover, very few scholars have tried to do so, especially from the perspective of multiple-criteria decision-making (MCDM). MCDM methods can derive influence relationships and weights associated with aspects and criteria, which can hardly be achieved by traditional data analytics and statistical approaches. Therefore, in this paper, we propose an analytic framework that mines social networks, feeds the meaningful information into MCDM methods based on a theoretical framework, derives causal relationships among the aspects of that framework, and finally compares the causal relationships with a social theory. Latent Dirichlet allocation (LDA) will be adopted to derive topic models from the data retrieved from social media. After the topics are clustered into aspects of the social theory, the probability associated with each aspect will be normalized and then transformed to a Likert-type 5-point scale. Afterwards, for every topic, the feature importance of all other topics will be derived using the random forest (RF) algorithm. The feature importance matrix will be transformed to the initial influence matrix of the decision-making trial and evaluation laboratory (DEMATEL). The influence relationships among the aspects and criteria, together with the influence weight of each criterion, can then be derived by using the DEMATEL-based analytic network process (DANP).
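The normalization and Likert-scale step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which normalization or binning rule the authors use, so the min-max normalization and equal-width bins below are assumptions, and the function name `to_likert` and the sample probabilities are hypothetical.

```python
def to_likert(probabilities, points=5):
    """Map aspect probabilities to a 1..points Likert-type scale.

    Assumption: min-max normalization followed by equal-width binning.
    The paper may use a different rule.
    """
    lo, hi = min(probabilities), max(probabilities)
    if hi == lo:
        # No spread in the input: put every aspect at the scale midpoint.
        return [(points + 1) // 2 for _ in probabilities]
    scores = []
    for p in probabilities:
        z = (p - lo) / (hi - lo)  # normalized value in [0, 1]
        # Equal-width bins; the maximum value is clamped into the top bin.
        scores.append(min(points, int(z * points) + 1))
    return scores


if __name__ == "__main__":
    # Hypothetical per-aspect probabilities for one post (not data from the paper).
    aspect_probabilities = [0.05, 0.40, 0.31, 0.90]
    print(to_likert(aspect_probabilities))
```

Scores produced this way could then serve as the per-aspect variables on which a random forest is trained, one aspect at a time as the target and the remaining aspects as features.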
To verify the feasibility of the proposed framework, Taiwanese users’ attitudes toward air pollution will be analyzed based on the value–belief–norm (VBN) theory by using social media data retrieved from Dcard (dcard.tw). Based on the analytic results, the causal relationships are fully consistent with the VBN framework. Further, the mutual influences derived in this work that were seldom discussed in earlier works, i.e., those between altruistic concerns and egoistic concerns, as well as those between altruistic concerns and biosphere concerns, are worth further investigation in the future.
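The causal relationships discussed in the abstract come from DEMATEL. As a rough sketch of the standard DEMATEL computation, the code below normalizes an initial influence matrix A to D, forms the total-relation matrix T = D(I - D)^-1, and returns the row sums r and column sums c. A positive r - c marks a net cause and a negative value a net effect. The input matrix is a hypothetical stand-in for an RF feature-importance matrix. Reading entry [i][j] as the influence of aspect i on aspect j is an assumption, and the DANP weighting step is not shown.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dematel(influence):
    """Return (T, r, c): total-relation matrix, its row sums, its column sums."""
    n = len(influence)
    row_sums = [sum(row) for row in influence]
    col_sums = [sum(influence[i][j] for i in range(n)) for j in range(n)]
    # Common DEMATEL normalization: divide by the largest row or column sum.
    scale = max(max(row_sums), max(col_sums))
    d = [[v / scale for v in row] for row in influence]
    i_minus_d = [[(1.0 if i == j else 0.0) - d[i][j] for j in range(n)]
                 for i in range(n)]
    t = mat_mul(d, invert(i_minus_d))
    r = [sum(row) for row in t]
    c = [sum(t[i][j] for i in range(n)) for j in range(n)]
    return t, r, c


if __name__ == "__main__":
    # Hypothetical 3x3 influence matrix (not values from the paper).
    a = [[0.0, 0.5, 0.2],
         [0.3, 0.0, 0.4],
         [0.1, 0.2, 0.0]]
    t, r, c = dematel(a)
    for i in range(len(a)):
        print(f"aspect {i}: r+c = {r[i] + c[i]:.3f}, r-c = {r[i] - c[i]:.3f}")
```

Mutual influence between two aspects, such as the altruistic and egoistic concerns mentioned above, would show up as sizeable entries in both T[i][j] and T[j][i].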

Suggested Citation

  • Chi-Yo Huang & Chia-Lee Yang & Yi-Hao Hsiao, 2021. "A Novel Framework for Mining Social Media Data Based on Text Mining, Topic Modeling, Random Forest, and DANP Methods," Mathematics, MDPI, vol. 9(17), pages 1-21, August.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:17:p:2041-:d:621437

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/17/2041/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/17/2041/
    Download Restriction: no


    Citations


    Cited by:

    1. Chi-Yo Huang & Liang-Chieh Wang & Ying-Ting Kuo & Wei-Ti Huang, 2021. "A Novel Analytic Framework of Technology Mining Using the Main Path Analysis and the Decision-Making Trial and Evaluation Laboratory-Based Analytic Network Process," Mathematics, MDPI, vol. 9(19), pages 1-24, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chi-Yo Huang & Min-Jen Yang & Jeen-Fong Li & Hueiling Chen, 2021. "A DANP-Based NDEA-MOP Approach to Evaluating the Patent Commercialization Performance of Industry–Academic Collaborations," Mathematics, MDPI, vol. 9(18), pages 1-26, September.
    2. Chi-Yo Huang & Pei-Han Chung & Joseph Z. Shyu & Yao-Hua Ho & Chao-Hsin Wu & Ming-Che Lee & Ming-Jenn Wu, 2018. "Evaluation and Selection of Materials for Particulate Matter MEMS Sensors by Using Hybrid MCDM Methods," Sustainability, MDPI, vol. 10(10), pages 1-35, September.
    3. Hassani, Abdeslam & Mosconi, Elaine, 2022. "Social media analytics, competitive intelligence, and dynamic capabilities in manufacturing SMEs," Technological Forecasting and Social Change, Elsevier, vol. 175(C).
    4. Chi-Yo Huang & Liang-Chieh Wang & Ying-Ting Kuo & Wei-Ti Huang, 2021. "A Novel Analytic Framework of Technology Mining Using the Main Path Analysis and the Decision-Making Trial and Evaluation Laboratory-Based Analytic Network Process," Mathematics, MDPI, vol. 9(19), pages 1-24, October.
    5. Hou, Lei & Elsworth, Derek & Zhang, Fengshou & Wang, Zhiyuan & Zhang, Jianbo, 2023. "Evaluation of proppant injection based on a data-driven approach integrating numerical and ensemble learning models," Energy, Elsevier, vol. 264(C).
    6. Smith, Andrew N. & Fischer, Eileen & Yongjian, Chen, 2012. "How Does Brand-related User-generated Content Differ across YouTube, Facebook, and Twitter?," Journal of Interactive Marketing, Elsevier, vol. 26(2), pages 102-113.
    7. Monica Patrut, 2015. "Candidates In The Presidential Elections In Romania (2014): The Use Of Social Media In Political Marketing," Studies and Scientific Researches. Economics Edition, "Vasile Alecsandri" University of Bacau, Faculty of Economic Sciences, issue 21.
    8. Marcel Rosenberger & Christiane Lehrer & Reinhard Jung, 0. "Integrating data from user activities of social networks into public administrations," Information Systems Frontiers, Springer, vol. 0, pages 1-14.
    9. Ma, Zhikai & Huo, Qian & Wang, Wei & Zhang, Tao, 2023. "Voltage-temperature aware thermal runaway alarming framework for electric vehicles via deep learning with attention mechanism in time-frequency domain," Energy, Elsevier, vol. 278(C).
    10. Patrick Krennmair & Timo Schmid, 2022. "Flexible domain prediction using mixed effects random forests," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1865-1894, November.
    11. Vasile-Daniel Păvăloaia & Elena-Mădălina Teodor & Doina Fotache & Magdalena Danileţ, 2019. "Opinion Mining on Social Media Data: Sentiment Analysis of User Preferences," Sustainability, MDPI, vol. 11(16), pages 1-21, August.
    12. Jie Shi & Arno P. J. M. Siebes & Siamak Mehrkanoon, 2023. "TransCORALNet: A Two-Stream Transformer CORAL Networks for Supply Chain Credit Assessment Cold Start," Papers 2311.18749, arXiv.org.
    13. Sarbu Miruna, 2017. "Does Social Media Increase Labour Productivity?," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 237(2), pages 81-113, April.
    14. Shiwei Shen & Marios Sotiriadis & Qing Zhou, 2020. "Could Smart Tourists Be Sustainable and Responsible as Well? The Contribution of Social Networking Sites to Improving Their Sustainable and Responsible Behavior," Sustainability, MDPI, vol. 12(4), pages 1-21, February.
    15. Perez-Vega, Rodrigo & Hopkinson, Paul & Singhal, Aishwarya & Mariani, Marcello M., 2022. "From CRM to social CRM: A bibliometric review and research agenda for consumer research," Journal of Business Research, Elsevier, vol. 151(C), pages 1-16.
    16. Bourdouxhe, Axel & Wibail, Lionel & Claessens, Hugues & Dufrêne, Marc, 2023. "Modeling potential natural vegetation: A new light on an old concept to guide nature conservation in fragmented and degraded landscapes," Ecological Modelling, Elsevier, vol. 481(C).
    17. Murad Ali & Raja Ahmad Iskandar Bin Raja Yaacob & Mohd Nuri-Al-Amin B. Endut, 2017. "The Influence of Individual Characteristics towards the Use of Social Media as a Learning Tool: An Empirical Analysis," International Review of Management and Marketing, Econjournals, vol. 7(1), pages 251-256.
    18. Tafesse, Wondwesen & Wood, Bronwyn P., 2021. "Followers' engagement with instagram influencers: The role of influencers’ content and engagement strategy," Journal of Retailing and Consumer Services, Elsevier, vol. 58(C).
    19. Reilly, Anne H. & Hynan, Katherine A., 2014. "Corporate communication, sustainability, and social media: It's not easy (really) being green," Business Horizons, Elsevier, vol. 57(6), pages 747-758.
    20. Tuleu Daniela, 2015. "Antecedents Of Customer Relationship Management Capabilities," Annals of Faculty of Economics, University of Oradea, Faculty of Economics, vol. 1(1), pages 1285-1294, July.
