Printed from https://ideas.repec.org/a/aag/wpaper/v25y2021i3p92-110.html

A Detailed Guide on How to Use Statistical Software R for Text Mining

Author

Listed:
  • Kim-Hung Pho

    (Fractional Calculus, Optimization and Algebra Research Group, Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, Vietnam)

  • Ngoc-Hien Nguyen

    (Design Innovation Center (DBZ), Faculty of Engineering, Mondragon University, Spain)

  • Huu-Nhan Huynh

    (Department of Mathematics and Informatics, Vietnam Aviation Academy, Ho Chi Minh City, Vietnam)

  • Wing-Keung Wong

    (Department of Finance, Fintech Center, and Big Data Research Center, Asia University, Taiwan)

Abstract

Text mining matters in Statistics, Applied Mathematics, and many other areas of Science, Engineering, and Business because its applications are rich and varied. It can help academics and practitioners with tasks such as spam filtering, personal background matching, sentiment analysis, and document classification. The statistical software R is widely used in Science because of its outstanding features and because it is completely free. To contribute to the literature on text mining, this study provides detailed instructions on how to use R for text mining. We first introduce the algorithm for text mining and then discuss in detail how to carry out each step of the algorithm in R. As an application, the proposed algorithm is applied to an actual data set. The results will help both academics and practitioners understand how to use R for text mining.

Suggested Citation

  • Kim-Hung Pho & Ngoc-Hien Nguyen & Huu-Nhan Huynh & Wing-Keung Wong, 2021. "A Detailed Guide on How to Use Statistical Software R for Text Mining," Advances in Decision Sciences, Asia University, Taiwan, vol. 25(3), pages 92-110, September.
  • Handle: RePEc:aag:wpaper:v:25:y:2021:i:3:p:92-110

    Download full text from publisher

    File URL: https://iads.site/A-Detailed-Guide-on-How-to-Use-Statistical-Software-R-for-Text-Mining
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Massoud Moslehpour & Shin Hung Pan & Aviral Kumar Tiwari & Wing Keung Wong, 2021. "Editorial in Honour of Professor Michael McAleer," Advances in Decision Sciences, Asia University, Taiwan, vol. 25(4), pages 1-14, December.
    2. Le Ngoc Thuy Trang & Do Thi Thanh Nhan & Nguyen Thi Nhu Hao & Wing-Keung Wong, 2021. "Does Bank Liquidity Risk Lead To Bank'S Operational Efficiency? A Study In Vietnam," Advances in Decision Sciences, Asia University, Taiwan, vol. 25(4), pages 46-88, December.
    3. Chia-Lin Chang & Jukka Ilomäki & Hannu Laurila & Michael McAleer, 2018. "Long Run Returns Predictability and Volatility with Moving Averages," Risks, MDPI, vol. 6(4), pages 1-18, September.
    4. Demiralay, Sercan & Ulusoy, Veysel, 2014. "Non-linear volatility dynamics and risk management of precious metals," The North American Journal of Economics and Finance, Elsevier, vol. 30(C), pages 183-202.
    5. Vacca, Gianmarco & Zoia, Maria Grazia & Bagnato, Luca, 2022. "Forecasting in GARCH models with polynomially modified innovations," International Journal of Forecasting, Elsevier, vol. 38(1), pages 117-141.
    6. Thi Xuan Huong Tram & Nguyen Thi Thanh Hoai, 2021. "Effect of macroeconomic variables on systemic risk: Evidence from Vietnamese economy," Economics and Business Letters, Oviedo University Press, vol. 10(3), pages 217-228.
    7. Mustafa Demirel & Gazanfer Unal, 2020. "Applying multivariate-fractionally integrated volatility analysis on emerging market bond portfolios," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 6(1), pages 1-29, December.
    8. Ma, Cong & Cheok, Mui Yee & Chok, Nyen Vui, 2023. "Economic recovery through multisector management resources in small and medium businesses in China," Resources Policy, Elsevier, vol. 80(C).
    9. Bharat Kumar Meher & Iqbal Thonse Hawaldar & Mathew Thomas Gil & Deebom Zorle Dum, 2021. "Measuring Leverage Effect of Covid 19 on Stock Price Volatility of Energy Companies Using High Frequency Data," International Journal of Energy Economics and Policy, Econjournals, vol. 11(6), pages 489-502.
    10. Mariusz Kapuściński, 2023. "Updated estimates of the role of the bank lending channel in monetary policy transmission in Poland," NBP Working Papers 359, Narodowy Bank Polski.
    11. Moawia Alghalith & Xu Guo & Wing-Keung Wong & Lixing Zhu, 2016. "A General Optimal Investment Model In The Presence Of Background Risk," Annals of Financial Economics (AFE), World Scientific Publishing Co. Pte. Ltd., vol. 11(01), pages 1-8, March.
    12. Radu Lupu, 2014. "Simultaneity of Tail Events for Dynamic Conditional Distributions of Stock Market Index Returns," Journal for Economic Forecasting, Institute for Economic Forecasting, vol. 0(4), pages 49-64, December.
    13. Pedro H Albuquerque & Prasad R Vemala, 2023. "Femicide Rates in Mexican Cities along the US-Mexico Border," Working Papers hal-04167930, HAL.
    14. Lucas, André & Zhang, Xin, 2016. "Score-driven exponentially weighted moving averages and Value-at-Risk forecasting," International Journal of Forecasting, Elsevier, vol. 32(2), pages 293-302.
    15. Richard Lu & Chen-Chen Yang & Wing-Keung Wong, 2018. "Time Diversification: Perspectives From The Economic Index Of Riskiness," Annals of Financial Economics (AFE), World Scientific Publishing Co. Pte. Ltd., vol. 13(03), pages 1-15, September.
    16. Algieri, Bernardina, 2014. "The influence of biofuels, economic and financial factors on daily returns of commodity futures prices," Energy Policy, Elsevier, vol. 69(C), pages 227-247.
    17. Aloui, Chaker & Hammoudeh, Shawkat & Hamida, Hela ben, 2015. "Global factors driving structural changes in the co-movement between sharia stocks and sukuk in the Gulf Cooperation Council countries," The North American Journal of Economics and Finance, Elsevier, vol. 31(C), pages 311-329.
    18. Al-Shboul, Mohammad & Alsharari, Nizar, 2019. "The dynamic behavior of evolving efficiency: Evidence from the UAE stock markets," The Quarterly Review of Economics and Finance, Elsevier, vol. 73(C), pages 119-135.
    19. Tarek Chebbi & Abdelkader Derbali, 2015. "The dynamic correlation between energy commodities and Islamic stock market: analysis and forecasting," International Journal of Trade and Global Markets, Inderscience Enterprises Ltd, vol. 8(2), pages 112-126.
    20. Nguyen Huu Hau & Tran Trung Tinh & Hoa Anh Tuong & Wing-Keung Wong, 2020. "Review of Matrix Theory with Applications in Education and Decision Sciences," Advances in Decision Sciences, Asia University, Taiwan, vol. 24(1), pages 28-69, March.

    More about this item

    Keywords

    Guide; Text Mining; Statistics; software R;

    JEL classification:

    • J16 - Labor and Demographic Economics - - Demographic Economics - - - Economics of Gender; Non-labor Discrimination
    • K38 - Law and Economics - - Other Substantive Areas of Law - - - Human Rights Law; Gender Law; Animal Rights Law
    • M14 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Business Administration - - - Corporate Culture; Diversity; Social Responsibility
