
Tension in big data using machine learning: Analysis and applications

Author

Listed:
  • Wang, Huamao
  • Yao, Yumei
  • Salhi, Said

Abstract

The availability of machine learning techniques in popular programming languages and the exponentially expanding big data from social media, news, surveys, and markets provide exciting challenges and invaluable opportunities for organizations and individuals to explore implicit information for decision making. Nevertheless, users of machine learning often find that these sophisticated techniques can give rise to considerable tension, caused by, among other factors, the choice of an appropriate training data set size. In this paper, we provide a systematic way of resolving such tensions by examining practical examples of predicting the popularity and sentiment of posts on Twitter and Facebook, blogs on Mashable, news on Google and Yahoo, the US house survey, and Bitcoin prices. The results show that, for big data, using around 20% of the full sample often leads to better prediction accuracy than using the full sample. This conclusion is consistent across a series of experiments. The managerial implication is that more data is not necessarily better: users need to be cautious about this sensitivity, as the simplistic approach of using everything may easily lead to inferior solutions with potentially detrimental consequences.
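
The kind of experiment the abstract describes, comparing prediction accuracy across training set sizes, can be sketched as follows. This is a minimal illustration and not the authors' method: the synthetic linear data, the one-feature least-squares model, and the names `make_data`, `fit_line`, and `sweep_training_fractions` are all assumptions introduced here. It shows only the mechanics of a training-fraction sweep and is not expected to reproduce the paper's 20% finding.

```python
import random


def make_data(n, seed=0):
    """Synthetic data (assumption): y = 2x + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)) for x in xs]


def fit_line(points):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, points):
    """Mean squared prediction error of a fitted line on the given points."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)


def sweep_training_fractions(train, test, fractions, seed=0):
    """Fit on a random subsample of each size and score on the same test set."""
    rng = random.Random(seed)
    results = {}
    for frac in fractions:
        k = max(2, int(len(train) * frac))  # need at least two points for a line
        subset = rng.sample(train, k)
        results[frac] = mse(fit_line(subset), test)
    return results


data = make_data(2000)
train, test = data[:1600], data[1600:]  # hold out the last 20% for testing
results = sweep_training_fractions(train, test, [0.05, 0.1, 0.2, 0.5, 1.0])
for frac, err in results.items():
    print(f"fraction={frac:.2f}  test MSE={err:.4f}")
```

Holding the test set fixed while varying only the training fraction keeps the comparison fair across sizes. In practice one would repeat the sweep over several random seeds before drawing any conclusion about which fraction works best.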

Suggested Citation

  • Wang, Huamao & Yao, Yumei & Salhi, Said, 2020. "Tension in big data using machine learning: Analysis and applications," Technological Forecasting and Social Change, Elsevier, vol. 158(C).
  • Handle: RePEc:eee:tefoso:v:158:y:2020:i:c:s0040162520310015
    DOI: 10.1016/j.techfore.2020.120175

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0040162520310015
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.techfore.2020.120175?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Gan, Lirong & Wang, Huamao & Yang, Zhaojun, 2020. "Machine learning solutions to challenges in finance: An application to the pricing of financial products," Technological Forecasting and Social Change, Elsevier, vol. 153(C).
    2. Hou, Ye & Gao, Ping & Nicholson, Brian, 2018. "Understanding organisational responses to regulative pressures in information security management: The case of a Chinese hospital," Technological Forecasting and Social Change, Elsevier, vol. 126(C), pages 64-75.
    3. Kayser, Victoria & Blind, Knut, 2017. "Extending the knowledge base of foresight: The contribution of text mining," Technological Forecasting and Social Change, Elsevier, vol. 116(C), pages 208-215.
    4. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    5. Iqbal, Rahat & Doctor, Faiyaz & More, Brian & Mahmud, Shahid & Yousuf, Usman, 2020. "Big data analytics: Computational intelligence techniques and application areas," Technological Forecasting and Social Change, Elsevier, vol. 153(C).
    6. Charles J. Corbett, 2018. "How Sustainable Is Big Data?," Production and Operations Management, Production and Operations Management Society, vol. 27(9), pages 1685-1695, September.
    7. Blazquez, Desamparados & Domenech, Josep, 2018. "Big Data sources and methods for social and economic analyses," Technological Forecasting and Social Change, Elsevier, vol. 130(C), pages 99-113.
    8. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    9. Wang, Yichuan & Kung, LeeAnn & Byrd, Terry Anthony, 2018. "Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations," Technological Forecasting and Social Change, Elsevier, vol. 126(C), pages 3-13.
    10. Jun, Seung-Pyo & Yoo, Hyoung Sun & Choi, San, 2018. "Ten years of research change using Google Trends: From the perspective of big data utilizations and applications," Technological Forecasting and Social Change, Elsevier, vol. 130(C), pages 69-87.
    11. Hippert, H.S. & Bunn, D.W. & Souza, R.C., 2005. "Large neural networks for electricity load forecasting: Are they overfitted?," International Journal of Forecasting, Elsevier, vol. 21(3), pages 425-434.

    Citations

    Cited by:

    1. Meadows, Maureen & Merendino, Alessandro & Dibb, Sally & Garcia-Perez, Alexeis & Hinton, Matthew & Papagiannidis, Savvas & Pappas, Ilias & Wang, Huamao, 2022. "Tension in the data environment: How organisations can meet the challenge," Technological Forecasting and Social Change, Elsevier, vol. 175(C).
    2. Chaudhry, Sajid M. & Ahmed, Rizwan & Huynh, Toan Luu Duc & Benjasak, Chonlakan, 2022. "Tail risk and systemic risk of finance and technology (FinTech) firms," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    3. John-Mathews, Jean-Marie, 2022. "Some critical and ethical perspectives on the empirical turn of AI interpretability," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    4. Kazancoglu, Yigit & Sagnak, Muhittin & Mangla, Sachin Kumar & Sezer, Muruvvet Deniz & Pala, Melisa Ozbiltekin, 2021. "A fuzzy based hybrid decision framework to circularity in dairy supply chains through big data solutions," Technological Forecasting and Social Change, Elsevier, vol. 170(C).
    5. Ibrahim, Awad Elsayed Awad & Elamer, Ahmed A. & Ezat, Amr Nazieh, 2021. "The convergence of big data and accounting: innovative research opportunities," Technological Forecasting and Social Change, Elsevier, vol. 173(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Brewis, Claire & Dibb, Sally & Meadows, Maureen, 2023. "Leveraging big data for strategic marketing: A dynamic capabilities model for incumbent firms," Technological Forecasting and Social Change, Elsevier, vol. 190(C).
    2. Yu, Baojun & Li, Changming & Mirza, Nawazish & Umar, Muhammad, 2022. "Forecasting credit ratings of decarbonized firms: Comparative assessment of machine learning models," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    3. Mariani, Marcello M. & Fosso Wamba, Samuel, 2020. "Exploring how consumer goods companies innovate in the digital age: The role of big data analytics companies," Journal of Business Research, Elsevier, vol. 121(C), pages 338-352.
    4. Sophie-Charlotte Klose & Johannes Lederer, 2020. "A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics," Papers 2006.12296, arXiv.org, revised Jun 2020.
    5. Akash Malhotra, 2018. "A hybrid econometric-machine learning approach for relative importance analysis: Prioritizing food policy," Papers 1806.04517, arXiv.org, revised Aug 2020.
    6. Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2017. "Econométrie et Machine Learning," Papers 1708.06992, arXiv.org, revised Mar 2018.
    7. Lidia Ceriani & Sergio Olivieri & Marco Ranzani, 2023. "Housing, imputed rent, and household welfare," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 21(1), pages 131-168, March.
    8. Croux, Christophe & Jagtiani, Julapa & Korivi, Tarunsai & Vulanovic, Milos, 2020. "Important factors determining Fintech loan default: Evidence from a lendingclub consumer platform," Journal of Economic Behavior & Organization, Elsevier, vol. 173(C), pages 270-296.
    9. Erik Heilmann & Janosch Henze & Heike Wetzel, 2021. "Machine learning in energy forecasts with an application to high frequency electricity consumption data," MAGKS Papers on Economics 202135, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
    10. Jens Ludwig & Sendhil Mullainathan, 2021. "Fragile Algorithms and Fallible Decision-Makers: Lessons from the Justice System," Journal of Economic Perspectives, American Economic Association, vol. 35(4), pages 71-96, Fall.
    11. Halko, Marja-Liisa & Lappalainen, Olli & Sääksvuori, Lauri, 2021. "Do non-choice data reveal economic preferences? Evidence from biometric data and compensation-scheme choice," Journal of Economic Behavior & Organization, Elsevier, vol. 188(C), pages 87-104.
    12. Manuel J. García Rodríguez & Vicente Rodríguez Montequín & Francisco Ortega Fernández & Joaquín M. Villanueva Balsera, 2019. "Public Procurement Announcements in Spain: Regulations, Data Analysis, and Award Price Estimator Using Machine Learning," Complexity, Hindawi, vol. 2019, pages 1-20, November.
    13. Li, Lei & Lin, Jiabao & Ouyang, Ye & Luo, Xin (Robert), 2022. "Evaluating the impact of big data analytics usage on the decision-making quality of organizations," Technological Forecasting and Social Change, Elsevier, vol. 175(C).
    14. He, Xue-Zhong & Lin, Shen, 2022. "Reinforcement Learning Equilibrium in Limit Order Markets," Journal of Economic Dynamics and Control, Elsevier, vol. 144(C).
    15. Giovanni Di Franco & Michele Santurro, 2021. "Machine learning, artificial neural networks and social research," Quality & Quantity: International Journal of Methodology, Springer, vol. 55(3), pages 1007-1025, June.
    16. Galdo, Virgilio & Li, Yue & Rama, Martin, 2021. "Identifying urban areas by combining human judgment and machine learning: An application to India," Journal of Urban Economics, Elsevier, vol. 125(C).
    17. Piñeiro-Chousa, Juan & López-Cabarcos, M.Ángeles & Ribeiro-Soriano, Domingo, 2020. "Does investor attention influence water companies’ stock returns?," Technological Forecasting and Social Change, Elsevier, vol. 158(C).
    18. Arthur Blouin & Julian Dyer, 2021. "How Cultures Converge: An Empirical Investigation of Trade and Linguistic Exchange," Working Papers tecipa-691, University of Toronto, Department of Economics.
    19. Mona Aghdaee & Bonny Parkinson & Kompal Sinha & Yuanyuan Gu & Rajan Sharma & Emma Olin & Henry Cutler, 2022. "An examination of machine learning to map non‐preference based patient reported outcome measures to health state utility values," Health Economics, John Wiley & Sons, Ltd., vol. 31(8), pages 1525-1557, August.
    20. Gonzalo, Jesús & Pitarakis, Jean-Yves, 2021. "Spurious relationships in high-dimensional systems with strong or mild persistence," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1480-1497.
