Printed from https://ideas.repec.org/p/tin/wpaper/20190066.html

Data Science in Strategy: Machine learning and text analysis in the study of firm growth

Authors

  • Daan Kolkman

    (Technical University Eindhoven)

  • Arjen van Witteloostuijn

    (Vrije Universiteit Amsterdam)

Abstract

This study examines the applicability of modern Data Science techniques in the domain of Strategy. We apply novel techniques from the fields of machine learning and text analysis, and we proceed in two steps. First, we compare different machine learning techniques with traditional regression methods in terms of their goodness-of-fit, using a dataset of 168,055 firms that includes only basic demographic and financial information. The novel methods fare three to four times better, with the random forest technique achieving the best goodness-of-fit. Second, based on 8,163 informative websites of Dutch SMEs, we construct four additional proxies for personality and strategy variables. Including these four text-analyzed variables adds about 2.5 per cent to the R². Together, our two contributions provide evidence of the large potential of applying modern Data Science techniques in Strategy research. We reflect on the potential contribution of modern Data Science techniques from the perspective of the common critique that machine learning offers increased predictive accuracy at the expense of explanatory insight. In particular, we argue and illustrate why and how machine learning can be a productive element in the abductive theory-building cycle.
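The "about 2.5 per cent to the R²" result is an incremental goodness-of-fit statement: fit a baseline model, add the text-derived variables, and compare R² = 1 - SS_res / SS_tot. The sketch below illustrates only that arithmetic under loudly stated assumptions; it is not the authors' pipeline. It uses plain in-sample OLS (not the random forest or other machine learning models the paper compares), a hypothetical dictionary-based proxy (share of website words found in a made-up lexicon `GROWTH_WORDS`), hypothetical helper names, and synthetic firms and website snippets. The paper's actual text-analysis method and variables are not described on this page.

```python
import random
import re

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_predict(rows, y):
    """Fit OLS with an intercept via the normal equations; return in-sample fitted values."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [sum(b * v for b, v in zip(beta, x)) for x in X]

def text_proxy(text, lexicon):
    """Hypothetical proxy: share of alphabetic word tokens that appear in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)

# Hypothetical lexicon and website snippets; the data below are synthetic.
GROWTH_WORDS = {"innovate", "expand", "growth", "new", "international"}
SNIPPETS = [
    "we innovate and expand into new international markets",
    "a family bakery since 1950",
    "growth through new products",
    "traditional craftsmanship and local service",
]

random.seed(0)
firms = []
for _ in range(200):
    size, age = random.gauss(0, 1), random.gauss(0, 1)
    proxy = text_proxy(random.choice(SNIPPETS), GROWTH_WORDS)
    growth = 0.5 * size - 0.3 * age + 2.0 * proxy + random.gauss(0, 1)
    firms.append((size, age, proxy, growth))

y = [f[3] for f in firms]
r2_base = r_squared(y, ols_predict([(f[0], f[1]) for f in firms], y))
r2_full = r_squared(y, ols_predict([(f[0], f[1], f[2]) for f in firms], y))
print(f"base R^2 = {r2_base:.3f}, with text proxy = {r2_full:.3f}, "
      f"increment = {r2_full - r2_base:.3f}")
```

Because the models are nested and evaluated in-sample, the full model's R² cannot fall below the baseline's. A comparison of regression against methods such as random forests would usually be judged on held-out data instead, so that a more flexible model is not rewarded simply for fitting noise.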

Suggested Citation

  • Daan Kolkman & Arjen van Witteloostuijn, 2019. "Data Science in Strategy: Machine learning and text analysis in the study of firm growth," Tinbergen Institute Discussion Papers 19-066/VI, Tinbergen Institute.
  • Handle: RePEc:tin:wpaper:20190066

    Download full text from publisher

    File URL: https://papers.tinbergen.nl/19066.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Wijbenga, Frits H. & van Witteloostuijn, Arjen, 2007. "Entrepreneurial locus of control and competitive strategies - The moderating effect of environmental dynamism," Journal of Economic Psychology, Elsevier, vol. 28(5), pages 566-589, October.
    2. Elizabeth Garnsey & Erik Stam & Paul Heffernan, 2006. "New Firm Growth: Exploring Processes and Paths," Industry and Innovation, Taylor & Francis Journals, vol. 13(1), pages 1-20.
    3. Friedman, Jerome H., 2002. "Stochastic gradient boosting," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 367-378, February.
    4. Andrew Gelman & Guido Imbens, 2013. "Why ask Why? Forward Causal Inference and Reverse Causal Questions," NBER Working Papers 19614, National Bureau of Economic Research, Inc.
    5. Danny Miller & Jean-Marie Toulouse, 1986. "Chief Executive Personality and Corporate Strategy and Structure in Small Firms," Management Science, INFORMS, vol. 32(11), pages 1389-1409, November.
    6. Christophe Boone & Bert De Brabander & Arjen Van Witteloostuijn, 1996. "CEO Locus of Control and Small Firm Performance: an Integrative Framework and Empirical Test," Journal of Management Studies, Wiley Blackwell, vol. 33(5), pages 667-700, September.
    7. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    8. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Markku Maula & Wouter Stam, 2020. "Enhancing Rigor in Quantitative Entrepreneurship Research," Entrepreneurship Theory and Practice, vol. 44(6), pages 1059-1090, November.
    2. Falco J. Bargagli-Stoffi & Jan Niederreiter & Massimo Riccaboni, 2020. "Supervised learning for the prediction of firm dynamics," Papers 2009.06413, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhu, Haibin & Bai, Lu & He, Lidan & Liu, Zhi, 2023. "Forecasting realized volatility with machine learning: Panel data perspective," Journal of Empirical Finance, Elsevier, vol. 73(C), pages 251-271.
    2. Barzin, Samira & Avner, Paolo & Maruyama Rentschler, Jun Erik & O’Clery, Neave, 2022. "Where Are All the Jobs? A Machine Learning Approach for High Resolution Urban Employment Prediction in Developing Countries," Policy Research Working Paper Series 9979, The World Bank.
    3. Doruk Cengiz & Arindrajit Dube & Attila S. Lindner & David Zentler-Munro, 2021. "Seeing Beyond the Trees: Using Machine Learning to Estimate the Impact of Minimum Wages on Labor Market Outcomes," NBER Working Papers 28399, National Bureau of Economic Research, Inc.
    4. Vrontos, Spyridon D. & Galakis, John & Vrontos, Ioannis D., 2021. "Modeling and predicting U.S. recessions using machine learning techniques," International Journal of Forecasting, Elsevier, vol. 37(2), pages 647-671.
    5. Yagli, Gokhan Mert & Yang, Dazhi & Srinivasan, Dipti, 2019. "Automatic hourly solar forecasting using machine learning models," Renewable and Sustainable Energy Reviews, Elsevier, vol. 105(C), pages 487-498.
    6. Thitiphat Klinsuwan & Wachiraphong Ratiphaphongthon & Rabian Wangkeeree & Rattanaporn Wangkeeree & Chatchai Sirisamphanwong, 2023. "Evaluation of Machine Learning Algorithms for Supervised Anomaly Detection and Comparison between Static and Dynamic Thresholds in Photovoltaic Systems," Energies, MDPI, vol. 16(4), pages 1-22, February.
    7. Maria-Carmen García-Centeno & Román Mínguez-Salido & Raúl del Pozo-Rubio, 2021. "The Classification of Profiles of Financial Catastrophe Caused by Out-of-Pocket Payments: A Methodological Approach," Mathematics, MDPI, vol. 9(11), pages 1-20, May.
    8. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    9. Garvit Arora & Shubhangi Tiwari & Ying Wu & Xuan Mei, 2024. "An Exploration to the Correlation Structure and Clustering of Macroeconomic Variables," Papers 2401.10162, arXiv.org, revised May 2024.
    10. Konrad Bogner & Florian Pappenberger & Massimiliano Zappa, 2019. "Machine Learning Techniques for Predicting the Energy Consumption/Production and Its Uncertainties Driven by Meteorological Observations and Forecasts," Sustainability, MDPI, vol. 11(12), pages 1-22, June.
    11. Huei-Wen Teng & Yu-Hsien Li, 2023. "Can deep neural networks outperform Fama-MacBeth regression and other supervised learning approaches in stock returns prediction with asset-pricing factors?," Digital Finance, Springer, vol. 5(1), pages 149-182, March.
    12. Gür Ali, Özden & Gürlek, Ragıp, 2020. "Automatic Interpretable Retail forecasting with promotional scenarios," International Journal of Forecasting, Elsevier, vol. 36(4), pages 1389-1406.
    13. Keddouda, Abdelhak & Ihaddadene, Razika & Boukhari, Ali & Atia, Abdelmalek & Arıcı, Müslüm & Lebbihiat, Nacer & Ihaddadene, Nabila, 2024. "Photovoltaic module temperature prediction using various machine learning algorithms: Performance evaluation," Applied Energy, Elsevier, vol. 363(C).
    14. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    15. Oxana Babecka Kucharcukova & Jan Bruha, 2016. "Nowcasting the Czech Trade Balance," Working Papers 2016/11, Czech National Bank.
    16. Carstensen, Kai & Heinrich, Markus & Reif, Magnus & Wolters, Maik H., 2020. "Predicting ordinary and severe recessions with a three-state Markov-switching dynamic factor model," International Journal of Forecasting, Elsevier, vol. 36(3), pages 829-850.
    17. Hou-Tai Chang & Ping-Huai Wang & Wei-Fang Chen & Chen-Ju Lin, 2022. "Risk Assessment of Early Lung Cancer with LDCT and Health Examinations," IJERPH, MDPI, vol. 19(8), pages 1-12, April.
    18. Margherita Giuzio, 2017. "Genetic algorithm versus classical methods in sparse index tracking," Decisions in Economics and Finance, Springer;Associazione per la Matematica, vol. 40(1), pages 243-256, November.
    19. Nicolaj N. Mühlbach, 2020. "Tree-based Synthetic Control Methods: Consequences of moving the US Embassy," CREATES Research Papers 2020-04, Department of Economics and Business Economics, Aarhus University.
    20. Wang, Qiao & Zhou, Wei & Cheng, Yonggang & Ma, Gang & Chang, Xiaolin & Miao, Yu & Chen, E, 2018. "Regularized moving least-square method and regularized improved interpolating moving least-square method with nonsingular moment matrices," Applied Mathematics and Computation, Elsevier, vol. 325(C), pages 120-145.

    More about this item

    JEL classification:

    • L1 - Industrial Organization - - Market Structure, Firm Strategy, and Market Performance



    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:tin:wpaper:20190066. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Tinbergen Office +31 (0)10-4088900. General contact details of provider: https://edirc.repec.org/data/tinbenl.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.