Printed from https://ideas.repec.org/p/zbw/irtgdp/2019022.html

A Machine Learning Approach Towards Startup Success Prediction

Author

Listed:
  • Ünal, Cemre
  • Ceasu, Ioana

Abstract

The importance of startups for a dynamic, innovative and competitive economy has already been acknowledged in the scientific and business literature. The highly uncertain and volatile nature of the startup ecosystem makes evaluating startup success through analysis and interpretation of information very time-consuming and computationally intensive. This prediction problem calls for a quantitative model that enables an objective, fact-based approach to startup success prediction. This paper presents a series of reproducible models for startup success prediction using machine learning methods. The data used for this purpose was obtained from the online investor platform crunchbase.com. The data has been pre-processed for sampling bias and class imbalance using the ADASYN oversampling approach. A total of six different models are implemented to predict startup success. Based on the goodness-of-fit measures applicable to each model, the best models selected are the ensemble methods random forest and extreme gradient boosting, with test set prediction accuracies of 94.1% and 94.5% and AUCs of 92.22% and 92.91%, respectively. The top variables in these models are last funding to date, first funding lag and company age. The models presented in this study can be used to predict the success rate of future new firms/ventures in a repeatable way.
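
The abstract reports two test set measures, accuracy and AUC. As a minimal sketch of how these two numbers are typically computed for a binary classifier, the Python below uses only the standard library. The function names, the toy labels and scores, and the 0.5 decision threshold are illustrative assumptions and are not taken from the paper, whose own evaluation code is not shown on this page.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true 0/1 labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the paper): 1 = successful, 0 = not successful.
labels = [1, 0, 0, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.1, 0.35, 0.3, 0.6, 0.8]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
preds = [int(s >= 0.5) for s in scores]

acc = accuracy(labels, preds)
auc = roc_auc(labels, scores)
print(f"accuracy={acc:.3f} auc={auc:.3f}")
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank positives against negatives. On imbalanced data such as startup outcomes the two can therefore diverge, which is one reason to report both.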

Suggested Citation

  • Ünal, Cemre & Ceasu, Ioana, 2019. "A Machine Learning Approach Towards Startup Success Prediction," IRTG 1792 Discussion Papers 2019-022, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
  • Handle: RePEc:zbw:irtgdp:2019022

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/230798/1/irtg1792dp2019-022.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Read, Daniel & van Leeuwen, Barbara, 1998. "Predicting Hunger: The Effects of Appetite and Delay on Choice," Organizational Behavior and Human Decision Processes, Elsevier, vol. 76(2), pages 189-205, November.
    2. Salih Zeki Ozdemir & Peter Moran & Xing Zhong & Martin J. Bliemel, 2016. "Reaching and Acquiring Valuable Resources: The Entrepreneur's Use of Brokerage, Cohesion, and Embeddedness," Entrepreneurship Theory and Practice, vol. 40(1), pages 49-79, January.
    3. Herbert A. Simon, 1955. "A Behavioral Model of Rational Choice," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 69(1), pages 99-118.
    4. du Jardin, Philippe, 2016. "A two-stage classification technique for bankruptcy prediction," European Journal of Operational Research, Elsevier, vol. 254(1), pages 236-252.
    5. Jiaming Liu & Chong Wu, 2019. "Hybridizing kernel‐based fuzzy c‐means with hierarchical selective neural network ensemble model for business failure prediction," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 38(2), pages 92-105, March.
    6. Rajagopal, 2014. "Organizations and Innovation," Palgrave Macmillan Books, in: Architecting Enterprise, chapter 3, pages 58-86, Palgrave Macmillan.
    7. Michael Luger & Jun Koo, 2005. "Defining and Tracking Business Start-Ups," Small Business Economics, Springer, vol. 24(1), pages 17-28, January.
    8. Friedman, Jerome H., 2002. "Stochastic gradient boosting," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 367-378, February.
    9. Scott Shane, 2012. "The Importance of Angel Investing in Financing the Growth of Entrepreneurial Ventures," Quarterly Journal of Finance (QJF), World Scientific Publishing Co. Pte. Ltd., vol. 2(02), pages 1-42.
    10. Jones, Paul M. & Olson, Eric, 2013. "The time-varying correlation between uncertainty, output, and inflation: Evidence from a DCC-GARCH model," Economics Letters, Elsevier, vol. 118(1), pages 33-37.
    11. Dimitras, A. I. & Zanakis, S. H. & Zopounidis, C., 1996. "A survey of business failures with an emphasis on prediction methods and industrial applications," European Journal of Operational Research, Elsevier, vol. 90(3), pages 487-513, May.
    12. Ricardo Cao & Miguel Delgado & Wenceslao González-Manteiga, 1997. "Nonparametric curve estimation: An overview," Investigaciones Economicas, Fundación SEPI, vol. 21(2), pages 209-252, May.
    13. Laitinen, Erkki K., 1992. "Prediction of failure of a newly founded firm," Journal of Business Venturing, Elsevier, vol. 7(4), pages 323-340, July.
    14. Adrian Gepp & Kuldeep Kumar & Sukanto Bhattacharya, 2010. "Business failure prediction using decision trees," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 29(6), pages 536-555.
    15. Chang, Sea Jin, 2004. "Venture capital financing, strategic alliances, and the initial public offerings of Internet startups," Journal of Business Venturing, Elsevier, vol. 19(5), pages 721-741, September.
    16. Ernesto Tavoletti, 2013. "Business Incubators: Effective Infrastructures or Waste of Public Money? Looking for a Theoretical Framework, Guidelines and Criteria," Journal of the Knowledge Economy, Springer;Portland International Center for Management of Engineering and Technology (PICMET), vol. 4(4), pages 423-443, December.

    Citations


    Cited by:

    1. Lele Cao & Vilhelm von Ehrenheim & Sebastian Krakowski & Xiaoxue Li & Alexandra Lutz, 2022. "Using Deep Learning to Find the Next Unicorn: A Practical Synthesis," Papers 2210.14195, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stefano DellaVigna, 2009. "Psychology and Economics: Evidence from the Field," Journal of Economic Literature, American Economic Association, vol. 47(2), pages 315-372, June.
    2. Balcaen, Sofie & Ooghe, Hubert, 2006. "35 years of studies on business failure: an overview of the classic statistical methodologies and their related problems," The British Accounting Review, Elsevier, vol. 38(1), pages 63-93.
    3. Doumpos, Michalis & Andriosopoulos, Kostas & Galariotis, Emilios & Makridou, Georgia & Zopounidis, Constantin, 2017. "Corporate failure prediction in the European energy sector: A multicriteria approach and the effect of country characteristics," European Journal of Operational Research, Elsevier, vol. 262(1), pages 347-360.
    4. Katharina Dowling & Daniel Guhl & Daniel Klapper & Martin Spann & Lucas Stich & Narine Yegoryan, 2020. "Behavioral biases in marketing," Journal of the Academy of Marketing Science, Springer, vol. 48(3), pages 449-477, May.
    5. Rhodes, Charles, 2012. "A Dynamic Model of Failure to Maximize Utility in the Chronic Consumer Choice to Consume Foods High in Added Sugars," 2012 Annual Meeting, August 12-14, 2012, Seattle, Washington 124693, Agricultural and Applied Economics Association.
    6. Carmona, Pedro & Dwekat, Aladdin & Mardawi, Zeena, 2022. "No more black boxes! Explaining the predictions of a machine learning XGBoost classifier algorithm in business failure," Research in International Business and Finance, Elsevier, vol. 61(C).
    7. Soo Young Kim, 2018. "Predicting hospitality financial distress with ensemble models: the case of US hotels, restaurants, and amusement and recreation," Service Business, Springer;Pan-Pacific Business Association, vol. 12(3), pages 483-503, September.
    8. du Jardin, Philippe, 2021. "Forecasting corporate failure using ensemble of self-organizing neural networks," European Journal of Operational Research, Elsevier, vol. 288(3), pages 869-885.
    9. Dohyeon Kim & Su Yong Lee, 2022. "When venture capitalists are attracted by the experienced," Journal of Innovation and Entrepreneurship, Springer, vol. 11(1), pages 1-18, December.
    10. Newark, Daniel A., 2014. "Indecision and the construction of self," Organizational Behavior and Human Decision Processes, Elsevier, vol. 125(2), pages 162-174.
    11. De Bock, Koen W. & Coussement, Kristof & Lessmann, Stefan, 2020. "Cost-sensitive business failure prediction when misclassification costs are uncertain: A heterogeneous ensemble selection approach," European Journal of Operational Research, Elsevier, vol. 285(2), pages 612-630.
    12. Pierdzioch, Christian & Gupta, Rangan, 2020. "Uncertainty and Forecasts of U.S. Recessions," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 24(4), pages 1-20, September.
    13. Koen W. de Bock & Kristof Coussement & Stefan Lessmann, 2020. "Cost-sensitive business failure prediction when misclassification costs are uncertain: A heterogeneous ensemble selection approach," Post-Print hal-02863245, HAL.
    14. Kaiser, Ulrich & Kuhn, Johan M., 2020. "The value of publicly available, textual and non-textual information for startup performance prediction," Journal of Business Venturing Insights, Elsevier, vol. 14(C).
    15. Philippe Jardin & David Veganzones & Eric Séverin, 2019. "Forecasting Corporate Bankruptcy Using Accrual-Based Models," Computational Economics, Springer;Society for Computational Economics, vol. 54(1), pages 7-43, June.
    16. Kim, Jongwoo & Kim, Hongil & Geum, Youngjung, 2023. "How to succeed in the market? Predicting startup success using a machine learning approach," Technological Forecasting and Social Change, Elsevier, vol. 193(C).
    17. Zeineb Affes & Rania Hentati-Kaffel, 2019. "Predicting US Banks Bankruptcy: Logit Versus Canonical Discriminant Analysis," Computational Economics, Springer;Society for Computational Economics, vol. 54(1), pages 199-244, June.
    18. S. Balcaen & J. Buyze & H. Ooghe, 2009. "Financial distress and firm exit: determinants of involuntary exits, voluntary liquidations and restructuring exits," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 09/598, Ghent University, Faculty of Economics and Business Administration.
    19. Antretter, Torben & Sirén, Charlotta & Grichnik, Dietmar & Wincent, Joakim, 2020. "Should business angels diversify their investment portfolios to achieve higher performance? The role of knowledge access through co-investment networks," Journal of Business Venturing, Elsevier, vol. 35(5).
    20. Jabeur, Sami Ben & Gharib, Cheima & Mefteh-Wali, Salma & Arfi, Wissal Ben, 2021. "CatBoost model and artificial intelligence techniques for corporate failure prediction," Technological Forecasting and Social Change, Elsevier, vol. 166(C).

    More about this item

    Keywords

    Machine learning

    JEL classification:

    • C00 - Mathematical and Quantitative Methods - - General - - - General
