
Startup success prediction and VC portfolio simulation using CrunchBase data

Authors
  • Mark Potanin
  • Andrey Chertok
  • Konstantin Zorin
  • Cyril Shtabtsovsky

Abstract

Predicting startup success presents a formidable challenge due to the inherently volatile landscape of the entrepreneurial ecosystem. The advent of extensive databases like Crunchbase, together with available open data, enables the application of machine learning and artificial intelligence for more accurate predictive analytics. This paper focuses on startups at their Series B and Series C investment stages, aiming to predict key success milestones such as achieving an Initial Public Offering (IPO), attaining unicorn status, or executing a successful Merger and Acquisition (M&A). We introduce a novel deep learning model for predicting startup success, integrating a variety of factors such as funding metrics, founder features, and industry category. A distinctive feature of our research is the use of a comprehensive backtesting algorithm designed to simulate the venture capital investment process. This simulation allows for a robust evaluation of our model's performance against historical data, providing actionable insights into its practical utility in real-world investment contexts. Evaluating our model on Crunchbase data, we achieved 14-fold capital growth and successfully identified high-potential startups at the Series B round, including Revolut, DigitalOcean, Klarna, GitHub, and others. Our empirical findings illuminate the importance of incorporating diverse feature sets in enhancing the model's predictive accuracy. In summary, our work demonstrates the considerable promise of deep learning models and alternative unstructured data in predicting startup success and sets the stage for future advancements in this research area.
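
The backtesting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the Candidate fields, the fixed ticket size, the score threshold, and the sample data are all hypothetical and chosen only to show how a capital growth multiple might be computed from model scores and realized outcomes.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one startup raising a round."""
    name: str
    month: int            # assumed: month in which the round closes
    score: float          # assumed: model's predicted probability of success
    exit_multiple: float  # assumed: realized return multiple (0.0 = write-off)


def backtest(candidates, threshold, ticket=1.0):
    """Invest a fixed ticket in every candidate whose score clears the threshold.

    Candidates are processed in chronological order. Returns the selected
    portfolio and the capital growth multiple (money returned / money invested).
    """
    ordered = sorted(candidates, key=lambda c: c.month)
    portfolio = [c for c in ordered if c.score >= threshold]
    invested = ticket * len(portfolio)
    returned = sum(ticket * c.exit_multiple for c in portfolio)
    growth = returned / invested if invested else 0.0
    return portfolio, growth


# Synthetic data for illustration only; not taken from Crunchbase or the paper.
CANDIDATES = [
    Candidate("A", 1, 0.9, 20.0),
    Candidate("B", 1, 0.4, 0.0),
    Candidate("C", 2, 0.8, 0.0),
    Candidate("D", 3, 0.7, 4.0),
    Candidate("E", 3, 0.2, 10.0),
]

if __name__ == "__main__":
    portfolio, growth = backtest(CANDIDATES, threshold=0.6)
    print([c.name for c in portfolio], growth)
```

With the synthetic data above, a threshold of 0.6 selects A, C, and D: three tickets invested and 24 units returned, a growth multiple of 8. A realistic backtest would also need to prevent look-ahead, so that each score is produced only from data available before the round date.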

Suggested Citation

  • Mark Potanin & Andrey Chertok & Konstantin Zorin & Cyril Shtabtsovsky, 2023. "Startup success prediction and VC portfolio simulation using CrunchBase data," Papers 2309.15552, arXiv.org.
  • Handle: RePEc:arx:papers:2309.15552

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2309.15552
    File Function: Latest version
    Download Restriction: no
    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Seigner, Benedikt David Christian & Milanov, Hana & Lundmark, Erik & Shepherd, Dean A., 2023. "Tweeting like Elon? Provocative language, new-venture status, and audience engagement on social media," Journal of Business Venturing, Elsevier, vol. 38(2).
    2. Nartey Menzo, Benjamin Prince & Mogre, Diana & Asuamah Yeboah, Samuel, 2024. "Beyond Income: The Complexities of Credit Risk in Developing Countries," MPRA Paper 122364, University Library of Munich, Germany, revised 20 Sep 2024.
    3. Rogojan Luana Cristina & Croicu Andreea Elena & Iancu Laura Andreea, 2023. "Modern Approaches in Credit Risk Modeling: A Literature Review," Proceedings of the International Conference on Business Excellence, Sciendo, vol. 17(1), pages 1617-1627, July.
    4. Massimo G. Colombo & Benedetta Montanaro & Silvio Vismara, 2023. "What drives the valuation of entrepreneurial ventures? A map to navigate the literature and research directions," Small Business Economics, Springer, vol. 61(1), pages 59-84, June.
    5. Jörn H. Block & Walter Diegel & Christian Fisch, 2024. "How venture capital funding changes an entrepreneur’s digital identity: more self-confidence and professionalism but less authenticity!," Review of Managerial Science, Springer, vol. 18(8), pages 2287-2319, August.
    6. Tanja Verster & Erika Fourie, 2023. "The Changing Landscape of Financial Credit Risk Models," IJFS, MDPI, vol. 11(3), pages 1-15, August.
    7. Ricardo Costa-Climent & Samuel Ribeiro Navarrete & Darek M. Haftor & Marcin W. Staniewski, 2024. "Value creation and appropriation from the use of machine learning: a study of start-ups using fuzzy-set qualitative comparative analysis," International Entrepreneurship and Management Journal, Springer, vol. 20(2), pages 935-967, June.
    8. Brian Daniel Bernhardt & Chiara Marciano & Mario Rosario Guarracino, 2025. "The Impact of Alternative Data on Default Probability: Analyzing the Italian E-commerce Sector with NLP and Network Structures," SN Operations Research Forum, Springer, vol. 6(2), pages 1-30, June.
    9. Schade, Philipp & Schuhmacher, Monika C., 2023. "Predicting entrepreneurial activity using machine learning," Journal of Business Venturing Insights, Elsevier, vol. 19(C).
    10. Yngve Dahle & Kevin Reuther & Martin Steinert & Magne Supphellen, 2023. "Towards a systemic entrepreneurship activity model," International Entrepreneurship and Management Journal, Springer, vol. 19(4), pages 1583-1610, December.
    11. Sahab Zandi & Kamesh Korangi & María Óskarsdóttir & Christophe Mues & Cristián Bravo, 2024. "Attention-based Dynamic Multilayer Graph Neural Networks for Loan Default Prediction," Papers 2402.00299, arXiv.org, revised Jun 2024.
    12. Li, Yisheng & Zadehnoori, Iman & Jowhar, Ahmad & Wise, Sean & Laplume, Andre & Zihayat, Morteza, 2024. "Learning from Yesterday: Predicting early-stage startup success for accelerators through content and cohort dynamics," Journal of Business Venturing Insights, Elsevier, vol. 22(C).
    13. Manuel Kaiser & Andreas Kuckertz, 2025. "Emotions and entrepreneurial finance: Analysis of venture capitalists’ and business angels’ digital footprints on Twitter," International Entrepreneurship and Management Journal, Springer, vol. 21(1), pages 1-29, December.
    14. Jörn H. Block & Christian Fisch & Walter Diegel, 2024. "Schumpeterian entrepreneurial digital identity and funding from venture capital firms," The Journal of Technology Transfer, Springer, vol. 49(1), pages 119-157, February.
    15. Ruey‐Jer “Bryan” Jean & Daekwan Kim, 2021. "Signalling Strategies of Exporters on Internet Business‐to‐Business Platforms," Journal of Management Studies, Wiley Blackwell, vol. 58(7), pages 1869-1898, November.
    16. Andranik Tumasjan, 2024. "The many faces of social media in business and economics research: Taking stock of the literature and looking into the future," Journal of Economic Surveys, Wiley Blackwell, vol. 38(2), pages 389-426, April.
    17. Ruling Zhang & Zengrui Tian & Killian J. McCarthy & Xiao Wang & Kun Zhang, 2023. "Application of machine learning techniques to predict entrepreneurial firm valuation," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 42(2), pages 402-417, March.
    18. Ungerer, Christina & Reuther, Kevin & Baltes, Guido, 2021. "The lingering living dead phenomenon: Distorting venture survival studies?," Journal of Business Venturing Insights, Elsevier, vol. 16(C).
    19. Malyy, Maksim & Tekic, Zeljko & Podladchikova, Tatiana, 2021. "The value of big data for analyzing growth dynamics of technology-based new ventures," Technological Forecasting and Social Change, Elsevier, vol. 169(C).
    20. Brygała Magdalena & Korol Tomasz, 2024. "Personal bankruptcy prediction using machine learning techniques," Economics and Business Review, Sciendo, vol. 10(2), pages 118-142.
