Printed from https://ideas.repec.org/a/eee/ememar/v68y2025ics1566014125000925.html

IPOhelper: Mining features in registration statements for listing prediction of technological innovation companies

Author

Listed:
  • Wei, Mingye
  • Zhang, Min
  • Wei, Lu
  • Chen, Meiqi

Abstract

This paper develops IPOhelper, a novel predictive system for initial public offering (IPO) prediction that draws on statistical cues (financial and technological innovation indicators) and semantic cues (textual indicators) in registration statements. Using 692 registration statements of technological innovation companies from 2019 to 2023, we find that IPOhelper performs exceptionally well in predicting IPO outcomes. Compared with statistical cues, semantic features show particularly prominent predictive ability. In particular, the semantic feature “Technovation”, which reflects the adequacy of innovation-related information disclosure, is the most important feature for IPO prediction.

Suggested Citation

  • Wei, Mingye & Zhang, Min & Wei, Lu & Chen, Meiqi, 2025. "IPOhelper: Mining features in registration statements for listing prediction of technological innovation companies," Emerging Markets Review, Elsevier, vol. 68(C).
  • Handle: RePEc:eee:ememar:v:68:y:2025:i:c:s1566014125000925
    DOI: 10.1016/j.ememar.2025.101343

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1566014125000925
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ememar.2025.101343?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. David F. Larcker & Anastasia A. Zakolyukina, 2012. "Detecting Deceptive Discussions in Conference Calls," Journal of Accounting Research, John Wiley & Sons, Ltd., vol. 50(2), pages 495-540, May.
    2. Tang, Siyuan & Luo, Runmei, 2024. "Price deregulation and investors’ IPO speculation: Evidence from Chinese registration system reform," Research in International Business and Finance, Elsevier, vol. 71(C).
    3. Yang Bao & Bin Ke & Bin Li & Y. Julia Yu & Jie Zhang, 2020. "Detecting Accounting Fraud in Publicly Traded U.S. Firms Using a Machine Learning Approach," Journal of Accounting Research, John Wiley & Sons, Ltd., vol. 58(1), pages 199-235, March.
    4. Haemin Dennis Park & Pankaj C. Patel, 2015. "How Does Ambiguity Influence IPO Underpricing? The Role of the Signalling Environment," Journal of Management Studies, Wiley Blackwell, vol. 52(6), pages 796-818, September.
    5. Wu, Xihao & Shen, Yuezhe & Sun, Yani, 2024. "Can the registration system reform improve the disclosure quality?——Evidence from the ChiNext board," Journal of Contemporary Accounting and Economics, Elsevier, vol. 20(2).
    6. Tim Loughran & Bill McDonald, 2016. "Textual Analysis in Accounting and Finance: A Survey," Journal of Accounting Research, John Wiley & Sons, Ltd., vol. 54(4), pages 1187-1230, September.
    7. Shihao Gu & Bryan Kelly & Dacheng Xiu, 2020. "Empirical Asset Pricing via Machine Learning," Review of Finance, European Finance Association, vol. 33(5), pages 2223-2273.
    8. Leippold, Markus & Wang, Qian & Zhou, Wenyu, 2022. "Machine learning in the Chinese stock market," Journal of Financial Economics, Elsevier, vol. 145(2), pages 64-82.
    9. Shao-Bo Lin & Shaojie Tang & Yao Wang & Di Wang, 2022. "Toward Efficient Ensemble Learning with Structure Constraints: Convergent Algorithms and Applications," INFORMS Journal on Computing, INFORMS, vol. 34(6), pages 3096-3116, November.
    10. Hongke Zhao & Chuang Zhao & Xi Zhang & Nanlin Liu & Hengshu Zhu & Qi Liu & Hui Xiong, 2023. "An Ensemble Learning Approach with Gradient Resampling for Class-Imbalance Problems," INFORMS Journal on Computing, INFORMS, vol. 35(4), pages 747-763, July.
    11. Lei Wang & Ram Gopal & Ramesh Shankar & Joseph Pancras, 2022. "Forecasting venue popularity on location‐based services using interpretable machine learning," Production and Operations Management, Production and Operations Management Society, vol. 31(7), pages 2773-2788, July.
    12. Zhang, Zejun & Wang, Zhao & Cai, Lixin, 2025. "Predicting financial fraud in Chinese listed companies: An enterprise portrait and machine learning approach," Pacific-Basin Finance Journal, Elsevier, vol. 90(C).
    13. Kocaarslan, Baris & Soytas, Ugur, 2023. "The role of major markets in predicting the U.S. municipal green bond market performance: New evidence from machine learning models," Technological Forecasting and Social Change, Elsevier, vol. 196(C).
    14. Elizabeth Blankespoor, 2019. "The Impact of Information Processing Costs on Firm Disclosure Choice: Evidence from the XBRL Mandate," Journal of Accounting Research, John Wiley & Sons, Ltd., vol. 57(4), pages 919-967, September.
    15. Albagli, Elias & Ceballos, Luis & Claro, Sebastian & Romero, Damian, 2019. "Channels of US monetary policy spillovers to international bond markets," Journal of Financial Economics, Elsevier, vol. 134(2), pages 447-473.
    16. Julian Senoner & Torbjørn Netland & Stefan Feuerriegel, 2022. "Using Explainable Artificial Intelligence to Improve Process Quality: Evidence from Semiconductor Manufacturing," Management Science, INFORMS, vol. 68(8), pages 5704-5723, August.
    17. Huosong Xia & Juan Weng & Sabri Boubaker & Zuopeng Zhang & Sajjad M. Jasimuddin, 2024. "Cross-influence of information and risk effects on the IPO market: exploring risk disclosure with a machine learning approach," Annals of Operations Research, Springer, vol. 334(1), pages 761-797, March.
    18. Ole-Kristian Hope & Danqi Hu & Hai Lu, 2016. "The benefits of specific risk-factor disclosures," Review of Accounting Studies, Springer, vol. 21(4), pages 1005-1045, December.
    19. Kai Li & Feng Mai & Rui Shen & Xinyan Yan, 2021. "Measuring Corporate Culture Using Machine Learning," NBER Chapters, in: Big Data: Long-Term Implications for Financial Markets and Firms, pages 3265-3315, National Bureau of Economic Research, Inc.
    20. Lu Wei & Xiyuan Miao & Haozhe Jing & Guowen Li, 2025. "Discovering High-Risk Bank Risk Factors Based on Risk Matrix," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 24(03), pages 743-764, April.
    21. Gustaf Bellstam & Sanjai Bhagat & J. Anthony Cookson, 2021. "A Text-Based Analysis of Corporate Innovation," Management Science, INFORMS, vol. 67(7), pages 4004-4031, July.
    22. Jiang, Cuiqing & Zhou, Yiru & Chen, Bo, 2023. "Mining semantic features in patent text for financial distress prediction," Technological Forecasting and Social Change, Elsevier, vol. 190(C).
    23. Yi Yang & Kunpeng Zhang & Yangyang Fan, 2022. "Analyzing Firm Reports for Volatility Prediction: A Knowledge-Driven Text-Embedding Approach," INFORMS Journal on Computing, INFORMS, vol. 34(1), pages 522-540, January.
    24. Gondia, Ahmed & Moussa, Ahmed & Ezzeldin, Mohamed & El-Dakhakhni, Wael, 2023. "Machine learning-based construction site dynamic risk models," Technological Forecasting and Social Change, Elsevier, vol. 189(C).
    25. Kai Li & Feng Mai & Rui Shen & Xinyan Yan, 2021. "Measuring Corporate Culture Using Machine Learning [Machine learning methods that economists should know about]," The Review of Financial Studies, Society for Financial Studies, vol. 34(7), pages 3265-3315.
    26. Obaid, Khaled & Pukthuanthong, Kuntara, 2022. "A picture is worth a thousand words: Measuring investor sentiment by combining machine learning and photos from news," Journal of Financial Economics, Elsevier, vol. 144(1), pages 273-297.
    27. Dyer, Travis & Lang, Mark & Stice-Lawrence, Lorien, 2017. "The evolution of 10-K textual disclosure: Evidence from Latent Dirichlet Allocation," Journal of Accounting and Economics, Elsevier, vol. 64(2), pages 221-245.
    28. Chen, Yangfa & Jiang, Ji & Liu, Jie & Liu, Xiao & Wu, Weili, 2025. "Registration system reform, information environment, and market manipulation," Journal of Corporate Finance, Elsevier, vol. 93(C).
    29. Lang, Mark & Stice-Lawrence, Lorien, 2015. "Textual analysis and international financial reporting: Large sample evidence," Journal of Accounting and Economics, Elsevier, vol. 60(2), pages 110-135.
    30. Sun, Feifan & Yin, Chen & Zhou, Sili & Zhu, Zijing, 2022. "IPO underpricing and mutual fund allocation: New evidence from registration system," International Review of Financial Analysis, Elsevier, vol. 84(C).
    31. Nerissa C. Brown & Richard M. Crowley & W. Brooke Elliott, 2020. "What Are You Saying? Using topic to Detect Financial Misreporting," Journal of Accounting Research, John Wiley & Sons, Ltd., vol. 58(1), pages 237-291, March.
    32. Li, Feng, 2008. "Annual report readability, current earnings, and earnings persistence," Journal of Accounting and Economics, Elsevier, vol. 45(2-3), pages 221-247, August.
    33. Hanauer, Matthias X. & Kalsbach, Tobias, 2023. "Machine learning and the cross-section of emerging market stock returns," Emerging Markets Review, Elsevier, vol. 55(C).
    34. Sunyang Hu & Yifeng Wang, 2023. "Quality of Financial Information Disclosure and Efficiency of Resource Allocation Under Dual-Track System: Empirical Evidence of Registration System Reform in China," Emerging Markets Finance and Trade, Taylor & Francis Journals, vol. 59(11), pages 3438-3467, September.
    35. Li, Jingyu & Li, Jianping & Zhu, Xiaoqian, 2020. "Risk dependence between energy corporations: A text-based measurement approach," International Review of Economics & Finance, Elsevier, vol. 68(C), pages 33-46.
    36. Xu, Zhiwei & Hua, Xia & Zhang, Teng, 2025. "Does official media sentiment matter for the stock market? Evidence from China," Emerging Markets Review, Elsevier, vol. 64(C).
    37. Jaeho Choi & Anoop Menon & Haris Tabakovic, 2021. "Using machine learning to revisit the diversification–performance relationship," Strategic Management Journal, Wiley Blackwell, vol. 42(9), pages 1632-1661, September.
    38. Tsai, Ming-Feng & Wang, Chuan-Ju, 2017. "On the risk prediction and analysis of soft information in finance reports," European Journal of Operational Research, Elsevier, vol. 257(1), pages 243-250.
    39. Shihao Gu & Bryan Kelly & Dacheng Xiu, 2020. "Empirical Asset Pricing via Machine Learning," The Review of Financial Studies, Society for Financial Studies, vol. 33(5), pages 2223-2273.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hoang, Daniel & Wiegratz, Kevin, 2022. "Machine learning methods in finance: Recent applications and prospects," Working Paper Series in Economics 158, Karlsruhe Institute of Technology (KIT), Department of Economics and Management.
    2. Alex Kim & Maximilian Muhn & Valeri Nikolaev, 2023. "Bloated Disclosures: Can ChatGPT Help Investors Process Information?," Papers 2306.10224, arXiv.org, revised Oct 2025.
    3. Fengler, Matthias R. & Phan, Tri Minh, 2025. "Unveiling themes in 10-K disclosures: A new topic modeling perspective," International Review of Financial Analysis, Elsevier, vol. 103(C).
    4. Fengler, Matthias & Phan, Minh Tri, 2023. "A Topic Model for 10-K Management Disclosures," Economics Working Paper Series 2307, University of St. Gallen, School of Economics and Political Science.
    5. Yang ZHANG & Ziang QIU & Donghyun PARK & Shu TIAN, 2026. "Role of Artificial Intelligence in Finance: Selective Literature Review and Implications for Asia's Financial Stability," Working Papers wp61, South East Asian Central Banks (SEACEN) Research and Training Centre, revised Feb 2026.
    6. Blankespoor, Elizabeth & deHaan, Ed & Marinovic, Iván, 2020. "Disclosure processing costs, investors’ information choice, and equity market outcomes: A review," Journal of Accounting and Economics, Elsevier, vol. 70(2).
    7. Olga Bogachek & Antonio De Vito & Paul Demeré & Francesco Grossetti, 2026. "Using narrative disclosures to predict tax outcomes," Review of Accounting Studies, Springer, vol. 31(1), pages 374-412, March.
    8. Evangelos Liaras & Michail Nerantzidis & Antonios Alexandridis, 2024. "Machine learning in accounting and finance research: a literature review," Review of Quantitative Finance and Accounting, Springer, vol. 63(4), pages 1431-1471, November.
    9. Tri Minh Phan, 2024. "Sentiment-semantic word vectors: A new method to estimate management sentiment," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 160(1), pages 1-22, December.
    10. Richard Frankel & Jared Jennings & Joshua Lee, 2022. "Disclosure Sentiment: Machine Learning vs. Dictionary Methods," Management Science, INFORMS, vol. 68(7), pages 5514-5532, July.
    11. Xiaoqian Zhu & Huidong Wu & Yanpeng Chang & Jianping Li, 2025. "Accounting fraud detection through textual risk disclosures in annual reports: From the perspective of SEC guidelines," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 65(2), pages 1837-1862, June.
    12. Wang, Sumingyue & Wang, Xinlu & Xu, Liang, 2023. "Debt maturity structure and the quality of risk disclosures," Journal of Corporate Finance, Elsevier, vol. 83(C).
    13. Albrecht, Disen Huang & Anantharaman, Divya & Zhao, Keyi, 2025. "Is a picture worth a thousand words? Image usage in ESG reports," Accounting, Organizations and Society, Elsevier, vol. 115(C).
    14. Nerissa C. Brown & Richard M. Crowley & W. Brooke Elliott, 2020. "What Are You Saying? Using topic to Detect Financial Misreporting," Journal of Accounting Research, John Wiley & Sons, Ltd., vol. 58(1), pages 237-291, March.
    15. Dan Palmon & Yifei Chen & Biao Chen, 2024. "Corporate Social Responsibility and Information Asymmetry: Do Earnings Conference Calls Play a Role?," Journal of Business Ethics, Springer, vol. 194(1), pages 77-101, September.
    16. Brian J. Bushee & Ian D. Gow & Daniel J. Taylor, 2018. "Linguistic Complexity in Firm Disclosures: Obfuscation or Information?," Journal of Accounting Research, John Wiley & Sons, Ltd., vol. 56(1), pages 85-121, March.
    17. Wenyao Hu & Thomas Shohfi & Runzu Wang, 2021. "What’s really in a deal? Evidence from textual analysis of M&A conference calls," Review of Financial Economics, John Wiley & Sons, vol. 39(4), pages 500-521, October.
    18. Li, Jing & Li, Nan & Xia, Tongshui & Guo, Jinjin, 2023. "Textual analysis and detection of financial fraud: Evidence from Chinese manufacturing firms," Economic Modelling, Elsevier, vol. 126(C).
    19. Wei, Lu & Wei, Mingye & Wu, Yifei & Jing, Zhongbo, 2025. "Does party organization construction improve Chinese banks' stability? Evidence from a new textual index," China Economic Review, Elsevier, vol. 94(PB).
    20. Han, Chen & Wu, Chengliang & Wei, Lu, 2023. "The impact of the disclosure characteristics of the application material on the successful listing of companies on China’s Science and Technology Innovation Board," Journal of Behavioral and Experimental Finance, Elsevier, vol. 37(C).

