Printed from https://ideas.repec.org/p/arx/papers/2601.07131.html

The Limits of Complexity: Why Feature Engineering Beats Deep Learning in Investor Flow Prediction

Author

  • Sungwoo Kang

Abstract

The application of machine learning to financial prediction has accelerated dramatically, yet the conditions under which complex models outperform simple alternatives remain poorly understood. This paper investigates whether advanced signal processing and deep learning techniques can extract predictive value from investor order flows beyond what simple feature engineering achieves. Using a comprehensive dataset of 2.79 million observations spanning 2,439 Korean equities from 2020 to 2024, we apply three methodologies: Independent Component Analysis (ICA) to recover latent market drivers, Wavelet Coherence analysis to characterize multi-scale correlation structure, and Long Short-Term Memory (LSTM) networks with attention mechanisms for non-linear prediction. Our results reveal a striking finding: a parsimonious linear model using market capitalization-normalized flows ("Matched Filter" preprocessing) achieves a Sharpe ratio of 1.30 and cumulative return of 272.6%, while the full ICA-Wavelet-LSTM pipeline generates a Sharpe ratio of only 0.07 with a cumulative return of -5.1%. The raw LSTM model collapsed to predicting the unconditional mean, achieving a hit rate of 47.5%, worse than random. We conclude that in low signal-to-noise financial environments, domain-specific feature engineering yields substantially higher marginal returns than algorithmic complexity. These findings establish important boundary conditions for the application of deep learning to financial prediction.
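
The abstract names its preprocessing step and evaluation metrics without giving formulas. The following is a minimal sketch of how such quantities are commonly computed, not the author's implementation. It assumes that market capitalization normalization means dividing a net order flow by market capitalization, that the Sharpe ratio is annualized over 252 trading days with a zero risk-free rate, and that the hit rate is the share of observations where predicted and realized signs agree. All function names are hypothetical.

```python
import math
import statistics


def normalize_flow(net_flow, market_cap):
    """Scale a net order flow by market capitalization (assumed definition)."""
    if market_cap <= 0:
        raise ValueError("market_cap must be positive")
    return net_flow / market_cap


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation
    if stdev == 0:
        raise ValueError("returns have zero variance")
    return mean / stdev * math.sqrt(periods_per_year)


def cumulative_return(returns):
    """Compounded return over the whole sample of per-period returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def hit_rate(predictions, realized):
    """Share of observations where predicted and realized signs agree."""
    hits = sum(1 for p, r in zip(predictions, realized) if p * r > 0)
    return hits / len(realized)
```

Under these assumptions, a model that always outputs the same constant has no directional information, so its hit rate depends only on how often realized returns happen to share the sign of that constant. This is one way a collapsed predictor can land below 50%.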

Suggested Citation

  • Sungwoo Kang, 2026. "The Limits of Complexity: Why Feature Engineering Beats Deep Learning in Investor Flow Prediction," Papers 2601.07131, arXiv.org.
  • Handle: RePEc:arx:papers:2601.07131

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2601.07131
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Luyang Chen & Markus Pelger & Jason Zhu, 2024. "Deep Learning in Asset Pricing," Management Science, INFORMS, vol. 70(2), pages 714-750, February.
    2. Fama, Eugene F, 1970. "Efficient Capital Markets: A Review of Theory and Empirical Work," Journal of Finance, American Finance Association, vol. 25(2), pages 383-417, May.
    3. Hasbrouck, Joel, 1991. "Measuring the Information Content of Stock Trades," Journal of Finance, American Finance Association, vol. 46(1), pages 179-207, March.
    4. Richard W. Sias, 2004. "Institutional Herding," The Review of Financial Studies, Society for Financial Studies, vol. 17(1), pages 165-206.
    5. John M. Griffin & Jeffrey H. Harris & Selim Topaloglu, 2003. "The Dynamics of Institutional and Individual Trading," Journal of Finance, American Finance Association, vol. 58(6), pages 2285-2320, December.
    6. Brad M. Barber & Terrance Odean, 2000. "Trading Is Hazardous to Your Wealth: The Common Stock Investment Performance of Individual Investors," Journal of Finance, American Finance Association, vol. 55(2), pages 773-806, April.
    7. Glosten, Lawrence R. & Milgrom, Paul R., 1985. "Bid, ask and transaction prices in a specialist market with heterogeneously informed traders," Journal of Financial Economics, Elsevier, vol. 14(1), pages 71-100, March.
    8. R. David Mclean & Jeffrey Pontiff, 2016. "Does Academic Research Destroy Stock Return Predictability?," Journal of Finance, American Finance Association, vol. 71(1), pages 5-32, February.
    9. Kyle, Albert S, 1985. "Continuous Auctions and Insider Trading," Econometrica, Econometric Society, vol. 53(6), pages 1315-1335, November.
    10. Guanhao Feng & Stefano Giglio & Dacheng Xiu, 2020. "Taming the Factor Zoo: A Test of New Factors," Journal of Finance, American Finance Association, vol. 75(3), pages 1327-1370, June.
    11. Gah-Yi Ban & Noureddine El Karoui & Andrew E. B. Lim, 2018. "Machine Learning and Portfolio Optimization," Management Science, INFORMS, vol. 64(3), pages 1136-1154, March.
    12. Amihud, Yakov, 2002. "Illiquidity and stock returns: cross-section and time-series effects," Journal of Financial Markets, Elsevier, vol. 5(1), pages 31-56, January.
    13. Brad M. Barber & Terrance Odean & Ning Zhu, 2009. "Do Retail Trades Move Markets?," The Review of Financial Studies, Society for Financial Studies, vol. 22(1), pages 151-186, January.
    14. Grinblatt, Mark & Keloharju, Matti, 2000. "The investment behavior and performance of various investor types: a study of Finland's unique data set," Journal of Financial Economics, Elsevier, vol. 55(1), pages 43-67, January.
    15. Jeon, Jin Q & Moffett, Clay M., 2010. "Herding by foreign investors and emerging market equity returns: Evidence from Korea," International Review of Economics & Finance, Elsevier, vol. 19(4), pages 698-710, October.
    16. Kim, Woochan & Wei, Shang-Jin, 2002. "Foreign portfolio investors before and during a crisis," Journal of International Economics, Elsevier, vol. 56(1), pages 77-96, January.
    17. Shihao Gu & Bryan Kelly & Dacheng Xiu, 2020. "Empirical Asset Pricing via Machine Learning," The Review of Financial Studies, Society for Financial Studies, vol. 33(5), pages 2223-2273.
    18. Justin Sirignano & Rama Cont, 2019. "Universal features of price formation in financial markets: perspectives from deep learning," Quantitative Finance, Taylor & Francis Journals, vol. 19(9), pages 1449-1459, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Campbell, John Y. & Ramadorai, Tarun & Schwartz, Allie, 2009. "Caught on tape: Institutional trading, stock returns, and earnings announcements," Journal of Financial Economics, Elsevier, vol. 92(1), pages 66-91, April.
    2. Choi, Darwin, 2019. "Disposition sales and stock market liquidity," Journal of Financial Markets, Elsevier, vol. 45(C), pages 19-36.
    3. Danny Lo, 2015. "Essays in Market Microstructure and Investor Trading," PhD Thesis, Finance Discipline Group, UTS Business School, University of Technology, Sydney, number 4-2015, January-A.
    4. Blankespoor, Elizabeth & deHaan, Ed & Marinovic, Iván, 2020. "Disclosure processing costs, investors’ information choice, and equity market outcomes: A review," Journal of Accounting and Economics, Elsevier, vol. 70(2).
    5. Pedro M. Mirete-Ferrer & Alberto Garcia-Garcia & Juan Samuel Baixauli-Soler & Maria A. Prats, 2022. "A Review on Machine Learning for Asset Management," Risks, MDPI, vol. 10(4), pages 1-46, April.
    6. Lu, Zhongjin & Malliaris, Steven & Qin, Zhongling, 2023. "Heterogeneous liquidity providers and night-minus-day return predictability," Journal of Financial Economics, Elsevier, vol. 148(3), pages 175-200.
    7. Danny Lo, 2015. "Essays in Market Microstructure and Investor Trading," PhD Thesis, Finance Discipline Group, UTS Business School, University of Technology, Sydney, number 22, July-Dece.
    8. Vincent Bogousslavsky & Vyacheslav Fos & Dmitriy Muravyev, 2024. "Informed Trading Intensity," Journal of Finance, American Finance Association, vol. 79(2), pages 903-948, April.
    9. Thanh Huong Nguyen, 2019. "Information and Noise in Stock Markets: Evidence on the Determinants and Effects Using New Empirical Measures," PhD Thesis, Finance Discipline Group, UTS Business School, University of Technology, Sydney, number 7-2019, January-A.
    10. Stoffman, Noah, 2014. "Who trades with whom? Individuals, institutions, and returns," Journal of Financial Markets, Elsevier, vol. 21(C), pages 50-75.
    11. Agudelo, Diego A. & Byder, James & Yepes-Henao, Paula, 2019. "Performance and informed trading. Comparing foreigners, institutions and individuals in an emerging stock market," Journal of International Money and Finance, Elsevier, vol. 90(C), pages 187-203.
    12. Sungwoo Kang, 2026. "When the Rules Change: Adaptive Signal Extraction via Kalman Filtering and Markov-Switching Regimes," Papers 2601.05716, arXiv.org, revised Feb 2026.
    13. Zhang, Chris H. & Frijns, Bart, 2019. "Noise trading and informational efficiency," EconStor Preprints 198037, ZBW - Leibniz Information Centre for Economics.
    14. Vasios, Michalis & Payne, Richard & Nolte, Ingmar, 2015. "Profiting from Mimicking Strategies in Non-Anonymous Markets," MPRA Paper 61710, University Library of Munich, Germany.
    15. Fotini Economou & Konstantinos Gavriilidis & Bartosz Gebka & Vasileios Kallinterakis, 2022. "Feedback trading: a review of theory and empirical evidence," Review of Behavioral Finance, Emerald Group Publishing Limited, vol. 15(4), pages 429-476, February.
    16. Koesrindartoto, Deddy P. & Aaron, Aurelius & Yusgiantoro, Inka & Dharma, Wirata A. & Arroisi, Abdurrohman, 2020. "Who moves the stock market in an emerging country – Institutional or retail investors?," Research in International Business and Finance, Elsevier, vol. 51(C).
    17. Eghbal Rahimikia & Stefan Zohren & Ser-Huang Poon, 2021. "Realised Volatility Forecasting: Machine Learning via Financial Word Embedding," Papers 2108.00480, arXiv.org, revised Apr 2026.
    18. Paul Handro & Bogdan Dima, 2024. "Analyzing Financial Markets Efficiency: Insights from a Bibliometric and Content Review," Journal of Financial Studies, Institute of Financial Studies, vol. 16(9), pages 119-175, May.
    19. Daniel Dorn & Gur Huberman & Paul Sengmueller, 2008. "Correlated Trading and Returns," Journal of Finance, American Finance Association, vol. 63(2), pages 885-920, April.
    20. Fung, Scott & Obaid, Khaled & Tsai, Shih-Chuan, 2024. "Information acquisition and processing skills of institutions and retail investors around information shocks," Journal of Empirical Finance, Elsevier, vol. 77(C).
