
Confronting Machine Learning With Financial Research

Author

Listed:
  • Kristof Lommers
  • Ouns El Harzli
  • Jack Kim

Abstract

This study examines the challenges and applications of machine learning for financial research. Machine learning algorithms have been developed for data environments that differ substantially from the one encountered in finance. Not only do difficulties arise from some of the idiosyncrasies of financial markets, but there is also a fundamental tension between the underlying paradigm of machine learning and the research philosophy in financial economics. Given the peculiar features of financial markets and the empirical framework within social science, various adjustments have to be made to the conventional machine learning methodology. We discuss some of the main challenges of machine learning in finance and examine how these could be accounted for. Despite these challenges, we argue that machine learning could be unified with financial research to become a robust complement to the econometrician's toolbox. Moreover, we discuss the various applications of machine learning in the research process, such as estimation, empirical discovery, testing, causal inference and prediction.

Suggested Citation

  • Kristof Lommers & Ouns El Harzli & Jack Kim, 2021. "Confronting Machine Learning With Financial Research," Papers 2103.00366, arXiv.org, revised Mar 2021.
  • Handle: RePEc:arx:papers:2103.00366

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2103.00366
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Crone, Sven F. & Hibon, Michèle & Nikolopoulos, Konstantinos, 2011. "Advances in forecasting with neural networks? Empirical evidence from the NN3 competition on time series prediction," International Journal of Forecasting, Elsevier, vol. 27(3), pages 635-660.
    2. Mukund Sundararajan & Amir Najmi, 2019. "The many Shapley values for model explanation," Papers 1908.08474, arXiv.org, revised Feb 2020.
    3. Luyang Chen & Markus Pelger & Jason Zhu, 2019. "Deep Learning in Asset Pricing," Papers 1904.00745, arXiv.org, revised Aug 2021.
    4. Shihao Gu & Bryan Kelly & Dacheng Xiu, 2020. "Empirical Asset Pricing via Machine Learning," Review of Finance, European Finance Association, vol. 33(5), pages 2223-2273.
    5. R. David Mclean & Jeffrey Pontiff, 2016. "Does Academic Research Destroy Stock Return Predictability?," Journal of Finance, American Finance Association, vol. 71(1), pages 5-32, February.
    6. Gianluca Bontempi & Souhaib Ben Taieb & Yann-Aël Le Borgne, 2013. "Machine learning strategies for time series forecasting," ULB Institutional Repository 2013/167761, ULB -- Universite Libre de Bruxelles.
    7. Susan Athey & Guido W. Imbens, 2017. "The State of Applied Econometrics: Causality and Policy Evaluation," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 3-32, Spring.
    8. Massimo Guidolin, 2011. "Markov Switching Models in Empirical Finance," Advances in Econometrics, in: Missing Data Methods: Time-Series Methods and Applications, pages 1-86, Emerald Group Publishing Limited.
    9. Justin A. Sirignano, 2019. "Deep learning for limit order books," Quantitative Finance, Taylor & Francis Journals, vol. 19(4), pages 549-570, April.
    10. Shihao Gu & Bryan Kelly & Dacheng Xiu, 2020. "Empirical Asset Pricing via Machine Learning," The Review of Financial Studies, Society for Financial Studies, vol. 33(5), pages 2223-2273.
    11. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    12. Spyros Makridakis & Evangelos Spiliotis & Vassilios Assimakopoulos, 2018. "Statistical and Machine Learning forecasting methods: Concerns and ways forward," PLOS ONE, Public Library of Science, vol. 13(3), pages 1-26, March.
    13. Patrick C. Higgins, 2014. "GDPNow: A Model for GDP 'Nowcasting'," FRB Atlanta Working Paper 2014-7, Federal Reserve Bank of Atlanta.
    14. R. Cont, 2001. "Empirical properties of asset returns: stylized facts and statistical issues," Quantitative Finance, Taylor & Francis Journals, vol. 1(2), pages 223-236.
    15. Sean J. Taylor & Benjamin Letham, 2018. "Forecasting at Scale," The American Statistician, Taylor & Francis Journals, vol. 72(1), pages 37-45, January.
    16. Jianqing Fan & Jinchi Lv & Lei Qi, 2011. "Sparse High-Dimensional Models in Economics," Annual Review of Economics, Annual Reviews, vol. 3(1), pages 291-317, September.
    17. Lu, Xun & White, Halbert, 2014. "Robustness checks and robustness tests in applied economics," Journal of Econometrics, Elsevier, vol. 178(P1), pages 194-206.
    18. Martin Leo & Suneel Sharma & K. Maddulety, 2019. "Machine Learning in Banking Risk Management: A Literature Review," Risks, MDPI, vol. 7(1), pages 1-22, March.
    19. Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods That Economists Should Know About," Annual Review of Economics, Annual Reviews, vol. 11(1), pages 685-725, August.
    20. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    21. Walter Pohl & Karl Schmedders & Ole Wilms, 2018. "Higher Order Effects in Asset Pricing Models with Long‐Run Risks," Journal of Finance, American Finance Association, vol. 73(3), pages 1061-1111, June.
    22. A. Belloni & D. Chen & V. Chernozhukov & C. Hansen, 2012. "Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain," Econometrica, Econometric Society, vol. 80(6), pages 2369-2429, November.
    23. Liao, Jui-Jung & Shih, Ching-Hui & Chen, Tai-Feng & Hsu, Ming-Fu, 2014. "An ensemble-based model for two-class imbalanced financial problem," Economic Modelling, Elsevier, vol. 37(C), pages 175-183.
    24. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    25. Sheikh Rabiul Islam & Sheikh Khaled Ghafoor & William Eberle, 2018. "Mining Illegal Insider Trading of Stocks: A Proactive Approach," Papers 1807.00939, arXiv.org, revised Nov 2018.
    26. Nassim Nicholas Taleb, 2020. "Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications," Papers 2001.10488, arXiv.org, revised Nov 2022.
    27. Andrew J. Tiffin, 2019. "Machine Learning and Causality: The Impact of Financial Crises on Growth," IMF Working Papers 2019/228, International Monetary Fund.
    28. Campbell R. Harvey, 2017. "Presidential Address: The Scientific Outlook in Financial Economics," Journal of Finance, American Finance Association, vol. 72(4), pages 1399-1440, August.
    29. Mark J. Holmes & Brian Silverstone, 2010. "Business confidence and cyclical turning points: a Markov-switching approach," Applied Economics Letters, Taylor & Francis Journals, vol. 17(3), pages 229-233, February.
    30. Andrius Vabalas & Emma Gowen & Ellen Poliakoff & Alexander J Casson, 2019. "Machine learning algorithm validation with a limited sample size," PLOS ONE, Public Library of Science, vol. 14(11), pages 1-20, November.
    31. Changqing Cheng & Akkarapol Sa-Ngasoongsong & Omer Beyca & Trung Le & Hui Yang & Zhenyu (James) Kong & Satish T.S. Bukkapatnam, 2015. "Time series forecasting for nonlinear and non-stationary processes: a review and comparative study," IISE Transactions, Taylor & Francis Journals, vol. 47(10), pages 1053-1071, October.
    32. Michal Balcerak & Thomas Schmelzer, 2020. "Constructing trading strategy ensembles by classifying market states," Papers 2012.03078, arXiv.org.
    33. Adriano Koshiyama & Nick Firoozye, 2019. "Avoiding Backtesting Overfitting by Covariance-Penalties: an empirical investigation of the ordinary and total least squares cases," Papers 1905.05023, arXiv.org.
    34. Dernoncourt, David & Hanczar, Blaise & Zucker, Jean-Daniel, 2014. "Analysis of feature selection stability on high dimension and small sample data," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 681-693.

    Citations

    Cited by:

    1. José-Manuel Peña & Fernando Suárez & Omar Larré & Domingo Ramírez & Arturo Cifuentes, 2023. "A Modified CTGAN-Plus-Features Based Method for Optimal Asset Allocation," Papers 2302.02269, arXiv.org, revised Feb 2023.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Petropoulos, Fotios & Apiletti, Daniele & Assimakopoulos, Vassilios & Babai, Mohamed Zied & Barrow, Devon K. & Ben Taieb, Souhaib & Bergmeir, Christoph & Bessa, Ricardo J. & Bijak, Jakub & Boylan, Joh, 2022. "Forecasting: theory and practice," International Journal of Forecasting, Elsevier, vol. 38(3), pages 705-871.
      • Fotios Petropoulos & Daniele Apiletti & Vassilios Assimakopoulos & Mohamed Zied Babai & Devon K. Barrow & Souhaib Ben Taieb & Christoph Bergmeir & Ricardo J. Bessa & Jakub Bijak & John E. Boylan & Jet, 2020. "Forecasting: theory and practice," Papers 2012.03854, arXiv.org, revised Jan 2022.
    2. Yulin Liu & Luyao Zhang, 2022. "Cryptocurrency Valuation: An Explainable AI Approach," Papers 2201.12893, arXiv.org, revised Jul 2023.
    3. Hoang, Daniel & Wiegratz, Kevin, 2022. "Machine learning methods in finance: Recent applications and prospects," Working Paper Series in Economics 158, Karlsruhe Institute of Technology (KIT), Department of Economics and Management.
    4. Sophie-Charlotte Klose & Johannes Lederer, 2020. "A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics," Papers 2006.12296, arXiv.org, revised Jun 2020.
    5. Tsang, Andrew, 2021. "Uncovering Heterogeneous Regional Impacts of Chinese Monetary Policy," MPRA Paper 110703, University Library of Munich, Germany.
    6. Rubesam, Alexandre, 2022. "Machine learning portfolios with equal risk contributions: Evidence from the Brazilian market," Emerging Markets Review, Elsevier, vol. 51(PB).
    7. Colak, Gonul & Fu, Mengchuan & Hasan, Iftekhar, 2022. "On modeling IPO failure risk," Economic Modelling, Elsevier, vol. 109(C).
    8. Hanauer, Matthias X. & Kononova, Marina & Rapp, Marc Steffen, 2022. "Boosting agnostic fundamental analysis: Using machine learning to identify mispricing in European stock markets," Finance Research Letters, Elsevier, vol. 48(C).
    9. Matthew A. Cole & Robert J R Elliott & Bowen Liu, 2020. "The Impact of the Wuhan Covid-19 Lockdown on Air Pollution and Health: A Machine Learning and Augmented Synthetic Control Approach," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 76(4), pages 553-580, August.
    10. Tatiana de Macedo Nogueira Lima, 2022. "Documento de Trabalho 03/2022 - Aprendizado de máquina e antitruste," Documentos de Trabalho 2022030, Conselho Administrativo de Defesa Econômica (Cade), Departamento de Estudos Econômicos.
    11. de Blasio, Guido & D'Ignazio, Alessio & Letta, Marco, 2022. "Gotham city. Predicting ‘corrupted’ municipalities with machine learning," Technological Forecasting and Social Change, Elsevier, vol. 184(C).
    12. Andrew J. Patton & Yasin Simsek, 2023. "Generalized Autoregressive Score Trees and Forests," Papers 2305.18991, arXiv.org.
    13. Georges, Christophre & Pereira, Javier, 2021. "Market stability with machine learning agents," Journal of Economic Dynamics and Control, Elsevier, vol. 122(C).
    14. Mehmet Güney Celbiş, 2021. "A machine learning approach to rural entrepreneurship," Papers in Regional Science, Wiley Blackwell, vol. 100(4), pages 1079-1104, August.
    15. Jozef Barunik & Lubos Hanus, 2022. "Learning Probability Distributions in Macroeconomics and Finance," Papers 2204.06848, arXiv.org.
    16. Mehmet Güney Celbiş & Pui‐hang Wong & Karima Kourtit & Peter Nijkamp, 2023. "Impacts of the COVID‐19 outbreak on older‐age cohorts in European Labor Markets: A machine learning exploration of vulnerable groups," Regional Science Policy & Practice, Wiley Blackwell, vol. 15(3), pages 559-584, April.
    17. Daniel Wochner, 2020. "Dynamic Factor Trees and Forests – A Theory-led Machine Learning Framework for Non-Linear and State-Dependent Short-Term U.S. GDP Growth Predictions," KOF Working papers 20-472, KOF Swiss Economic Institute, ETH Zurich.
    18. Guido de Blasio & Alessio D'Ignazio & Marco Letta, 2020. "Predicting Corruption Crimes with Machine Learning. A Study for the Italian Municipalities," Working Papers 16/20, Sapienza University of Rome, DISS.
    19. Doumpos, Michalis & Zopounidis, Constantin & Gounopoulos, Dimitrios & Platanakis, Emmanouil & Zhang, Wenke, 2023. "Operational research and artificial intelligence methods in banking," European Journal of Operational Research, Elsevier, vol. 306(1), pages 1-16.
    20. Christian Fieberg & Daniel Metko & Thorsten Poddig & Thomas Loy, 2023. "Machine learning techniques for cross-sectional equity returns’ prediction," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 45(1), pages 289-323, March.
