
Pooling and winsorizing machine learning forecasts to predict stock returns with high-dimensional data

Author

Listed:
  • Mekelburg, Erik
  • Strauss, Jack

Abstract

We evaluate US market return predictability using a novel data set of several hundred aggregated firm-level characteristics. We apply LASSO, Elastic Net, Random Forest, Neural Net, Extreme Gradient Boosting, and Light Gradient Boosting Machine methods and find these models experience large prediction errors that lead to forecast failures. However, winsorizing and pooling machine learning model forecasts provides consistent out-of-sample predictability. To assess robustness, we apply machine learning methods to high-dimensional data for Canada, China, Germany and the UK as well as the Goyal–Welch data. All machine learning models we consider, except for the ensemble pooled methods, fail to significantly predict returns across our samples, highlighting the importance of pooling, evaluating additional economies, and the fragility of individual machine learning methods. Our results shed light on the sparsity versus density debate as the degree of sparsity and variable importance evolves over time.
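The abstract does not spell out the winsorizing and pooling procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that winsorizing means clipping each model's forecast to fixed bounds and that pooling means an equal-weighted average across models. The function names, the bounds of plus or minus 5%, and the example numbers are all hypothetical.

```python
import numpy as np


def winsorize_forecasts(forecasts, lower, upper):
    """Clip every forecast to the interval [lower, upper].

    forecasts: array of shape (n_periods, n_models), one column per model.
    lower, upper: hypothetical bounds; the paper's actual choice is not
    stated in the abstract.
    """
    return np.clip(np.asarray(forecasts, dtype=float), lower, upper)


def pool_forecasts(forecasts):
    """Equal-weighted average across models for each period (an assumption)."""
    return np.asarray(forecasts, dtype=float).mean(axis=1)


# Made-up forecasts from three models over two periods. The third model in
# period one and the first model in period two produce extreme predictions.
raw = np.array([
    [0.01, 0.02, 0.50],
    [-0.40, 0.00, 0.01],
])

clipped = winsorize_forecasts(raw, lower=-0.05, upper=0.05)
pooled_raw = pool_forecasts(raw)
pooled_winsorized = pool_forecasts(clipped)
```

In this toy example a single extreme forecast dominates the unadjusted average, while the winsorized pool stays within the chosen bounds, which is the intuition behind combining the two steps.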

Suggested Citation

  • Mekelburg, Erik & Strauss, Jack, 2024. "Pooling and winsorizing machine learning forecasts to predict stock returns with high-dimensional data," Journal of Empirical Finance, Elsevier, vol. 79(C).
  • Handle: RePEc:eee:empfin:v:79:y:2024:i:c:s0927539824000732
    DOI: 10.1016/j.jempfin.2024.101538

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0927539824000732
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.jempfin.2024.101538?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Ivo Welch & Amit Goyal, 2008. "A Comprehensive Look at The Empirical Performance of Equity Premium Prediction," The Review of Financial Studies, Society for Financial Studies, vol. 21(4), pages 1455-1508, July.
    2. Domenico Giannone & Michele Lenza & Giorgio E. Primiceri, 2021. "Economic Predictions With Big Data: The Illusion of Sparsity," Econometrica, Econometric Society, vol. 89(5), pages 2409-2437, September.
    3. Nima Nonejad, 2021. "An Overview Of Dynamic Model Averaging Techniques In Time‐Series Econometrics," Journal of Economic Surveys, Wiley Blackwell, vol. 35(2), pages 566-614, April.
    4. Guanhao Feng & Stefano Giglio & Dacheng Xiu, 2020. "Taming the Factor Zoo: A Test of New Factors," Journal of Finance, American Finance Association, vol. 75(3), pages 1327-1370, June.
    5. Xi Dong & Yan Li & David E. Rapach & Guofu Zhou, 2022. "Anomalies and the Expected Market Return," Journal of Finance, American Finance Association, vol. 77(1), pages 639-681, February.
    6. Barbara Rossi, 2021. "Forecasting in the Presence of Instabilities: How We Know Whether Models Predict Well and How to Improve Them," Journal of Economic Literature, American Economic Association, vol. 59(4), pages 1135-1190, December.
    7. Breitung, Jörg & Eickmeier, Sandra, 2011. "Testing for structural breaks in dynamic factor models," Journal of Econometrics, Elsevier, vol. 163(1), pages 71-84, July.
    8. Eugene F. Fama & Kenneth R. French, 2008. "Dissecting Anomalies," Journal of Finance, American Finance Association, vol. 63(4), pages 1653-1678, August.
    9. Peter R. Hansen & Asger Lunde & James M. Nason, 2011. "The Model Confidence Set," Econometrica, Econometric Society, vol. 79(2), pages 453-497, March.
    10. Chen, Liang & Dolado, Juan J. & Gonzalo, Jesús, 2014. "Detecting big structural breaks in large factor models," Journal of Econometrics, Elsevier, vol. 180(1), pages 30-48.
    11. Claeskens, Gerda & Magnus, Jan R. & Vasnev, Andrey L. & Wang, Wendun, 2016. "The forecast combination puzzle: A simple theoretical explanation," International Journal of Forecasting, Elsevier, vol. 32(3), pages 754-762.
    12. Montero-Manso, Pablo & Athanasopoulos, George & Hyndman, Rob J. & Talagala, Thiyanga S., 2020. "FFORMA: Feature-based forecast model averaging," International Journal of Forecasting, Elsevier, vol. 36(1), pages 86-92.
    13. Mengxi He & Xianfeng Hao & Yaojie Zhang & Fanyi Meng, 2021. "Forecasting stock return volatility using a robust regression model," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 40(8), pages 1463-1478, December.
    14. Clark, Todd E. & West, Kenneth D., 2007. "Approximately normal tests for equal predictive accuracy in nested models," Journal of Econometrics, Elsevier, vol. 138(1), pages 291-311, May.
    15. Hyndman, Rob J., 2020. "A brief history of forecasting competitions," International Journal of Forecasting, Elsevier, vol. 36(1), pages 7-14.
    16. Quefeng Li & Guang Cheng & Jianqing Fan & Yuyan Wang, 2018. "Embracing the Blessing of Dimensionality in Factor Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(521), pages 380-389, January.
    17. Leland E. Farmer & Lawrence Schmidt & Allan Timmermann, 2023. "Pockets of Predictability," Journal of Finance, American Finance Association, vol. 78(3), pages 1279-1341, June.
    18. Graham Elliott & Allan Timmermann, 2016. "Forecasting in Economics and Finance," Annual Review of Economics, Annual Reviews, vol. 8(1), pages 81-110, October.
    19. Gary Koop & Dimitris Korobilis, 2012. "Forecasting Inflation Using Dynamic Model Averaging," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 53(3), pages 867-886, August.
    20. Kozak, Serhiy & Nagel, Stefan & Santosh, Shrihari, 2020. "Shrinking the cross-section," Journal of Financial Economics, Elsevier, vol. 135(2), pages 271-292.
    21. Kewei Hou & Chen Xue & Lu Zhang, 2015. "Digesting Anomalies: An Investment Approach," The Review of Financial Studies, Society for Financial Studies, vol. 28(3), pages 650-705.
    22. Shihao Gu & Bryan Kelly & Dacheng Xiu, 2020. "Empirical Asset Pricing via Machine Learning," The Review of Financial Studies, Society for Financial Studies, vol. 33(5), pages 2223-2273.
    23. Fama, Eugene F. & French, Kenneth R., 2015. "A five-factor asset pricing model," Journal of Financial Economics, Elsevier, vol. 116(1), pages 1-22.
    24. Hai Lin & Chunchi Wu & Guofu Zhou, 2018. "Forecasting Corporate Bond Returns with a Large Set of Predictors: An Iterated Combination Approach," Management Science, INFORMS, vol. 64(9), pages 4218-4238, September.
    25. Friedman, Jerome H., 2002. "Stochastic gradient boosting," Computational Statistics & Data Analysis, Elsevier, vol. 38(4), pages 367-378, February.
    26. David E. Rapach & Jack K. Strauss & Guofu Zhou, 2013. "International Stock Return Predictability: What Is the Role of the United States?," Journal of Finance, American Finance Association, vol. 68(4), pages 1633-1662, August.
    27. David E. Rapach & Jack K. Strauss & Guofu Zhou, 2010. "Out-of-Sample Equity Premium Prediction: Combination Forecasts and Links to the Real Economy," The Review of Financial Studies, Society for Financial Studies, vol. 23(2), pages 821-862, February.
    28. Francis X. Diebold & Peter Pauly, 1987. "Structural change and the combination of forecasts," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 6(1), pages 21-40.
    29. Pesaran, M. Hashem & Timmermann, Allan, 2007. "Selection of estimation window in the presence of breaks," Journal of Econometrics, Elsevier, vol. 137(1), pages 134-161, March.
    30. David E. Rapach & Jack K. Strauss, 2008. "Forecasting US employment growth using forecast combining methods," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 27(1), pages 75-93.
    31. Jose, Victor Richmond R. & Winkler, Robert L., 2008. "Simple robust averages of forecasts: Some empirical results," International Journal of Forecasting, Elsevier, vol. 24(1), pages 163-169.
    32. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    33. Raffaella Giacomini & Halbert White, 2006. "Tests of Conditional Predictive Ability," Econometrica, Econometric Society, vol. 74(6), pages 1545-1578, November.
    34. Hongwei Zhang & Qiang He & Ben Jacobsen & Fuwei Jiang, 2020. "Forecasting stock returns with model uncertainty and parameter instability," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 35(5), pages 629-644, August.
    35. Diebold, Francis X & Mariano, Roberto S, 2002. "Comparing Predictive Accuracy," Journal of Business & Economic Statistics, American Statistical Association, vol. 20(1), pages 134-144, January.
    36. Timmermann, Allan, 2008. "Reply to the discussion of Elusive Return Predictability," International Journal of Forecasting, Elsevier, vol. 24(1), pages 29-30.
    37. Hao, Xianfeng & Zhao, Yuyang & Wang, Yudong, 2020. "Forecasting the real prices of crude oil using robust regression models with regularization constraints," Energy Economics, Elsevier, vol. 86(C).
    38. Pesaran, M. Hashem & Pick, Andreas, 2011. "Forecast Combination Across Estimation Windows," Journal of Business & Economic Statistics, American Statistical Association, vol. 29(2), pages 307-318.
    39. James H. Stock & Mark W. Watson, 1998. "A Comparison of Linear and Nonlinear Univariate Models for Forecasting Macroeconomic Time Series," NBER Working Papers 6607, National Bureau of Economic Research, Inc.
    40. Clemen, Robert T., 1989. "Combining forecasts: A review and annotated bibliography," International Journal of Forecasting, Elsevier, vol. 5(4), pages 559-583.
    41. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    42. Jeremy Smith & Kenneth F. Wallis, 2009. "A Simple Explanation of the Forecast Combination Puzzle," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 71(3), pages 331-355, June.
    43. Mark W. Watson & James H. Stock, 2004. "Combination forecasts of output growth in a seven-country data set," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 23(6), pages 405-430.
    44. Timmermann, Allan, 2008. "Elusive return predictability," International Journal of Forecasting, Elsevier, vol. 24(1), pages 1-18.
    45. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bennett, Donyetta & Mekelburg, Erik & Strauss, Jack & Williams, T.H., 2024. "Unlocking the black box of sentiment and cryptocurrency: What, which, why, when and how?," Global Finance Journal, Elsevier, vol. 60(C).
    2. Petropoulos, Fotios & Apiletti, Daniele & Assimakopoulos, Vassilios & Babai, Mohamed Zied & Barrow, Devon K. & Ben Taieb, Souhaib & Bergmeir, Christoph & Bessa, Ricardo J. & Bijak, Jakub & Boylan, Joh, 2022. "Forecasting: theory and practice," International Journal of Forecasting, Elsevier, vol. 38(3), pages 705-871.
      • Fotios Petropoulos & Daniele Apiletti & Vassilios Assimakopoulos & Mohamed Zied Babai & Devon K. Barrow & Souhaib Ben Taieb & Christoph Bergmeir & Ricardo J. Bessa & Jakub Bijak & John E. Boylan & Jet, 2020. "Forecasting: theory and practice," Papers 2012.03854, arXiv.org, revised Jan 2022.
    3. Barbara Rossi, 2021. "Forecasting in the Presence of Instabilities: How We Know Whether Models Predict Well and How to Improve Them," Journal of Economic Literature, American Economic Association, vol. 59(4), pages 1135-1190, December.
    4. Rossi, Barbara, 2013. "Advances in Forecasting under Instability," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 2, chapter 0, pages 1203-1324, Elsevier.
    5. Wang, Xiaoqian & Hyndman, Rob J. & Li, Feng & Kang, Yanfei, 2023. "Forecast combinations: An over 50-year review," International Journal of Forecasting, Elsevier, vol. 39(4), pages 1518-1547.
    6. Lu, Xinjie & Ma, Feng & Xu, Jin & Zhang, Zehui, 2022. "Oil futures volatility predictability: New evidence based on machine learning models," International Review of Financial Analysis, Elsevier, vol. 83(C).
    7. Xue Gong & Weiguo Zhang & Yuan Zhao & Xin Ye, 2023. "Forecasting stock volatility with a large set of predictors: A new forecast combination method," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 42(7), pages 1622-1647, November.
    8. Zhang, Hongwei & Zhao, Xinyi & Gao, Wang & Niu, Zibo, 2023. "The role of higher moments in predicting China's oil futures volatility: Evidence from machine learning models," Journal of Commodity Markets, Elsevier, vol. 32(C).
    9. Thomadakis, Apostolos, 2016. "Do Combination Forecasts Outperform the Historical Average? Economic and Statistical Evidence," MPRA Paper 71589, University Library of Munich, Germany.
    10. Wang, Yudong & Hao, Xianfeng, 2022. "Forecasting the real prices of crude oil: A robust weighted least squares approach," Energy Economics, Elsevier, vol. 116(C).
    11. Kuppenheimer, Gregory & Shelly, Stuart & Strauss, Jack, 2023. "Can machine learning identify sector-level financial ratios that predict sector returns?," Finance Research Letters, Elsevier, vol. 57(C).
    12. Cakici, Nusret & Fieberg, Christian & Metko, Daniel & Zaremba, Adam, 2023. "Machine learning goes global: Cross-sectional return predictability in international stock markets," Journal of Economic Dynamics and Control, Elsevier, vol. 155(C).
    13. Nusret Cakici & Christian Fieberg & Daniel Metko & Adam Zaremba, 2024. "Do Anomalies Really Predict Market Returns? New Data and New Evidence," Review of Finance, European Finance Association, vol. 28(1), pages 1-44.
    14. Niu, Zibo & Wang, Chenlu & Zhang, Hongwei, 2023. "Forecasting stock market volatility with various geopolitical risks categories: New evidence from machine learning models," International Review of Financial Analysis, Elsevier, vol. 89(C).
    15. Wang, Yudong & Hao, Xianfeng, 2023. "Forecasting the real prices of crude oil: What is the role of parameter instability?," Energy Economics, Elsevier, vol. 117(C).
    16. Díaz, Juan D. & Hansen, Erwin & Cabrera, Gabriel, 2024. "Machine-learning stock market volatility: Predictability, drivers, and economic value," International Review of Financial Analysis, Elsevier, vol. 94(C).
    17. Xi Dong & Yan Li & David E. Rapach & Guofu Zhou, 2022. "Anomalies and the Expected Market Return," Journal of Finance, American Finance Association, vol. 77(1), pages 639-681, February.
    18. Lin, Hai & Tao, Xinyuan & Wu, Chunchi, 2022. "Forecasting earnings with combination of analyst forecasts," Journal of Empirical Finance, Elsevier, vol. 68(C), pages 133-159.
    19. Smith, Simon C., 2021. "International stock return predictability," International Review of Financial Analysis, Elsevier, vol. 78(C).
    20. Zhikai Zhang & Yaojie Zhang & Yudong Wang, 2024. "Forecasting the equity premium using weighted regressions: Does the jump variation help?," Empirical Economics, Springer, vol. 66(5), pages 2049-2082, May.
