Printed from https://ideas.repec.org/a/eee/ejores/v278y2019i1p330-342.html

Large data sets and machine learning: Applications to statistical arbitrage

Author

Listed:
  • Huck, Nicolas

Abstract

Machine learning algorithms and big data are transforming all industries, including finance and portfolio management. While techniques such as Deep Belief Networks and Random Forests are becoming increasingly popular in the market, the academic literature remains relatively sparse. Through a series of applications involving hundreds of variables/predictors and stocks, this article presents some state-of-the-art techniques and shows how they can be implemented to manage a long-short portfolio. Numerous practical and empirical issues are discussed. One of the main questions behind the use of big data is the value of information. Does increasing the number of predictors improve portfolio performance? Which features are the most important? A large number of predictors potentially means a high level of noise. How do the algorithms handle this? The article develops an application using a 22-year trading period, up to 300 U.S. large caps, and around 600 predictors. The empirical results underline the ability of these techniques to generate useful trading signals for portfolios with high turnover and short holding periods (one or five days). Positive excess returns are reported between 1993 and 2008. They are strongly reduced after accounting for transaction costs and traditional risk factors. In the most recent period, when these machine learning tools were readily available in the market, excess returns turned negative. Results also show that adding features is far from a guarantee of higher portfolio alpha.

Suggested Citation

  • Huck, Nicolas, 2019. "Large data sets and machine learning: Applications to statistical arbitrage," European Journal of Operational Research, Elsevier, vol. 278(1), pages 330-342.
  • Handle: RePEc:eee:ejores:v:278:y:2019:i:1:p:330-342
    DOI: 10.1016/j.ejor.2019.04.013

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221719303339
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Krauss, Christopher & Do, Xuan Anh & Huck, Nicolas, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," European Journal of Operational Research, Elsevier, vol. 259(2), pages 689-702.
    2. Seddon, Jonathan J.J.M. & Currie, Wendy L., 2017. "A model for unpacking big data analytics in high-frequency trading," Journal of Business Research, Elsevier, vol. 70(C), pages 300-307.
    3. Leung, Mark T. & Daouk, Hazem & Chen, An-Sing, 2000. "Forecasting stock indices: a comparison of classification and level estimation models," International Journal of Forecasting, Elsevier, vol. 16(2), pages 173-190.
    4. Michael C. Jensen, 1968. "The Performance Of Mutual Funds In The Period 1945–1964," Journal of Finance, American Finance Association, vol. 23(2), pages 389-416, May.
    5. Hong, Harrison & Torous, Walter & Valkanov, Rossen, 2007. "Do industries lead stock markets?," Journal of Financial Economics, Elsevier, vol. 83(2), pages 367-396, February.
    6. Fernandes, Marcelo & Medeiros, Marcelo C. & Scharth, Marcel, 2014. "Modeling and predicting the CBOE market volatility index," Journal of Banking & Finance, Elsevier, vol. 40(C), pages 1-10.
    7. Lutz Kilian & Cheolbeom Park, 2009. "The Impact Of Oil Price Shocks On The U.S. Stock Market," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 50(4), pages 1267-1287, November.
    8. Jonathan Baron & Barbara A. Mellers & Philip E. Tetlock & Eric Stone & Lyle H. Ungar, 2014. "Two Reasons to Make Aggregated Probability Forecasts More Extreme," Decision Analysis, INFORMS, vol. 11(2), pages 133-145, June.
    9. Ville A. Satopää & Robin Pemantle & Lyle H. Ungar, 2016. "Modeling Probability Forecasts via Information Diversity," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1623-1633, October.
    10. Christian Spreckelsen & Hans-Jörg Mettenheim & Michael H. Breitner, 2014. "Real-Time Pricing and Hedging of Options on Currency Futures with Artificial Neural Networks," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 33(6), pages 419-432, September.
    11. François Longin & Bruno Solnik, 2001. "Extreme Correlation of International Equity Markets," Journal of Finance, American Finance Association, vol. 56(2), pages 649-676, April.
    12. Jacobs, Heiko, 2015. "What explains the dynamics of 100 anomalies?," Journal of Banking & Finance, Elsevier, vol. 57(C), pages 65-85.
    13. Victor DeMiguel & Lorenzo Garlappi & Raman Uppal, 2009. "Optimal Versus Naive Diversification: How Inefficient is the 1-N Portfolio Strategy?," Review of Financial Studies, Society for Financial Studies, vol. 22(5), pages 1915-1953, May.
    14. Panopoulou, Ekaterini & Vrontos, Spyridon, 2015. "Hedge fund return predictability; To combine forecasts or combine information?," Journal of Banking & Finance, Elsevier, vol. 56(C), pages 103-122.
    15. Jushan Bai & Jianqing Fan & Ruey Tsay, 2016. "Special Issue on Big Data," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 487-488, October.
    16. repec:aea:jecper:v:31:y:2017:i:2:p:87-106 is not listed on IDEAS
    17. Zhao, Yang & Li, Jianping & Yu, Lean, 2017. "A deep learning ensemble approach for crude oil price forecasting," Energy Economics, Elsevier, vol. 66(C), pages 9-16.
    18. Chordia, Tarun & Roll, Richard & Subrahmanyam, Avanidhar, 2011. "Recent trends in trading activity and market quality," Journal of Financial Economics, Elsevier, vol. 101(2), pages 243-263, August.
    19. Peter F. Christoffersen & Francis X. Diebold, 2006. "Financial Asset Returns, Direction-of-Change Forecasting, and Volatility Dynamics," Management Science, INFORMS, vol. 52(8), pages 1273-1287, August.
    20. Fama, Eugene F. & French, Kenneth R., 1993. "Common risk factors in the returns on stocks and bonds," Journal of Financial Economics, Elsevier, vol. 33(1), pages 3-56, February.
    21. Andrew W. Lo, 2010. "Hedge Funds: An Analytic Perspective Updated Edition," Economics Books, Princeton University Press, edition 1, number 9177.
    22. Chen, Zhiwu & Knez, Peter J, 1996. "Portfolio Performance Measurement: Theory and Applications," Review of Financial Studies, Society for Financial Studies, vol. 9(2), pages 511-555.
    23. Sadka, Ronnie, 2010. "Liquidity risk and the cross-section of hedge-fund returns," Journal of Financial Economics, Elsevier, vol. 98(1), pages 54-71, October.
    24. Laopodis, Nikiforos T., 2013. "Monetary policy and stock market dynamics across monetary regimes," Journal of International Money and Finance, Elsevier, vol. 33(C), pages 381-406.
    25. Huck, Nicolas, 2009. "Pairs selection and outranking: An application to the S&P 100 index," European Journal of Operational Research, Elsevier, vol. 196(2), pages 819-825, July.
    26. Jeff Fleming & Chris Kirby & Barbara Ostdiek, 2001. "The Economic Value of Volatility Timing," Journal of Finance, American Finance Association, vol. 56(1), pages 329-352, February.
    27. Carhart, Mark M, 1997. "On Persistence in Mutual Fund Performance," Journal of Finance, American Finance Association, vol. 52(1), pages 57-82, March.
    28. Huck, Nicolas, 2010. "Pairs trading and outranking: The multi-step-ahead forecasting case," European Journal of Operational Research, Elsevier, vol. 207(3), pages 1702-1716, December.
    29. Esfandiar Maasoumi & Marcelo Medeiros, 2010. "The Link Between Statistical Learning Theory and Econometrics: Applications in Economics, Finance, and Marketing," Econometric Reviews, Taylor & Francis Journals, vol. 29(5-6), pages 470-475.
    30. Jegadeesh, Narasimhan, 1990. "Evidence of Predictable Behavior of Security Returns," Journal of Finance, American Finance Association, vol. 45(3), pages 881-898, July.
    31. Jonathan J.J.M. Seddon & Wendy L. Currie, 2017. "A model for unpacking big data analytics in high-frequency trading," Post-Print hal-01404316, HAL.
    32. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    33. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    34. Gibbons, Michael R & Hess, Patrick, 1981. "Day of the Week Effects and Asset Returns," The Journal of Business, University of Chicago Press, vol. 54(4), pages 579-596, October.
    35. Marco Avellaneda & Jeong-Hyun Lee, 2010. "Statistical arbitrage in the US equities market," Quantitative Finance, Taylor & Francis Journals, vol. 10(7), pages 761-782.
    36. Matt Taddy & Matt Gardner & Liyun Chen & David Draper, 2016. "A Nonparametric Bayesian Analysis of Heterogenous Treatment Effects in Digital Experimentation," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 661-672, October.
    37. Christopher Krauss & Anh Do & Nicolas Huck, 2017. "Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500," Post-Print hal-01768895, HAL.
    38. Angela J. Black & Olga Klinkowska & David G. McMillan & Fiona J. McMillan, 2014. "Forecasting Stock Returns: Do Commodity Prices Help?," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 33(8), pages 627-639, December.
    39. Deren Caliskan & Mohammad Najand, 2016. "Stock market returns and the price of gold," Journal of Asset Management, Palgrave Macmillan, vol. 17(1), pages 10-21, January.
    40. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    41. Jones, Charles M & Kaul, Gautam, 1996. "Oil and the Stock Markets," Journal of Finance, American Finance Association, vol. 51(2), pages 463-491, June.
    42. Ariel, Robert A, 1990. "High Stock Returns before Holidays: Existence and Evidence on Possible Causes," Journal of Finance, American Finance Association, vol. 45(5), pages 1611-1626, December.
    43. N. Baba & Y. Sakurai, 2011. "Predicting regime switches in the VIX index with macroeconomic variables," Applied Economics Letters, Taylor & Francis Journals, vol. 18(15), pages 1415-1419.
    44. Ferson, Wayne E & Schadt, Rudi W, 1996. "Measuring Fund Strategy and Performance in Changing Economic Conditions," Journal of Finance, American Finance Association, vol. 51(2), pages 425-461, June.
    45. Fischer, Thomas & Krauss, Christopher, 2018. "Deep learning with long short-term memory networks for financial market predictions," European Journal of Operational Research, Elsevier, vol. 270(2), pages 654-669.
