
Training trees on tails with applications to portfolio choice

Author

Listed:
  • Guillaume Coqueret

    (EM - EMLyon Business School)

  • Tony Guida

    (RAM Alternative Investments)

Abstract

In this article, we investigate the impact of truncating training data when fitting regression trees. We argue that training times can be curtailed by reducing the training sample without any loss in out-of-sample accuracy as long as the prediction model has been trained on the tails of the dependent variable, that is, when 'average' observations have been discarded from the training sample. Filtering instances has an impact on the features that are selected to yield the splits and can help reduce overfitting by favoring predictors with monotonic impacts on the dependent variable. We test this technique in an out-of-sample exercise of portfolio selection which shows its benefits. The implications of our results are decisive for time-consuming tasks such as hyperparameter tuning and validation.
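
The filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tail_filter` and `fit_stump`, the tail fraction `q = 0.2`, and the synthetic data are all assumptions made for illustration, and a depth-1 regression stump stands in for a full regression tree. The sketch keeps only the observations whose dependent variable falls in the lowest or highest `q` fraction, then fits the stump on both the full and the filtered sample.

```python
import random
import statistics


def tail_filter(X, y, q=0.2):
    """Keep rows whose label is in the bottom or top q fraction of y (hypothetical helper)."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    k = int(len(y) * q)
    keep = sorted(order[:k] + order[len(y) - k:])
    return [X[i] for i in keep], [y[i] for i in keep]


def fit_stump(X, y):
    """Depth-1 regression tree: choose the (feature, threshold) pair minimising squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values,
        # so both children are always non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ml, mr = statistics.fmean(left), statistics.fmean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    # Returns (feature index, threshold, left prediction, right prediction).
    return best[1:]


# Synthetic data (an assumption): feature 0 drives y monotonically, feature 1 is pure noise.
random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
y = [row[0] + 0.1 * random.gauss(0, 1) for row in X]

# Discard the 'average' observations and keep only the two tails of y.
X_tail, y_tail = tail_filter(X, y, q=0.2)

full_stump = fit_stump(X, y)
tail_stump = fit_stump(X_tail, y_tail)

print("rows used (full, tails):", len(y), len(y_tail))
print("split feature (full, tails):", full_stump[0], tail_stump[0])
```

In this toy setting the filtered sample contains 40% of the rows, so the split search visits fewer candidate thresholds and fewer rows per threshold. Whether accuracy is preserved on real data is the empirical question the paper addresses; the sketch makes no claim about it.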

Suggested Citation

  • Guillaume Coqueret & Tony Guida, 2020. "Training trees on tails with applications to portfolio choice," Post-Print hal-04144665, HAL.
  • Handle: RePEc:hal:journl:hal-04144665
    DOI: 10.1007/s10479-020-03539-2
    Note: View the original document on HAL open archive server: https://hal.science/hal-04144665v1

    Download full text from publisher

    File URL: https://hal.science/hal-04144665v1/document
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Guillaume Coqueret & Tony Guida, 2020. "Training trees on tails with applications to portfolio choice," Annals of Operations Research, Springer, vol. 288(1), pages 181-221, May.
    2. Guillaume Chevalier & Guillaume Coqueret & Thomas Raffinot, 2022. "Supervised portfolios," Post-Print hal-04144588, HAL.
    3. Tony Guida & Guillaume Coqueret, 2019. "Ensemble Learning Applied to Quant Equity: Gradient Boosting in a Multifactor Framework," Post-Print hal-02311104, HAL.
    4. Pedro M. Mirete-Ferrer & Alberto Garcia-Garcia & Juan Samuel Baixauli-Soler & Maria A. Prats, 2022. "A Review on Machine Learning for Asset Management," Risks, MDPI, vol. 10(4), pages 1-46, April.
    5. Eric André & Guillaume Coqueret, 2020. "Dirichlet policies for reinforced factor portfolios," Papers 2011.05381, arXiv.org, revised Jun 2021.
    6. Guillaume Coqueret, 2022. "Characteristics-driven returns in equilibrium," Papers 2203.07865, arXiv.org.
    7. Guillaume Coqueret, 2023. "Forking paths in financial economics," Papers 2401.08606, arXiv.org.
    8. Stephen A. Gorman & Frank J. Fabozzi, 2021. "The ABC’s of the alternative risk premium: academic roots," Journal of Asset Management, Palgrave Macmillan, vol. 22(6), pages 405-436, October.
    9. Flori, Andrea & Regoli, Daniele, 2021. "Revealing Pairs-trading opportunities with long short-term memory networks," European Journal of Operational Research, Elsevier, vol. 295(2), pages 772-791.
    10. Rubesam, Alexandre, 2022. "Machine learning portfolios with equal risk contributions: Evidence from the Brazilian market," Emerging Markets Review, Elsevier, vol. 51(PB).
    11. Uddin, Ajim & Yu, Dantong, 2020. "Latent factor model for asset pricing," Journal of Behavioral and Experimental Finance, Elsevier, vol. 27(C).
    12. Jiang, Chonghui & Du, Jiangze & An, Yunbi & Zhang, Jinqing, 2021. "Factor tracking: A new smart beta strategy that outperforms naïve diversification," Economic Modelling, Elsevier, vol. 96(C), pages 396-408.
    13. Söhnke M. Bartram & Harald Lohre & Peter F. Pope & Ananthalakshmi Ranganathan, 2021. "Navigating the factor zoo around the world: an institutional investor perspective," Journal of Business Economics, Springer, vol. 91(5), pages 655-703, July.
    14. Han, Chulwoo & He, Zhaodong & Toh, Alenson Jun Wei, 2023. "Pairs trading via unsupervised learning," European Journal of Operational Research, Elsevier, vol. 307(2), pages 929-947.
    15. Baoqiang Zhan & Shu Zhang & Helen S. Du & Xiaoguang Yang, 2022. "Exploring Statistical Arbitrage Opportunities Using Machine Learning Strategy," Computational Economics, Springer;Society for Computational Economics, vol. 60(3), pages 861-882, October.
    16. Cederburg, Scott & O’Doherty, Michael S. & Wang, Feifei & Yan, Xuemin (Sterling), 2020. "On the performance of volatility-managed portfolios," Journal of Financial Economics, Elsevier, vol. 138(1), pages 95-117.
    17. Pätäri, Eero & Karell, Ville & Luukka, Pasi & Yeomans, Julian S, 2018. "Comparison of the multicriteria decision-making methods for equity portfolio selection: The U.S. evidence," European Journal of Operational Research, Elsevier, vol. 265(2), pages 655-672.
    18. Christian Fieberg & Daniel Metko & Thorsten Poddig & Thomas Loy, 2023. "Machine learning techniques for cross-sectional equity returns’ prediction," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 45(1), pages 289-323, March.
    19. Kentaro Imajo & Kentaro Minami & Katsuya Ito & Kei Nakagawa, 2020. "Deep Portfolio Optimization via Distributional Prediction of Residual Factors," Papers 2012.07245, arXiv.org.
    20. Clarke, Charles, 2022. "The level, slope, and curve factor model for stocks," Journal of Financial Economics, Elsevier, vol. 143(1), pages 159-187.

