Printed from https://ideas.repec.org/p/arx/papers/2010.01265.html

DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis

Author

Listed:
  • Chuheng Zhang
  • Yuanqi Li
  • Xi Chen
  • Yifei Jin
  • Pingzhong Tang
  • Jian Li

Abstract

Modern machine learning models, such as deep neural networks and boosted decision tree models, have become increasingly popular in financial market prediction because of their superior capacity to extract complex non-linear patterns. However, because financial datasets have a very low signal-to-noise ratio and are non-stationary, complex models are prone to overfitting and suffer from instability. Moreover, as machine learning and data mining tools become more widely used in quantitative trading, trading firms are producing an increasing number of features (also known as factors), so automatically selecting effective features has become a pressing problem. To address these issues, we propose DoubleEnsemble, an ensemble framework that combines learning-trajectory-based sample reweighting with shuffling-based feature selection. Specifically, we identify the key samples from the training dynamics of each sample and select the key features from the ablation impact of each feature, measured via shuffling. Our model is applicable to a wide range of base models and is capable of extracting complex patterns while mitigating the overfitting and instability issues in financial market prediction. We conduct extensive experiments, including price prediction for cryptocurrencies and stock trading, using both DNNs and gradient boosting decision trees as base models. Our experimental results demonstrate that DoubleEnsemble achieves superior performance compared with several baseline methods.
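
The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the base model (a small linear regressor trained by gradient descent), the weighting rule in `trajectory_weights`, the synthetic data, and all function names are assumptions chosen for illustration only. The sketch records each sample's loss over training, down-weights samples whose loss stays high late in training, and scores each feature by how much the error grows when that feature's column is shuffled.

```python
import random

def fit_linear(X, y, weights=None, epochs=200, lr=0.05):
    """Weighted least squares by gradient descent (hypothetical base model).

    Returns the coefficients and the loss trajectory, where
    trajectory[t][i] is the squared error of sample i at epoch t.
    """
    n, d = len(X), len(X[0])
    w = weights or [1.0] * n
    coef = [0.0] * d
    trajectory = []
    for _ in range(epochs):
        errs = [sum(c * x for c, x in zip(coef, row)) - t for row, t in zip(X, y)]
        trajectory.append([e * e for e in errs])
        total_w = sum(w)
        for j in range(d):
            grad = sum(wi * e * row[j] for wi, e, row in zip(w, errs, X)) / total_w
            coef[j] -= lr * grad
    return coef, trajectory

def trajectory_weights(trajectory):
    """Illustrative rule (an assumption, not the paper's formula):
    samples whose loss stays high in the second half of training get less weight."""
    late_steps = trajectory[len(trajectory) // 2:]
    n = len(late_steps[0])
    late = [sum(step[i] for step in late_steps) / len(late_steps) for i in range(n)]
    scale = sum(late) / n
    return [1.0 / (1.0 + l / scale) for l in late]

def mse(coef, X, y):
    """Mean squared error of the linear model on (X, y)."""
    return sum((sum(c * x for c, x in zip(coef, row)) - t) ** 2
               for row, t in zip(X, y)) / len(y)

def shuffle_importance(coef, X, y, rng):
    """Increase in error when one feature column is shuffled (ablation by shuffling)."""
    base = mse(coef, X, y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_shuf = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        scores.append(mse(coef, X_shuf, y) - base)
    return scores

# Synthetic data: only feature 0 carries signal; the first 10 labels are corrupted.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]
for i in range(10):
    y[i] += 5.0

coef, trajectory = fit_linear(X, y)
weights = trajectory_weights(trajectory)            # sample reweighting
coef2, _ = fit_linear(X, y, weights=weights)        # retrain with the new weights
scores = shuffle_importance(coef2, X, y, rng)       # feature scores via shuffling
best = max(range(len(scores)), key=lambda j: scores[j])
print("feature scores:", [round(s, 3) for s in scores], "-> keep feature", best)
```

In this toy setting the corrupted samples receive lower weights than the clean ones, and the informative feature gets the largest shuffling score. An ensemble in the spirit of the abstract would repeat these steps and combine the resulting sub-models, but the details of that procedure are not given on this page.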

Suggested Citation

  • Chuheng Zhang & Yuanqi Li & Xi Chen & Yifei Jin & Pingzhong Tang & Jian Li, 2020. "DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis," Papers 2010.01265, arXiv.org, revised Jan 2021.
  • Handle: RePEc:arx:papers:2010.01265

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2010.01265
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Stephen A. Ross, 2013. "The Arbitrage Theory of Capital Asset Pricing," World Scientific Book Chapters, in: Leonard C MacLean & William T Ziemba (ed.), HANDBOOK OF THE FUNDAMENTALS OF FINANCIAL DECISION MAKING Part I, chapter 1, pages 11-30, World Scientific Publishing Co. Pte. Ltd.
    2. Chan, Louis K C & Hamao, Yasushi & Lakonishok, Josef, 1991. "Fundamentals and Stock Returns in Japan," Journal of Finance, American Finance Association, vol. 46(5), pages 1739-1764, December.
    3. Rama Cont & Arseniy Kukanov & Sasha Stoikov, 2014. "The Price Impact of Order Book Events," Journal of Financial Econometrics, Oxford University Press, vol. 12(1), pages 47-88.
    4. Tianping Zhang & Yuanqi Li & Yifei Jin & Jian Li, 2020. "AutoAlpha: an Efficient Hierarchical Evolutionary Algorithm for Mining Alpha Factors in Quantitative Investment," Papers 2002.08245, arXiv.org, revised Apr 2020.
    5. Basu, Sanjoy, 1983. "The relationship between earnings' yield, market value and return for NYSE common stocks : Further evidence," Journal of Financial Economics, Elsevier, vol. 12(1), pages 129-156, June.
    6. Bhandari, Laxmi Chand, 1988. "Debt/Equity Ratio and Expected Common Stock Returns: Empirical Evidence," Journal of Finance, American Finance Association, vol. 43(2), pages 507-528, June.
    7. Banz, Rolf W., 1981. "The relationship between return and market value of common stocks," Journal of Financial Economics, Elsevier, vol. 9(1), pages 3-18, March.
    8. Fischer, Thomas & Krauss, Christopher, 2018. "Deep learning with long short-term memory networks for financial market predictions," European Journal of Operational Research, Elsevier, vol. 270(2), pages 654-669.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lam, Keith S. K., 2002. "The relationship between size, book-to-market equity ratio, earnings-price ratio, and return for the Hong Kong stock market," Global Finance Journal, Elsevier, vol. 13(2), pages 163-179.
    2. ALAM Nafis & TAN Ee Chain, 2012. "Impact Of Financial Crisis On Stock Returns: Evidence From Singapore," Studies in Business and Economics, Lucian Blaga University of Sibiu, Faculty of Economic Sciences, vol. 7(2), pages 5-19, August.
    3. Jennifer Conrad & Michael Cooper & Gautam Kaul, 2003. "Value versus Glamour," Journal of Finance, American Finance Association, vol. 58(5), pages 1969-1996, October.
    4. Phan Tran Minh Hung & Tran Thi Trang Dai & Phan Nguyen Bao Quynh & Le Duc Toan & Vo Hoang Diem Trinh, 2019. "The Relationship between Risk and Return - An Empirical Evidence from Real Estate Stocks Listed in Vietnam," Asian Economic and Financial Review, Asian Economic and Social Society, vol. 9(11), pages 1211-1226, November.
    5. M. Eskandar Shah & Sourafel Girm & R. Hudson, 2012. "Rationalizing the Value Premium under Economic Fundamentals in an Emerging Market," Working Papers 12010, Bangor Business School, Prifysgol Bangor University (Cymru / Wales).
    6. Massimo Guidolin & Manuela Pedio, 2019. "How Smart is the Real Estate Smart Beta? Evidence from Optimal Style Factor Strategies for REITs," BAFFI CAREFIN Working Papers 19117, BAFFI CAREFIN, Centre for Applied Research on International Markets Banking Finance and Regulation, Universita' Bocconi, Milano, Italy.
    7. repec:dau:papers:123456789/2514 is not listed on IDEAS
    8. Geertsema, Paul & Lu, Helen, 2020. "The correlation structure of anomaly strategies," Journal of Banking & Finance, Elsevier, vol. 119(C).
    9. Shahzad, Syed Jawad Hussain & Zakaria, Muhammad & Raza, Naveed, 2014. "Sensitivity Analysis of CAPM Estimates: Data Frequency and Time Frame," MPRA Paper 60110, University Library of Munich, Germany.
    10. Stafylas, Dimitrios & Anderson, Keith & Uddin, Moshfique, 2017. "Recent advances in explaining hedge fund returns: Implicit factors and exposures," Global Finance Journal, Elsevier, vol. 33(C), pages 69-87.
    11. Karagiannidis, Iordanis & Vozlyublennaia, Nadia, 2016. "Limits to mutual funds' ability to rely on mean/variance optimization," Journal of Empirical Finance, Elsevier, vol. 37(C), pages 282-292.
    12. Merdad, Hesham Jamil & Kabir Hassan, M. & Hippler, William J., 2015. "The Islamic risk factor in expected stock returns: an empirical study in Saudi Arabia," Pacific-Basin Finance Journal, Elsevier, vol. 34(C), pages 293-314.
    13. Michael E. Drew & Madhu Veeraraghavan, 2000. "Multifactor Models are Alive and Well," School of Economics and Finance Discussion Papers and Working Papers Series 083, School of Economics and Finance, Queensland University of Technology.
    14. Michael E. Drew & Mirela Mallin & Tony Naughton & Madhu Veeraraghavan, 2004. "Equity Premium: - Does it exist? Evidence from Germany and United Kingdom," School of Economics and Finance Discussion Papers and Working Papers Series 170, School of Economics and Finance, Queensland University of Technology.
    15. Bernard Bollen & Philip Gharghori, 2016. "How is β related to asset returns?," Applied Economics, Taylor & Francis Journals, vol. 48(21), pages 1925-1935, May.
    16. Frederico Belo & Xiaoji Lin & Santiago Bazdresch, 2014. "Labor Hiring, Investment, and Stock Return Predictability in the Cross Section," Journal of Political Economy, University of Chicago Press, vol. 122(1), pages 129-177.
    17. Jagannathan, Ravi & Wang, Zhenyu, 1996. "The Conditional CAPM and the Cross-Section of Expected Returns," Journal of Finance, American Finance Association, vol. 51(1), pages 3-53, March.
    18. Erdos, Péter & Ormos, Mihály & Zibriczky, Dávid, 2011. "Non-parametric and semi-parametric asset pricing," Economic Modelling, Elsevier, vol. 28(3), pages 1150-1162, May.
    19. Amit Goyal, 2012. "Empirical cross-sectional asset pricing: a survey," Financial Markets and Portfolio Management, Springer;Swiss Society for Financial Market Research, vol. 26(1), pages 3-38, March.
    20. N. S. Nanayakkara & P. D. Nimal & Y. K. Weerakoon, 2019. "Behavioural Asset Pricing: A Review," International Journal of Economics and Financial Issues, Econjournals, vol. 9(4), pages 101-108.
    21. Rocciolo, Francesco & Gheno, Andrea & Brooks, Chris, 2022. "Explaining abnormal returns in stock markets: An alpha-neutral version of the CAPM," International Review of Financial Analysis, Elsevier, vol. 82(C).
