
A Machine Learning Framework for Stock Selection

Author

Listed:
  • XingYu Fu
  • JinHong Du
  • YiFeng Guo
  • MingWen Liu
  • Tao Dong
  • XiuWen Duan

Abstract

This paper demonstrates how to apply machine learning algorithms to distinguish good stocks from bad ones. To this end, we construct 244 technical and fundamental features to characterize each stock, and label stocks according to their ranking with respect to the return-to-volatility ratio. Algorithms ranging from traditional statistical learning methods to recently popular deep learning methods, e.g. Logistic Regression (LR), Random Forest (RF), Deep Neural Network (DNN), and Stacking, are trained to solve the classification task. A Genetic Algorithm (GA) is also used to perform feature selection. The effectiveness of the stock selection strategy is validated on the Chinese stock market in both statistical and practical terms, showing that: 1) Stacking outperforms the other models, reaching an AUC score of 0.972; 2) the Genetic Algorithm picks a subset of 114 features, and the prediction performance of all models remains almost unchanged after the selection procedure, which suggests that some features are indeed redundant; 3) LR and DNN are radical models, RF is a risk-neutral model, and Stacking lies somewhere between DNN and RF; 4) the portfolios constructed by our models outperform the market average in back tests.
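
The labeling and scoring steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the return-to-volatility ratio is computed or where the ranking is cut, so the ratio definition (mean simple return over the standard deviation of returns), the 25% top/bottom slices, the function names, and the toy prices below are all assumptions made for illustration. The AUC helper uses the standard pairwise-ranking definition.

```python
from statistics import mean, pstdev

def return_to_volatility(prices):
    """Mean simple return divided by the std. dev. of returns (assumed definition)."""
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    vol = pstdev(returns)
    return mean(returns) / vol if vol > 0 else 0.0

def label_by_rank(ratios, top_frac=0.25):
    """Label the top slice 1 and the bottom slice 0; stocks in between are dropped.

    top_frac is a hypothetical cut-off, not a value taken from the paper.
    """
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    k = max(1, int(len(ranked) * top_frac))
    labels = {ticker: 1 for ticker in ranked[:k]}
    labels.update({ticker: 0 for ticker in ranked[-k:]})
    return labels

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy price histories for four made-up tickers.
prices = {
    "AAA": [10.0, 10.5, 11.0, 11.6],   # steady rise
    "BBB": [10.0, 9.0, 11.0, 9.5],     # volatile, roughly flat
    "CCC": [10.0, 9.8, 9.5, 9.1],      # steady decline
    "DDD": [10.0, 10.1, 10.0, 10.2],   # mild rise
}
ratios = {ticker: return_to_volatility(p) for ticker, p in prices.items()}
labels = label_by_rank(ratios, top_frac=0.25)
print(labels)

# AUC of some hypothetical classifier scores against true labels.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

In a full pipeline the labeled stocks, together with their feature vectors, would be passed to the classifiers named in the abstract, and the AUC helper would be applied to each model's predicted scores on held-out data.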

Suggested Citation

  • XingYu Fu & JinHong Du & YiFeng Guo & MingWen Liu & Tao Dong & XiuWen Duan, 2018. "A Machine Learning Framework for Stock Selection," Papers 1806.01743, arXiv.org, revised Aug 2018.
  • Handle: RePEc:arx:papers:1806.01743

    Download full text from publisher

    File URL: http://arxiv.org/pdf/1806.01743
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Pu Shen, 2002. "Market timing strategies that worked," Research Working Paper RWP 02-01, Federal Reserve Bank of Kansas City.
    2. Huerta, Ramon & Corbacho, Fernando & Elkan, Charles, 2013. "Nonlinear support vector machines can systematically identify stocks with high and low future returns," Algorithmic Finance, IOS Press, vol. 2(1), pages 45-58.
    3. Pai, Ping-Feng & Lin, Chih-Sheng, 2005. "A hybrid ARIMA and support vector machines model in stock price forecasting," Omega, Elsevier, vol. 33(6), pages 497-505, December.
    4. van der Hart, Jaap & Slagter, Erica & van Dijk, Dick, 2003. "Stock selection strategies in emerging markets," Journal of Empirical Finance, Elsevier, vol. 10(1-2), pages 105-132, February.

    Citations

    Cited by:

    1. Cassidy K. Buhler & Hande Y. Benson, 2023. "Efficient Solution of Portfolio Optimization Problems via Dimension Reduction and Sparsification," Papers 2306.12639, arXiv.org.
    2. Ganggang Guo & Yulei Rao & Feida Zhu & Fang Xu, 2020. "Innovative deep matching algorithm for stock portfolio selection using deep stock profiles," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-31, November.
    3. Zheng Hao & Haowei Zhang & Yipu Zhang, 2023. "Stock Portfolio Management by Using Fuzzy Ensemble Deep Reinforcement Learning Algorithm," JRFM, MDPI, vol. 16(3), pages 1-14, March.
    4. Guillaume Coqueret & Tony Guida, 2020. "Training trees on tails with applications to portfolio choice," Annals of Operations Research, Springer, vol. 288(1), pages 181-221, May.
    5. Juan Carlos Upegui Mejía, 2020. "Transparencia estatal y datos personales : el problema de la publicidad de la información personal en poder del Estado : estudio comparado México-Colombia," Books, Universidad Externado de Colombia, Facultad de Derecho, number 1177, October.
    6. Ahmet Murat Ozbayoglu & Mehmet Ugur Gudelek & Omer Berat Sezer, 2020. "Deep Learning for Financial Applications : A Survey," Papers 2002.05786, arXiv.org.
    7. Akshit Kurani & Pavan Doshi & Aarya Vakharia & Manan Shah, 2023. "A Comprehensive Comparative Study of Artificial Neural Network (ANN) and Support Vector Machines (SVM) on Stock Forecasting," Annals of Data Science, Springer, vol. 10(1), pages 183-208, February.
    8. Guillaume Coqueret & Tony Guida, 2020. "Training trees on tails with applications to portfolio choice," Post-Print hal-04144665, HAL.
    9. Nymisha Bandi & Theja Tulabandhula, 2020. "Off-Policy Optimization of Portfolio Allocation Policies under Constraints," Papers 2012.11715, arXiv.org.
    10. Lulin Xu & Zhongwu Li, 2021. "A New Appraisal Model of Second-Hand Housing Prices in China’s First-Tier Cities Based on Machine Learning Algorithms," Computational Economics, Springer;Society for Computational Economics, vol. 57(2), pages 617-637, February.
    11. Reza Bradrania & Davood Pirayesh Neghab & Mojtaba Shafizadeh, 2022. "State-dependent stock selection in index tracking: a machine learning approach," Financial Markets and Portfolio Management, Springer;Swiss Society for Financial Market Research, vol. 36(1), pages 1-28, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Eero Pätäri & Timo Leivo, 2017. "A Closer Look At Value Premium: Literature Review And Synthesis," Journal of Economic Surveys, Wiley Blackwell, vol. 31(1), pages 79-168, February.
    2. Kamaladdin Fataliyev & Aneesh Chivukula & Mukesh Prasad & Wei Liu, 2021. "Stock Market Analysis with Text Data: A Review," Papers 2106.12985, arXiv.org, revised Jul 2021.
    3. Dimitrios Kartsonakis Mademlis & Nikolaos Dritsakis, 2021. "Volatility Forecasting using Hybrid GARCH Neural Network Models: The Case of the Italian Stock Market," International Journal of Economics and Financial Issues, Econjournals, vol. 11(1), pages 49-60.
    4. Goriaev, Alexei & Zabotkin, Alexei, 2006. "Risks of investing in the Russian stock market: Lessons of the first decade," Emerging Markets Review, Elsevier, vol. 7(4), pages 380-397, December.
    5. Lischewski, Judith & Voronkova, Svitlana, 2012. "Size, value and liquidity. Do They Really Matter on an Emerging Stock Market?," Emerging Markets Review, Elsevier, vol. 13(1), pages 8-25.
    6. Kozak, Serhiy & Nagel, Stefan & Santosh, Shrihari, 2020. "Shrinking the cross-section," Journal of Financial Economics, Elsevier, vol. 135(2), pages 271-292.
    7. Nazarian, Rafik & Gandali Alikhani, Nadiya & Naderi, Esmaeil & Amiri, Ashkan, 2013. "Forecasting Stock Market Volatility: A Forecast Combination Approach," MPRA Paper 46786, University Library of Munich, Germany.
    8. Tej Bahadur Shahi & Ashish Shrestha & Arjun Neupane & William Guo, 2020. "Stock Price Forecasting with Deep Learning: A Comparative Study," Mathematics, MDPI, vol. 8(9), pages 1-15, August.
    9. Dat Thanh Tran & Martin Magris & Juho Kanniainen & Moncef Gabbouj & Alexandros Iosifidis, 2017. "Tensor Representation in High-Frequency Financial Data for Price Change Prediction," Papers 1709.01268, arXiv.org, revised Nov 2017.
    10. Huij, Joop & Post, Thierry, 2011. "On the performance of emerging market equity mutual funds," Emerging Markets Review, Elsevier, vol. 12(3), pages 238-249, September.
    11. Tamerlan Mashadihasanli, 2022. "Stock Market Price Forecasting Using the Arima Model: an Application to Istanbul, Turkiye," Journal of Economic Policy Researches, Istanbul University, Faculty of Economics, vol. 9(2), pages 439-454, July.
    12. Cheng, Ching-Hsue & Wei, Liang-Ying, 2014. "A novel time-series model based on empirical mode decomposition for forecasting TAIEX," Economic Modelling, Elsevier, vol. 36(C), pages 136-141.
    13. Wei-Chiang Hong & Yucheng Dong & Chien-Yuan Lai & Li-Yueh Chen & Shih-Yung Wei, 2011. "SVR with Hybrid Chaotic Immune Algorithm for Seasonal Load Demand Forecasting," Energies, MDPI, vol. 4(6), pages 1-18, June.
    14. Basak, Suryoday & Kar, Saibal & Saha, Snehanshu & Khaidem, Luckyson & Dey, Sudeepa Roy, 2019. "Predicting the direction of stock market prices using tree-based classifiers," The North American Journal of Economics and Finance, Elsevier, vol. 47(C), pages 552-567.
    15. Muneeb Ahmad & Yousaf Ali Khan & Chonghui Jiang & Syed Jawad Haider Kazmi & Syed Zaheer Abbas, 2023. "The impact of COVID‐19 on unemployment rate: An intelligent based unemployment rate prediction in selected countries of Europe," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 28(1), pages 528-543, January.
    16. de Groot, Wilma & Pang, Juan & Swinkels, Laurens, 2012. "The cross-section of stock returns in frontier emerging markets," Journal of Empirical Finance, Elsevier, vol. 19(5), pages 796-818.
    17. Huiwen Wang & Wenyang Huang & Shanshan Wang, 2021. "Forecasting open-high-low-close data contained in candlestick chart," Papers 2104.00581, arXiv.org.
    18. Cenk Ufuk Yıldıran & Abdurrahman Fettahoğlu, 2017. "Forecasting USDTRY rate by ARIMA method," Cogent Economics & Finance, Taylor & Francis Journals, vol. 5(1), pages 1335968-133, January.
    19. Shangkun Deng & Kazuki Yoshiyama & Takashi Mitsubuchi & Akito Sakurai, 2015. "Hybrid Method of Multiple Kernel Learning and Genetic Algorithm for Forecasting Short-Term Foreign Exchange Rates," Computational Economics, Springer;Society for Computational Economics, vol. 45(1), pages 49-89, January.
    20. van der Hart, Jaap & de Zwart, Gerben & van Dijk, Dick, 2005. "The success of stock selection strategies in emerging markets: Is it risk or behavioral bias?," Emerging Markets Review, Elsevier, vol. 6(3), pages 238-262, September.
