Printed from https://ideas.repec.org/a/plo/pone00/0286362.html

A performance comparison of machine learning models for stock market prediction with novel investment strategy

Author

Listed:
  • Azaz Hassan Khan
  • Abdullah Shah
  • Abbas Ali
  • Rabia Shahid
  • Zaka Ullah Zahid
  • Malik Umar Sharif
  • Tariqullah Jan
  • Mohammad Haseeb Zafar

Abstract

Stock market forecasting is one of the most challenging problems in today’s financial markets. According to the efficient market hypothesis, it is almost impossible to predict the stock market with 100% accuracy. However, Machine Learning (ML) methods can improve stock market predictions to some extent. In this paper, a novel strategy is proposed to improve the prediction efficiency of ML models for financial markets. Nine ML models are used to predict the direction of the stock market. First, these models are trained and validated using the traditional methodology on historical data captured over a 1-day time frame. Then, the models are trained using the proposed methodology. Under the traditional methodology, Logistic Regression achieved the highest accuracy, 85.51%, followed by XGBoost and Random Forest. With the proposed strategy, the Random Forest model achieved the highest accuracy, 91.27%, followed by XGBoost, AdaBoost, and ANN. The latter part of the paper shows that a classification report alone is not sufficient to validate the performance of an ML model for stock market prediction. A simulation model of the financial market is used to evaluate the risk, maximum drawdown, and returns associated with each ML model. The overall results demonstrate that the proposed strategy not only improves stock market returns but also reduces the risks associated with each ML model.
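The abstract's point that a classification report alone may not be enough can be illustrated with a small sketch. The code below is not the paper's simulation model: the long/short trading rule, the function names, and the five daily returns are all hypothetical, chosen only to show how two direction predictors with identical accuracy can differ in return and maximum drawdown. Transaction costs are ignored.

```python
from typing import List, Sequence


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of days on which the predicted direction matches the actual one."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def equity_curve(predicted: Sequence[int], daily_returns: Sequence[float],
                 start: float = 1.0) -> List[float]:
    """Assumed rule: go long on a predicted up day (+1), short on a predicted down day (-1)."""
    equity = [start]
    for signal, r in zip(predicted, daily_returns):
        equity.append(equity[-1] * (1.0 + signal * r))
    return equity


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline, expressed as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Hypothetical daily returns: four small moves and one large drop.
daily_returns = [0.01, -0.01, 0.01, -0.10, 0.01]
actual = [1 if r > 0 else -1 for r in daily_returns]

# Both hypothetical models are right on 4 of 5 days, but they miss different days.
model_a = [1, -1, 1, -1, -1]  # misses the last small up day
model_b = [1, -1, 1, 1, 1]    # misses the large drop

for name, preds in (("A", model_a), ("B", model_b)):
    curve = equity_curve(preds, daily_returns)
    print(name, accuracy(preds, actual), curve[-1] - 1.0, max_drawdown(curve))
```

In this toy setup both models score 80% accuracy, yet model A ends with a gain and a drawdown of about 1%, while model B ends with a loss and a drawdown of about 10%, because its single error falls on the largest move.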

Suggested Citation

  • Azaz Hassan Khan & Abdullah Shah & Abbas Ali & Rabia Shahid & Zaka Ullah Zahid & Malik Umar Sharif & Tariqullah Jan & Mohammad Haseeb Zafar, 2023. "A performance comparison of machine learning models for stock market prediction with novel investment strategy," PLOS ONE, Public Library of Science, vol. 18(9), pages 1-19, September.
  • Handle: RePEc:plo:pone00:0286362
    DOI: 10.1371/journal.pone.0286362

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0286362
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0286362&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0286362?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Perry Sadorsky, 2021. "A Random Forests Approach to Predicting Clean Energy Stock Prices," JRFM, MDPI, vol. 14(2), pages 1-20, January.
    2. Rafal Weron, 2006. "Modeling and Forecasting Electricity Loads and Prices: A Statistical Approach," HSC Books, Hugo Steinhaus Center, Wroclaw University of Science and Technology, number hsbook0601, December.
    3. Christoffersen, Peter & Ghysels, Eric & Swanson, Norman R., 2002. "Let's get "real" about using economic data," Journal of Empirical Finance, Elsevier, vol. 9(3), pages 343-360, August.
    4. Xilong Chen & Eric Ghysels, 2011. "News--Good or Bad--and Its Impact on Volatility Predictions over Multiple Horizons," The Review of Financial Studies, Society for Financial Studies, vol. 24(1), pages 46-81, October.
    5. Franses, Philip Hans, 2013. "Data revisions and periodic properties of macroeconomic data," Economics Letters, Elsevier, vol. 120(2), pages 139-141.
    6. Chambers, Marcus J. & Ercolani, Joanne S. & Taylor, A.M. Robert, 2014. "Testing for seasonal unit roots by frequency domain regression," Journal of Econometrics, Elsevier, vol. 178(P2), pages 243-258.
    7. Henriques, Irene & Sadorsky, Perry, 2023. "Forecasting rare earth stock prices with machine learning," Resources Policy, Elsevier, vol. 86(PA).
    8. Kamaladdin Fataliyev & Aneesh Chivukula & Mukesh Prasad & Wei Liu, 2021. "Stock Market Analysis with Text Data: A Review," Papers 2106.12985, arXiv.org, revised Jul 2021.
    9. Roberto Cellini & Tiziana Cuccia, 2013. "Museum and monument attendance and tourism flow: a time series analysis approach," Applied Economics, Taylor & Francis Journals, vol. 45(24), pages 3473-3482, August.
    10. Kaijian He & Rui Zha & Jun Wu & Kin Keung Lai, 2016. "Multivariate EMD-Based Modeling and Forecasting of Crude Oil Price," Sustainability, MDPI, vol. 8(4), pages 1-11, April.
    11. Jacek Kotlowski, 2005. "Money and prices in the Polish economy. Seasonal cointegration approach," Working Papers 20, Department of Applied Econometrics, Warsaw School of Economics.
    12. del Barrio Castro, Tomás & Hecq, Alain, 2016. "Testing for deterministic seasonality in mixed-frequency VARs," Economics Letters, Elsevier, vol. 149(C), pages 20-24.
    13. Terasvirta, Timo, 2006. "Forecasting economic variables with nonlinear models," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 1, chapter 8, pages 413-457, Elsevier.
    14. Curry, Bruce, 2007. "Neural networks and seasonality: Some technical considerations," European Journal of Operational Research, Elsevier, vol. 179(1), pages 267-274, May.
    15. Altansukh, Gantungalag & Becker, Ralf & Bratsiotis, George J. & Osborn, Denise R., 2017. "What is the Globalisation of Inflation?," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 74, pages 1-27.
    16. Oguzhan Cepni & I. Ethem Guney & Norman R. Swanson, 2020. "Forecasting and nowcasting emerging market GDP growth rates: The role of latent global economic policy uncertainty and macroeconomic data surprise factors," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 39(1), pages 18-36, January.
    17. Saqib Farid & Rubeena Tashfeen & Tahseen Mohsan & Arsal Burhan, 2023. "Forecasting stock prices using a data mining method: Evidence from emerging market," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 28(2), pages 1911-1917, April.
    18. Constantin ANGHELACHE & Alexandru MANOLE & Mădălina Gabriela ANGHEL, 2016. "The major economic evolution of Romania by the middle of 2016," Theoretical and Applied Economics, Asociatia Generala a Economistilor din Romania / Editura Economica, vol. 0(4(609), W), pages 165-182, Winter.
    19. Rossen Anja, 2016. "On the Predictive Content of Nonlinear Transformations of Lagged Autoregression Residuals and Time Series Observations," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 236(3), pages 389-409, May.
    20. Borup, Daniel & Christensen, Bent Jesper & Mühlbach, Nicolaj Søndergaard & Nielsen, Mikkel Slot, 2023. "Targeting predictors in random forest regression," International Journal of Forecasting, Elsevier, vol. 39(2), pages 841-868.
