A Fusion of Statistical and Machine Learning Methods: GARCH-XGBoost for Improved Volatility Modelling of the JSE Top40 Index

Author

Listed:
  • Israel Maingo

    (Department of Mathematical and Computational Sciences, University of Venda, Private Bag X5050, Thohoyandou 0950, Limpopo, South Africa)

  • Thakhani Ravele

    (Department of Mathematical and Computational Sciences, University of Venda, Private Bag X5050, Thohoyandou 0950, Limpopo, South Africa)

  • Caston Sigauke

    (Department of Mathematical and Computational Sciences, University of Venda, Private Bag X5050, Thohoyandou 0950, Limpopo, South Africa)

Abstract

Volatility modelling is central to financial risk management, portfolio optimisation, and forecasting, particularly for market indices such as the JSE Top40 Index, which serves as a benchmark for the South African stock market. This study models the volatility of JSE Top40 Index log-returns from 2011 to 2025 using a two-step hybrid approach that integrates statistical and machine learning techniques. The ARMA(3,2) model was chosen as the optimal mean model using the auto.arima() function from the forecast package in R (version 4.4.0). Several GARCH variants, including sGARCH(1,1), GJR-GARCH(1,1), and EGARCH(1,1), were fitted under various conditional error distributions (i.e., STD, SSTD, GED, SGED, and GHD). Model selection was based on the AIC, BIC, HQIC, and log-likelihood (LL) criteria, and ARMA(3,2)-EGARCH(1,1) was selected as the best model. Residual diagnostics indicated that the model adequately captured autocorrelation, conditional heteroskedasticity, and asymmetry in JSE Top40 log-returns. Volatility persistence was also detected, in line with the persistent nature of financial volatility. Thereafter, the ARMA(3,2)-EGARCH(1,1) model was coupled with XGBoost, using lagged standardised residuals extracted from the fitted ARMA(3,2)-EGARCH(1,1) model as features. The data were split into training (60%), testing (20%), and calibration (20%) sets. Based on the lowest values of forecast accuracy measures (i.e., MASE, RMSE, MAE, MAPE, and sMAPE), along with prediction intervals and their evaluation metrics (i.e., PICP, PINAW, PICAW, and PINAD), the hybrid model captured residual nonlinearities left by the standalone ARMA(3,2)-EGARCH(1,1) model, and the hybrid ARMA(3,2)-EGARCH(1,1)-XGBoost model outperformed the standalone model across all forecast accuracy measures. This highlights the robustness and suitability of the hybrid ARMA(3,2)-EGARCH(1,1)-XGBoost model for financial risk management in emerging markets and demonstrates the value of integrating statistical and machine learning methods in financial time series modelling.
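
The data-preparation and interval-evaluation steps named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study and is not the authors' implementation. The function names, the order of the testing and calibration blocks, and the number of lags are assumptions made for illustration. PICP and PINAW follow their usual definitions (share of observations inside the interval, and mean interval width divided by the range of the observations); PICAW and PINAD are not covered.

```python
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float],
                        train_pct: int = 60,
                        test_pct: int = 20) -> Tuple[list, list, list]:
    """Split a series in time order (no shuffling) into training,
    testing, and calibration blocks. The remainder after the training
    and testing blocks is the calibration block. The block order is an
    assumption; the abstract only gives the 60/20/20 proportions."""
    n = len(series)
    i = n * train_pct // 100
    j = n * (train_pct + test_pct) // 100
    return list(series[:i]), list(series[i:j]), list(series[j:])


def lagged_features(z: Sequence[float],
                    n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Build a lagged design matrix from standardised residuals z.
    Row for time t holds [z[t-1], ..., z[t-n_lags]]; the target is z[t]."""
    X = [[z[t - k] for k in range(1, n_lags + 1)]
         for t in range(n_lags, len(z))]
    y = [z[t] for t in range(n_lags, len(z))]
    return X, y


def picp(y: Sequence[float],
         lower: Sequence[float],
         upper: Sequence[float]) -> float:
    """Prediction interval coverage probability: the fraction of
    observations that fall inside their [lower, upper] interval."""
    hits = sum(1 for v, lo, hi in zip(y, lower, upper) if lo <= v <= hi)
    return hits / len(y)


def pinaw(y: Sequence[float],
          lower: Sequence[float],
          upper: Sequence[float]) -> float:
    """Prediction interval normalised average width: the mean interval
    width divided by the range of the observations."""
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    return (sum(widths) / len(widths)) / (max(y) - min(y))
```

In a two-step setup of this kind, the rows returned by `lagged_features` would be passed to a gradient-boosting regressor such as XGBoost, and `picp` and `pinaw` would be computed on held-out data. A wider interval raises PICP but also raises PINAW, so the two metrics are read together.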

Suggested Citation

  • Israel Maingo & Thakhani Ravele & Caston Sigauke, 2025. "A Fusion of Statistical and Machine Learning Methods: GARCH-XGBoost for Improved Volatility Modelling of the JSE Top40 Index," IJFS, MDPI, vol. 13(3), pages 1-30, August.
  • Handle: RePEc:gam:jijfss:v:13:y:2025:i:3:p:155-:d:1731685

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7072/13/3/155/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7072/13/3/155/
    Download Restriction: no

    References listed on IDEAS

    1. Nelson, Daniel B, 1991. "Conditional Heteroskedasticity in Asset Returns: A New Approach," Econometrica, Econometric Society, vol. 59(2), pages 347-370, March.
    2. Bollerslev, Tim, 1986. "Generalized autoregressive conditional heteroskedasticity," Journal of Econometrics, Elsevier, vol. 31(3), pages 307-327, April.
    3. Glosten, Lawrence R & Jagannathan, Ravi & Runkle, David E, 1993. "On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks," Journal of Finance, American Finance Association, vol. 48(5), pages 1779-1801, December.
    4. Drew Creal & Siem Jan Koopman & André Lucas, 2013. "Generalized Autoregressive Score Models With Applications," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 28(5), pages 777-795, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gerlach, Richard & Wang, Chao, 2020. "Semi-parametric dynamic asymmetric Laplace models for tail risk forecasting, incorporating realized measures," International Journal of Forecasting, Elsevier, vol. 36(2), pages 489-506.
    2. Catania, Leopoldo & Proietti, Tommaso, 2020. "Forecasting volatility with time-varying leverage and volatility of volatility effects," International Journal of Forecasting, Elsevier, vol. 36(4), pages 1301-1317.
    3. Francisco Blasques & Paolo Gorgi & Siem Jan Koopman & Olivier Wintenberger, 2016. "Feasible Invertibility Conditions and Maximum Likelihood Estimation for Observation-Driven Models," Tinbergen Institute Discussion Papers 16-082/III, Tinbergen Institute.
    4. Giuseppe Storti & Chao Wang, 2023. "Modeling uncertainty in financial tail risk: A forecast combination and weighted quantile approach," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 42(7), pages 1648-1663, November.
    5. Ardia, David & Bluteau, Keven & Boudt, Kris & Catania, Leopoldo, 2018. "Forecasting risk with Markov-switching GARCH models: A large-scale performance study," International Journal of Forecasting, Elsevier, vol. 34(4), pages 733-747.
    6. Storti, Giuseppe & Wang, Chao, 2022. "Nonparametric expected shortfall forecasting incorporating weighted quantiles," International Journal of Forecasting, Elsevier, vol. 38(1), pages 224-239.
    7. F Blasques & P Gorgi & S Koopman & O Wintenberger, 2016. "Feasible Invertibility Conditions for Maximum Likelihood Estimation for Observation-Driven Models," Papers 1610.02863, arXiv.org.
    8. Charles, Amélie & Darné, Olivier, 2017. "Forecasting crude-oil market volatility: Further evidence with jumps," Energy Economics, Elsevier, vol. 67(C), pages 508-519.
    9. Laporta, Alessandro G. & Merlo, Luca & Petrella, Lea, 2018. "Selection of Value at Risk Models for Energy Commodities," Energy Economics, Elsevier, vol. 74(C), pages 628-643.
    10. Wilson Ye Chen & Richard H. Gerlach, 2017. "Semiparametric GARCH via Bayesian model averaging," Papers 1708.07587, arXiv.org.
    11. Sebastian Bayer & Timo Dimitriadis, 2022. "Regression-Based Expected Shortfall Backtesting [Backtesting Expected Shortfall]," Journal of Financial Econometrics, Oxford University Press, vol. 20(3), pages 437-471.
    12. Blasques, Francisco & Ji, Jiangyu & Lucas, André, 2016. "Semiparametric score driven volatility models," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 58-69.
    13. Peter Reinhard Hansen & Chen Tong, 2022. "Option Pricing with Time-Varying Volatility Risk Aversion," Papers 2204.06943, arXiv.org, revised Mar 2025.
    14. F Blasques & P Gorgi & S J Koopman & O Wintenberger, 2016. "Feasible Invertibility Conditions for Maximum Likelihood Estimation for Observation-Driven Models," Working Papers hal-01377971, HAL.
    15. Petropoulos, Fotios & Apiletti, Daniele & Assimakopoulos, Vassilios & Babai, Mohamed Zied & Barrow, Devon K. & Ben Taieb, Souhaib & Bergmeir, Christoph & Bessa, Ricardo J. & Bijak, Jakub & Boylan, Joh, 2022. "Forecasting: theory and practice," International Journal of Forecasting, Elsevier, vol. 38(3), pages 705-871.
      • Fotios Petropoulos & Daniele Apiletti & Vassilios Assimakopoulos & Mohamed Zied Babai & Devon K. Barrow & Souhaib Ben Taieb & Christoph Bergmeir & Ricardo J. Bessa & Jakub Bijak & John E. Boylan & Jet, 2020. "Forecasting: theory and practice," Papers 2012.03854, arXiv.org, revised Jan 2022.
    16. Roy Cerqueti & Massimiliano Giacalone & Raffaele Mattera, 2020. "Skewed non-Gaussian GARCH models for cryptocurrencies volatility modelling," Papers 2004.11674, arXiv.org.
    17. Trucíos, Carlos, 2019. "Forecasting Bitcoin risk measures: A robust approach," International Journal of Forecasting, Elsevier, vol. 35(3), pages 836-847.
    18. Giuseppe Storti & Chao Wang, 2021. "Modelling uncertainty in financial tail risk: a forecast combination and weighted quantile approach," Papers 2104.04918, arXiv.org, revised Jul 2021.
    19. Owusu Junior, Peterson & Tiwari, Aviral Kumar & Tweneboah, George & Asafo-Adjei, Emmanuel, 2022. "GAS and GARCH based value-at-risk modeling of precious metals," Resources Policy, Elsevier, vol. 75(C).
    20. Djennad, Abdelmajid & Rigby, Robert & Stasinopoulos, Dimitrios & Voudouris, Vlasios & Eilers, Paul, 2015. "Beyond location and dispersion models: The Generalized Structural Time Series Model with Applications," MPRA Paper 62807, University Library of Munich, Germany.
