Printed from https://ideas.repec.org/a/gam/jsusta/v18y2026i2p1009-d1843784.html

Interpretable Data-Driven Ozone Prediction Using Statistical Diagnostics, XGBoost, SHAP and Temporal Fusion Transformers

Author

Listed:
  • Bin Hu

    (School of Earth Sciences and Resources, China University of Geosciences, Beijing 100083, China)

  • Ling Zeng

    (Geomathematics Key Laboratory of Sichuan Province, Chengdu Technological University, Chengdu 610059, China)

  • Haiming Fan

    (Shanxi Province Key Laboratory of Metallogeny and Assessment of Strategic Mineral Resources, Taiyuan 030006, China)

Abstract

This study develops an interpretable, data-driven framework for forecasting daily MDA8 ozone levels in the Beijing–Tianjin–Hebei (BTH) region, integrating statistical diagnostics, XGBoost-based SHAP feature interpretation, and the Temporal Fusion Transformer (TFT). Using two years of pollutant and meteorological data from 56 monitoring stations, we identify a dual temporal structure: ozone, temperature, and pressure follow non-stationary annual cycles, while eight other variables show stationary, autocorrelated short-term fluctuations. SHAP analysis reveals that temperature, followed by relative humidity, NO₂, particulate matter, and pressure, are key predictors, in line with photochemical mechanisms. A hierarchical ablation experiment shows that multivariate models outperform bivariate ones, and meteorological variables improve predictions more than primary pollutants. The inclusion of five pollutant variables worsens performance due to multicollinearity. The XGBoost-TFT hybrid model, which compresses covariates into a single index, achieves the best performance (median R² = 0.686), outperforming raw-input models. These results validate the framework’s interpretability and alignment with photochemical mechanisms.
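For readers unfamiliar with the headline metric, the sketch below shows one way a median R² could be computed. It assumes the median is taken over per-station R² values, which the abstract does not state explicitly. The function names (`r_squared`, `median_r_squared`) and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import median


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed series is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def median_r_squared(per_station):
    """Median of per-station R^2 values.

    per_station maps a station id to an (observed, predicted) pair.
    Aggregating by station is an assumption made for this illustration.
    """
    return median([r_squared(obs, pred) for obs, pred in per_station.values()])


# Made-up toy series, for illustration only (not data from the study).
toy = {
    "station_a": ([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]),  # perfect fit
    "station_b": ([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5]),  # predicts the mean
    "station_c": ([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]),  # constant offset
}
summary = median_r_squared(toy)
```

A median is less sensitive than a mean to a few poorly predicted stations, which may be why it is the reported summary statistic.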

Suggested Citation

  • Bin Hu & Ling Zeng & Haiming Fan, 2026. "Interpretable Data-Driven Ozone Prediction Using Statistical Diagnostics, XGBoost, SHAP and Temporal Fusion Transformers," Sustainability, MDPI, vol. 18(2), pages 1-25, January.
  • Handle: RePEc:gam:jsusta:v:18:y:2026:i:2:p:1009-:d:1843784

    Download full text from publisher

    File URL: https://www.mdpi.com/2071-1050/18/2/1009/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2071-1050/18/2/1009/
    Download Restriction: no

