IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2411.16666.html

CatNet: Controlling the False Discovery Rate in LSTM with SHAP Feature Importance and Gaussian Mirrors

Authors

  • Jiaan Han
  • Junxiao Chen
  • Yanzhe Fu

Abstract

We introduce CatNet, an algorithm that controls the False Discovery Rate (FDR) and selects significant features in LSTM models. CatNet uses the derivative of SHAP values to quantify feature importance and constructs a vector-formed mirror statistic for FDR control with the Gaussian Mirror algorithm. To avoid instability caused by nonlinear or temporal correlations among features, we also propose a new kernel-based independence measure. CatNet performs robustly across different model settings on both simulated and real-world data, reducing overfitting and improving the interpretability of the model. Our framework, which introduces SHAP for feature importance into FDR control algorithms and improves the Gaussian Mirror, extends naturally to other time-series or sequential deep learning models.
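
As a rough illustration of the mirror-statistic idea the abstract refers to, the sketch below applies the usual data-driven threshold to a set of mirror statistics. It is not CatNet itself: it assumes the scalar mirror statistic of the original Gaussian Mirror method, |b+ + b-| - |b+ - b-|, rather than the vector-formed statistic built from SHAP derivatives, and the function names and the importance pairs are hypothetical values chosen only for this example.

```python
def mirror_statistic(beta_plus, beta_minus):
    """Scalar Gaussian Mirror statistic: |b+ + b-| - |b+ - b-|.

    Large positive values suggest a relevant feature; for a null feature
    the statistic is assumed to be roughly symmetric around zero.
    """
    return abs(beta_plus + beta_minus) - abs(beta_plus - beta_minus)


def select_features(mirror_stats, q):
    """Return (threshold, selected indices) for a target FDR level q.

    The threshold is the smallest t > 0 among the observed |M_j| with
    #{j: M_j <= -t} / max(#{j: M_j >= t}, 1) <= q.
    """
    candidates = sorted({abs(m) for m in mirror_stats if m != 0})
    for t in candidates:
        # The negative tail estimates the number of false discoveries.
        false_est = sum(1 for m in mirror_stats if m <= -t)
        discoveries = sum(1 for m in mirror_stats if m >= t)
        if false_est / max(discoveries, 1) <= q:
            return t, [j for j, m in enumerate(mirror_stats) if m >= t]
    return float("inf"), []  # no threshold meets the target level


# Hypothetical importance pairs (with +noise, with -noise) for six features.
pairs = [(2.0, 1.8), (1.5, 1.7), (0.3, -0.2), (-0.1, 0.4), (1.2, 0.9), (0.2, -0.5)]
stats = [mirror_statistic(a, b) for a, b in pairs]
threshold, selected = select_features(stats, q=0.25)
print(threshold, selected)
```

With these made-up inputs, features 0, 1 and 4 have large positive statistics and are the ones selected, while the three features with negative statistics only contribute to the false-discovery estimate.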

Suggested Citation

  • Jiaan Han & Junxiao Chen & Yanzhe Fu, 2024. "CatNet: Controlling the False Discovery Rate in LSTM with SHAP Feature Importance and Gaussian Mirrors," Papers 2411.16666, arXiv.org, revised Jun 2025.
  • Handle: RePEc:arx:papers:2411.16666

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2411.16666
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Panxu Yuan & Yinfei Kong & Gaorong Li, 2024. "FDR control and power analysis for high-dimensional logistic regression via StabKoff," Statistical Papers, Springer, vol. 65(5), pages 2719-2749, July.
    2. Liang, Weijuan & Zhang, Qingzhao & Ma, Shuangge, 2024. "Hierarchical false discovery rate control for high-dimensional survival analysis with interactions," Computational Statistics & Data Analysis, Elsevier, vol. 192(C).
    3. Alexander Silbersdorff & Kai Sebastian Schneider, 2019. "Distributional Regression Techniques in Socioeconomic Research on the Inequality of Health with an Application on the Relationship between Mental Health and Income," IJERPH, MDPI, vol. 16(20), pages 1-28, October.
    4. Julien Hambuckers & Marie Kratz & Antoine Usseglio-Carleve, 2023. "Efficient Estimation In Extreme Value Regression Models Of Hedge Fund Tail Risks," Working Papers hal-04090916, HAL.
    5. Yuanhua Feng & Wolfgang Karl Härdle, 2021. "Uni- and multivariate extensions of the sinh-arcsinh normal distribution applied to distributional regression," Working Papers CIE 142, Paderborn University, CIE Center for International Economics.
    6. Miguel A Delgado & Andrés García-Suaza & Pedro H C Sant’Anna, 2022. "Distribution regression in duration analysis: an application to unemployment spells [Lecture notes in statistics: Proceedings]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 675-698.
    7. Kristin Blesch & David S. Watson & Marvin N. Wright, 2024. "Conditional feature importance for mixed data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 108(2), pages 259-278, June.
    8. Kneib, Thomas & Silbersdorff, Alexander & Säfken, Benjamin, 2023. "Rage Against the Mean – A Review of Distributional Regression Approaches," Econometrics and Statistics, Elsevier, vol. 26(C), pages 99-123.
    9. Alexander Silbersdorff & Julia Lynch & Stephan Klasen & Thomas Kneib, 2017. "Reconsidering the Income-Illness Relationship using Distributional Regression: An Application to Germany," Courant Research Centre: Poverty, Equity and Growth - Discussion Papers 231, Courant Research Centre PEG.
    10. Oliver Dürr & Stefan Hörtling & Danil Dold & Ivonne Kovylov & Beate Sick, 2024. "Bernstein flows for flexible posteriors in variational Bayes," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 108(2), pages 375-394, June.
    11. Alexander Sohn, 2015. "Beyond Conventional Wage Discrimination Analysis: Assessing Comprehensive Wage Distributions of Males and Females Using Structured Additive Distributional Regression," SOEPpapers on Multidisciplinary Panel Data Research 802, DIW Berlin, The German Socio-Economic Panel (SOEP).
    12. Srinivasan, Arun & Xue, Lingzhou & Zhan, Xiang, 2023. "Identification of microbial features in multivariate regression under false discovery rate control," Computational Statistics & Data Analysis, Elsevier, vol. 181(C).
    13. Alexander Henzi & Johanna F. Ziegel & Tilmann Gneiting, 2021. "Isotonic distributional regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(5), pages 963-993, November.
    14. Souhaib Ben Taieb & James W. Taylor & Rob J. Hyndman, 2017. "Coherent Probabilistic Forecasts for Hierarchical Time Series," Monash Econometrics and Business Statistics Working Papers 3/17, Monash University, Department of Econometrics and Business Statistics.
    15. T. Tony Cai & Zijian Guo & Yin Xia, 2023. "Statistical inference and large-scale multiple testing for high-dimensional regression models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(4), pages 1135-1171, December.
    16. Alina Schenk & Moritz Berger & Matthias Schmid, 2024. "Pseudo-value regression trees," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 30(2), pages 439-471, April.
    17. Cheng Peng & Stanislav Uryasev, 2023. "Factor Model of Mixtures," Papers 2301.13843, arXiv.org, revised Mar 2023.
    18. Silius M. Vandeskog & Thordis L. Thorarinsdottir & Ingelin Steinsland & Finn Lindgren, 2022. "Quantile based modeling of diurnal temperature range with the five‐parameter lambda distribution," Environmetrics, John Wiley & Sons, Ltd., vol. 33(4), June.
    19. Wiemann, Paul F.V. & Klein, Nadja & Kneib, Thomas, 2022. "Correcting for sample selection bias in Bayesian distributional regression models," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    20. Alejandro Román Vásquez & José Ulises Márquez Urbina & Graciela González Farías & Gabriel Escarela, 2024. "Controlling the false discovery rate by a Latent Gaussian Copula Knockoff procedure," Computational Statistics, Springer, vol. 39(3), pages 1435-1458, May.
