Double truncation method for controlling local false discovery rate in case of spiky null

Authors
  • Shinjune Kim

    (Inha University Hospital)

  • Youngjae Oh

    (Korea National Statistics Office)

  • Johan Lim

    (Seoul National University)

  • DoHwan Park

    (University of Maryland)

  • Erin M. Green

    (University of Maryland)

  • Mark L. Ramos

    (National Institutes of Health)

  • Jaesik Jeong

    (Chonnam National University)

Abstract

Many multiple testing procedures that control the false discovery rate have been developed to identify cases (e.g., genes) that differ significantly between two groups. However, a common issue in practical data sets is a highly spiky null distribution. Existing methods struggle to control the type I error in such cases because of “inflated false positives,” a problem that has not been addressed in the previous literature. Our team recently encountered this issue while analyzing SET4 gene deletion data and proposed modeling the null distribution with a scale mixture of normal distributions. However, that approach is limited by its strong assumptions about the spiky peak. In this paper, we present a novel multiple testing procedure that can be applied to data with any type of spiky peak, including data with no spiky peak or with one or two spiky peaks. Our approach truncates both the central statistics around 0, which contribute most to the null spike, and the two tails, which may be contaminated by the alternative distributions. We refer to this as the “double truncation method.” After double truncation, we estimate the null density with the doubly truncated maximum likelihood estimator. Using simulated data, we demonstrate numerically that the proposed method controls the false discovery rate at the desired level. Finally, we apply the method to two real data sets: the SET protein data and the peony data.
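The abstract describes the procedure only at a high level, so the Python sketch below illustrates the general idea under assumptions that are not taken from the paper. It assumes a normal null N(mu, sigma^2), a symmetric kept region A = {a ≤ |z| ≤ b} with illustrative cutoffs a and b, and estimates (mu, sigma) by maximizing the doubly truncated log-likelihood, the sum over z_i in A of log φ_{mu,sigma}(z_i), minus n_A log P_{mu,sigma}(A). A crude grid-refinement search stands in for whatever optimizer the authors use. The null proportion pi0 is estimated Efron-style as the observed count in A divided by the expected count under the null, the marginal density f comes from a Gaussian kernel density estimate, and the local false discovery rate is min(1, pi0 f0(z) / f(z)), with the truncated center |z| < a simply declared null. All function names (`fit_doubly_truncated_null`, `local_fdr`, etc.) are hypothetical.

```python
import math
import numpy as np


def phi_cdf(x):
    """Standard normal CDF for a scalar, via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def region_prob(mu, sigma, a, b):
    """P(a <= |Z| <= b) for Z ~ N(mu, sigma^2): mass of the kept (doubly truncated) region."""
    upper = phi_cdf((b - mu) / sigma) - phi_cdf((a - mu) / sigma)
    lower = phi_cdf((-a - mu) / sigma) - phi_cdf((-b - mu) / sigma)
    return upper + lower


def truncated_loglik(mu, sigma, z_kept, a, b):
    """Log-likelihood of N(mu, sigma^2) conditioned on falling in {a <= |z| <= b}."""
    p = region_prob(mu, sigma, a, b)
    if p <= 0.0:
        return -np.inf
    r = (z_kept - mu) / sigma
    n = z_kept.size
    return (-0.5 * float(np.sum(r ** 2))
            - n * math.log(sigma * math.sqrt(2.0 * math.pi))
            - n * math.log(p))


def fit_doubly_truncated_null(z, a, b, n_grid=41, n_rounds=4):
    """Hypothetical helper: maximize the doubly truncated likelihood by grid refinement."""
    z = np.asarray(z, dtype=float)
    kept = z[(np.abs(z) >= a) & (np.abs(z) <= b)]  # drop the center and both tails
    mu_lo, mu_hi = -1.0, 1.0
    ls_lo, ls_hi = math.log(0.2), math.log(5.0)  # search over log(sigma)
    best_ll, mu, sigma = -np.inf, 0.0, 1.0
    for _ in range(n_rounds):
        for m in np.linspace(mu_lo, mu_hi, n_grid):
            for ls in np.linspace(ls_lo, ls_hi, n_grid):
                ll = truncated_loglik(m, math.exp(ls), kept, a, b)
                if ll > best_ll:
                    best_ll, mu, sigma = ll, float(m), math.exp(ls)
        # Shrink the search box around the current best point.
        mu_step = (mu_hi - mu_lo) / (n_grid - 1)
        ls_step = (ls_hi - ls_lo) / (n_grid - 1)
        mu_lo, mu_hi = mu - 2 * mu_step, mu + 2 * mu_step
        ls_lo, ls_hi = math.log(sigma) - 2 * ls_step, math.log(sigma) + 2 * ls_step
    # Assumed Efron-style null proportion: observed count in A / expected count under the null.
    pi0 = min(1.0, kept.size / (z.size * region_prob(mu, sigma, a, b)))
    return mu, sigma, pi0


def kde(z, points, chunk=500):
    """Gaussian kernel density estimate of the marginal f (normal-reference bandwidth)."""
    z = np.asarray(z, dtype=float)
    pts = np.asarray(points, dtype=float)
    h = 1.06 * np.std(z) * z.size ** (-0.2)
    out = np.empty(pts.size)
    for i in range(0, pts.size, chunk):  # chunked to limit memory use
        d = (pts[i:i + chunk, None] - z[None, :]) / h
        out[i:i + chunk] = np.exp(-0.5 * d ** 2).sum(axis=1)
    return out / (z.size * h * math.sqrt(2.0 * math.pi))


def local_fdr(z, a, b):
    """lfdr(z) = min(1, pi0 * f0(z) / f(z)), with the truncated center |z| < a declared null."""
    z = np.asarray(z, dtype=float)
    mu, sigma, pi0 = fit_doubly_truncated_null(z, a, b)
    f0 = np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    lfdr = np.minimum(1.0, pi0 * f0 / kde(z, z))
    lfdr[np.abs(z) < a] = 1.0  # assumption: the central spike is treated as null
    return lfdr, (mu, sigma, pi0)


# Simulated spiky null: wide N(0, 1) null, narrow spike at 0, and N(4, 1) alternatives.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.0, 1.0, 1600),
                    rng.normal(0.0, 0.1, 300),
                    rng.normal(4.0, 1.0, 100)])
lfdr, (mu, sigma, pi0) = local_fdr(z, a=0.5, b=3.0)
```

The cutoffs a = 0.5 and b = 3.0 are arbitrary choices for this illustration; the abstract does not say how the truncation points are selected. Removing the center keeps the narrow spike from shrinking the fitted null, and removing the tails limits contamination from the alternatives, which is the intuition the abstract gives for double truncation.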

Suggested Citation

  • Shinjune Kim & Youngjae Oh & Johan Lim & DoHwan Park & Erin M. Green & Mark L. Ramos & Jaesik Jeong, 2025. "Double truncation method for controlling local false discovery rate in case of spiky null," Computational Statistics, Springer, vol. 40(2), pages 745-766, February.
  • Handle: RePEc:spr:compst:v:40:y:2025:i:2:d:10.1007_s00180-024-01510-4
    DOI: 10.1007/s00180-024-01510-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-024-01510-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Iris Ivy M. Gauran & Junyong Park & Johan Lim & DoHwan Park & John Zylstra & Thomas Peterson & Maricel Kann & John L. Spouge, 2018. "Empirical null estimation using zero-inflated discrete mixture distributions and its application to protein domain data," Biometrics, The International Biometric Society, vol. 74(2), pages 458-471, June.
    2. Liang, Kun & Nettleton, Dan, 2010. "A Hidden Markov Model Approach to Testing Multiple Hypotheses on a Tree-Transformed Gene Ontology Graph," Journal of the American Statistical Association, American Statistical Association, vol. 105(492), pages 1444-1454.
    3. Lee, Donghwan & Lee, Youngjo, 2016. "Extended likelihood approach to multiple testing with directional error control under a hidden Markov random field model," Journal of Multivariate Analysis, Elsevier, vol. 151(C), pages 1-13.
    4. Park, DoHwan & Park, Junyong & Zhong, Xiaosong & Sadelain, Michel, 2011. "Estimation of empirical null using a mixture of normals and its use in local false discovery rate," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2421-2432, July.
    5. Efron, Bradley, 2004. "Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 96-104, January.
    6. Joungyoun Kim & Donghyeon Yu & Johan Lim & Joong-Ho Won, 2018. "A peeling algorithm for multiple testing on a random field," Computational Statistics, Springer, vol. 33(1), pages 503-525, March.
    7. Wenguang Sun & T. Tony Cai, 2009. "Large‐scale multiple testing under dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 393-424, April.
    8. James G. Scott & Ryan C. Kelly & Matthew A. Smith & Pengcheng Zhou & Robert E. Kass, 2015. "False Discovery Rate Regression: An Application to Neural Synchrony Detection in Primary Visual Cortex," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(510), pages 459-471, June.
    9. Nabarun Deb & Sujayam Saha & Adityanand Guntuboyina & Bodhisattva Sen, 2022. "Two-Component Mixture Model in the Presence of Covariates," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 117(540), pages 1820-1834, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yi-Hui Zhou & Paul Brooks & Xiaoshan Wang, 2018. "A Two-Stage Hidden Markov Model Design for Biomarker Detection, with Application to Microbiome Research," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 10(1), pages 41-58, April.
    2. Lee, Donghwan & Lee, Youngjo, 2016. "Extended likelihood approach to multiple testing with directional error control under a hidden Markov random field model," Journal of Multivariate Analysis, Elsevier, vol. 151(C), pages 1-13.
    3. Pei Fen Kuan & Derek Y. Chiang, 2012. "Integrating Prior Knowledge in Multiple Testing under Dependence with Applications to Detecting Differential DNA Methylation," Biometrics, The International Biometric Society, vol. 68(3), pages 774-783, September.
    4. Wesley Tansey & Yixin Wang & Raul Rabadan & David Blei, 2020. "Double Empirical Bayes Testing," International Statistical Review, International Statistical Institute, vol. 88(S1), pages 91-113, December.
    5. Hai Shu & Bin Nan & Robert Koeppe, 2015. "Multiple testing for neuroimaging via hidden Markov random field," Biometrics, The International Biometric Society, vol. 71(3), pages 741-750, September.
    6. Sairam Rayaprolu & Zhiyi Chi, 2021. "False Discovery Variance Reduction in Large Scale Simultaneous Hypothesis Tests," Methodology and Computing in Applied Probability, Springer, vol. 23(3), pages 711-733, September.
    7. Nikolaos Ignatiadis & Wolfgang Huber, 2021. "Covariate powered cross‐weighted multiple testing," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(4), pages 720-751, September.
    8. T. Tony Cai & Wenguang Sun & Weinan Wang, 2019. "Covariate‐assisted ranking and screening for large‐scale two‐sample inference," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 81(2), pages 187-234, April.
    9. Chang Yu & Daniel Zelterman, 2020. "Distributions associated with simultaneous multiple hypothesis testing," Journal of Statistical Distributions and Applications, Springer, vol. 7(1), pages 1-17, December.
    10. Marie Perrot-Dockès & Gilles Blanchard & Pierre Neuvial & Etienne Roquain, 2023. "Selective inference for false discovery proportion in a hidden Markov model," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(4), pages 1365-1391, December.
    11. Iris Ivy M. Gauran & Junyong Park & Johan Lim & DoHwan Park & John Zylstra & Thomas Peterson & Maricel Kann & John L. Spouge, 2018. "Empirical null estimation using zero-inflated discrete mixture distributions and its application to protein domain data," Biometrics, The International Biometric Society, vol. 74(2), pages 458-471, June.
    12. T. Tony Cai & Weidong Liu, 2016. "Large-Scale Multiple Testing of Correlations," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 229-240, March.
    13. Ryan Martin, 2021. "A Survey of Nonparametric Mixing Density Estimation via the Predictive Recursion Algorithm," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(1), pages 97-121, May.
    14. Pallavi Basu & Luella Fu & Alessio Saretto & Wenguang Sun, 2021. "Empirical Bayes Control of the False Discovery Exceedance," Working Papers 2115, Federal Reserve Bank of Dallas.
    15. Tingting Cui & Pengfei Wang & Wensheng Zhu, 2021. "Covariate-adjusted multiple testing in genome-wide association studies via factorial hidden Markov models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(3), pages 737-757, September.
    16. He, Li & Sarkar, Sanat K. & Zhao, Zhigen, 2015. "Capturing the severity of type II errors in high-dimensional multiple testing," Journal of Multivariate Analysis, Elsevier, vol. 142(C), pages 106-116.
    17. Wang, Xia & Shojaie, Ali & Zou, Jian, 2019. "Bayesian hidden Markov models for dependent large-scale multiple testing," Computational Statistics & Data Analysis, Elsevier, vol. 136(C), pages 123-136.
    18. Long Qu & Dan Nettleton & Jack C. M. Dekkers, 2012. "A Hierarchical Semiparametric Model for Incorporating Intergene Information for Analysis of Genomic Data," Biometrics, The International Biometric Society, vol. 68(4), pages 1168-1177, December.
    19. Pounds Stanley B. & Gao Cuilan L. & Zhang Hui, 2012. "Empirical Bayesian Selection of Hypothesis Testing Procedures for Analysis of Sequence Count Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(5), pages 1-32, October.
    20. Jianqing Fan & Xu Han, 2017. "Estimation of the false discovery proportion with unknown dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(4), pages 1143-1164, September.
