
Screening methods for linear errors‐in‐variables models in high dimensions

Authors
  • Linh H. Nghiem
  • Francis K.C. Hui
  • Samuel Müller
  • A.H. Welsh

Abstract

Microarray studies, in order to identify genes associated with an outcome of interest, usually produce noisy measurements for a large number of gene expression features from a small number of subjects. One common approach to analyzing such high‐dimensional data is to use linear errors‐in‐variables (EIV) models; however, current methods for fitting such models are computationally expensive. In this paper, we present two efficient screening procedures, namely, corrected penalized marginal screening (PMSc) and corrected sure independence screening (SISc), to reduce the number of variables for final model building. Both screening procedures are based on fitting corrected marginal regression models relating the outcome to each contaminated covariate separately, which can be computed efficiently even with a large number of features. Under mild conditions, we show that these procedures achieve screening consistency and reduce the number of features substantially, even when the number of covariates grows exponentially with sample size. In addition, if the true covariates are weakly correlated, we show that PMSc can achieve full variable selection consistency. Through a simulation study and an analysis of gene expression data for bone mineral density of Norwegian women, we demonstrate that the two new screening procedures make estimation of linear EIV models computationally scalable in high‐dimensional settings, and improve finite sample estimation and selection performance compared with estimators that do not employ a screening stage.
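The idea of corrected marginal screening can be illustrated with a minimal sketch. It is not the authors' implementation, and the details of PMSc and SISc in the paper may differ. The sketch assumes a classical additive error model, W = X + U, with a known error variance for each feature. It also assumes that features are ranked by the absolute value of a corrected marginal slope, cov(W_j, y) / (var(W_j) - sigma_u_j^2), in the spirit of sure independence screening. The function names `corrected_marginal_slopes` and `screen_top_d`, and the parameters `sigma_u2` and `d`, are hypothetical.

```python
import numpy as np


def corrected_marginal_slopes(W, y, sigma_u2):
    """Corrected slope of y on each contaminated covariate, fitted separately.

    Assumption (not from the source): W = X + U with known error variances
    sigma_u2, so var(X_j) is estimated by var(W_j) - sigma_u2[j].
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    n = W.shape[0]

    # Centre the data so that no intercept is needed.
    Wc = W - W.mean(axis=0)
    yc = y - y.mean()

    # Marginal covariance with the outcome and variance of each observed column.
    cov_wy = Wc.T @ yc / n
    var_w = (Wc ** 2).sum(axis=0) / n

    # Subtract the measurement error variance to undo attenuation.
    denom = var_w - np.asarray(sigma_u2, dtype=float)

    # Ad hoc guard for this sketch: keep the denominator strictly positive.
    denom = np.maximum(denom, 1e-8)
    return cov_wy / denom


def screen_top_d(W, y, sigma_u2, d):
    """Indices (sorted) of the d features with the largest absolute corrected slope."""
    slopes = corrected_marginal_slopes(W, y, sigma_u2)
    order = np.argsort(-np.abs(slopes))
    return np.sort(order[:d])
```

Each slope needs only one pass over a single column, so the cost grows linearly in the number of features. Calling `screen_top_d(W, y, sigma_u2, d)` returns the retained column indices, which could then be passed to a full EIV estimator for final model building.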

Suggested Citation

  • Linh H. Nghiem & Francis K.C. Hui & Samuel Müller & A.H. Welsh, 2023. "Screening methods for linear errors‐in‐variables models in high dimensions," Biometrics, The International Biometric Society, vol. 79(2), pages 926-939, June.
  • Handle: RePEc:bla:biomet:v:79:y:2023:i:2:p:926-939
    DOI: 10.1111/biom.13628



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yi Chu & Lu Lin, 2020. "Conditional SIRS for nonparametric and semiparametric models by marginal empirical likelihood," Statistical Papers, Springer, vol. 61(4), pages 1589-1606, August.
    2. Jingxuan Luo & Lili Yue & Gaorong Li, 2023. "Overview of High-Dimensional Measurement Error Regression Models," Mathematics, MDPI, vol. 11(14), pages 1-22, July.
    3. Lu, Jun & Lin, Lu, 2018. "Feature screening for multi-response varying coefficient models with ultrahigh dimensional predictors," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 242-254.
    4. Xiaochao Xia & Hao Ming, 2022. "A Flexibly Conditional Screening Approach via a Nonparametric Quantile Partial Correlation," Mathematics, MDPI, vol. 10(24), pages 1-32, December.
    5. Qiu, Debin & Ahn, Jeongyoun, 2020. "Grouped variable screening for ultra-high dimensional data for linear model," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    6. Ke, Chenlu & Yang, Wei & Yuan, Qingcong & Li, Lu, 2023. "Partial sufficient variable screening with categorical controls," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    7. Liming Wang & Xingxiang Li & Xiaoqing Wang & Peng Lai, 2022. "Unified mean-variance feature screening for ultrahigh-dimensional regression," Computational Statistics, Springer, vol. 37(4), pages 1887-1918, September.
    8. Qinqin Hu & Lu Lin, 2022. "Feature Screening in High Dimensional Regression with Endogenous Covariates," Computational Economics, Springer;Society for Computational Economics, vol. 60(3), pages 949-969, October.
    9. Sheng, Ying & Wang, Qihua, 2020. "Model-free feature screening for ultrahigh dimensional classification," Journal of Multivariate Analysis, Elsevier, vol. 178(C).
    10. Qinqin Hu & Lu Lin, 2018. "Conditional feature screening for mean and variance functions in models with multiple-index structure," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 81(4), pages 357-393, May.
    11. Zhou, Yeqing & Liu, Jingyuan & Zhu, Liping, 2020. "Test for conditional independence with application to conditional screening," Journal of Multivariate Analysis, Elsevier, vol. 175(C).
    12. Xi Wu & Shifeng Xiong & Weiyan Mu, 2023. "An Ensemble Method for Feature Screening," Mathematics, MDPI, vol. 11(2), pages 1-14, January.
    13. Randall Reese & Guifang Fu & Geran Zhao & Xiaotian Dai & Xiaotian Li & Kenneth Chiu, 2022. "Epistasis Detection via the Joint Cumulant," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 14(3), pages 514-532, December.
    14. Xiaolin Chen & Yi Liu & Qihua Wang, 2019. "Joint feature screening for ultra-high-dimensional sparse additive hazards model by the sparsity-restricted pseudo-score estimator," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(5), pages 1007-1031, October.
    15. Jun Lu & Lu Lin, 2020. "Model-free conditional screening via conditional distance correlation," Statistical Papers, Springer, vol. 61(1), pages 225-244, February.
    16. Yuan, Qingcong & Chen, Xianyan & Ke, Chenlu & Yin, Xiangrong, 2022. "Independence index sufficient variable screening for categorical responses," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    17. Arun Srinivasan & Lingzhou Xue & Xiang Zhan, 2021. "Compositional knockoff filter for high‐dimensional regression analysis of microbiome data," Biometrics, The International Biometric Society, vol. 77(3), pages 984-995, September.
    18. Jing Pan & Yuan Yu & Yong Zhou, 2018. "Nonparametric independence feature screening for ultrahigh-dimensional survival data," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 81(7), pages 821-847, October.
    19. Jing Zhang & Qihua Wang & Xuan Wang, 2022. "Surrogate-variable-based model-free feature screening for survival data under the general censoring mechanism," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(2), pages 379-397, April.
    20. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
