IDEAS home Printed from https://ideas.repec.org/a/spr/sankhb/v83y2021i1d10.1007_s13571-019-00219-5.html
   My bibliography  Save this article

Model Selection With Mixed Variables on the Lasso Path

Author

Listed:
  • X. Jessie Jeng

    (North Carolina State University)

  • Huimin Peng

    (North Carolina State University)

  • Wenbin Lu

    (North Carolina State University)

Abstract

Among the most popular model selection procedures in high-dimensional regression, Lasso provides a solution path to rank the variables and determines a cut-off position on the path to select variables and estimate coefficients. In this paper, we consider variable selection from a new perspective motivated by the frequently occurred phenomenon that relevant variables are often mixed with noise variables on the solution path. We propose to characterize the positions of the first noise variable and the last relevant variable on the path. We then develop a new variable selection procedure to control over-selection of the noise variables ranking after the last relevant variable, and, at the same time, retain a high proportion of relevant variables ranking before the first noise variable. Our procedure utilizes the recently developed covariance test statistic and Q statistic in post-selection inference. In numerical examples, our method compares favorably with existing methods in selection accuracy and the ability to interpret its results.

Suggested Citation

  • X. Jessie Jeng & Huimin Peng & Wenbin Lu, 2021. "Model Selection With Mixed Variables on the Lasso Path," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(1), pages 170-184, May.
  • Handle: RePEc:spr:sankhb:v:83:y:2021:i:1:d:10.1007_s13571-019-00219-5
    DOI: 10.1007/s13571-019-00219-5
    as

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s13571-019-00219-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s13571-019-00219-5?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Max Grazier G'Sell & Stefan Wager & Alexandra Chouldechova & Robert Tibshirani, 2016. "Sequential selection procedures and false discovery rate control," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(2), pages 423-444, March.
    2. J. D. Wilbur & J. K. Ghosh & C. H. Nakatsu & S. M. Brouder & R. W. Doerge, 2002. "Variable Selection in High-Dimensional Multivariate Binary Data with Application to the Analysis of Microbial Community DNA Fingerprints," Biometrics, The International Biometric Society, vol. 58(2), pages 378-386, June.
    3. Nicolai Meinshausen & Peter Buhlmann, 2005. "Lower bounds for the number of false null hypotheses for multiple testing of associations under general dependence structures," Biometrika, Biometrika Trust, vol. 92(4), pages 893-907, December.
    4. Cun-Hui Zhang & Stephanie S. Zhang, 2014. "Confidence intervals for low dimensional parameters in high dimensional linear models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(1), pages 217-242, January.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Damian Kozbur, 2020. "Analysis of Testing‐Based Forward Model Selection," Econometrica, Econometric Society, vol. 88(5), pages 2147-2173, September.
    2. Markus Pelger & Jiacheng Zou, 2022. "Inference for Large Panel Data with Many Covariates," Papers 2301.00292, arXiv.org, revised Mar 2023.
    3. Jeng, X. Jessie & Chen, Xiongzhi, 2019. "Predictor ranking and false discovery proportion control in high-dimensional regression," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 163-175.
    4. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    5. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2019. "Valid Post-Selection Inference in High-Dimensional Approximately Sparse Quantile Regression Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 749-758, April.
    6. Susan Athey & Guido W. Imbens & Stefan Wager, 2018. "Approximate residual balancing: debiased inference of average treatment effects in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 597-623, September.
    7. Jelena Bradic & Weijie Ji & Yuqian Zhang, 2021. "High-dimensional Inference for Dynamic Treatment Effects," Papers 2110.04924, arXiv.org, revised May 2023.
    8. Chenchuan (Mark) Li & Ulrich K. Müller, 2021. "Linear regression with many controls of limited explanatory power," Quantitative Economics, Econometric Society, vol. 12(2), pages 405-442, May.
    9. Alexandre Belloni & Victor Chernozhukov & Christian Hansen & Damian Kozbur, 2016. "Inference in High-Dimensional Panel Models With an Application to Gun Control," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 590-605, October.
    10. Van Hanh Nguyen & Catherine Matias, 2014. "On Efficient Estimators of the Proportion of True Null Hypotheses in a Multiple Testing Setup," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(4), pages 1167-1194, December.
    11. Shengchun Kong & Zhuqing Yu & Xianyang Zhang & Guang Cheng, 2021. "High‐dimensional robust inference for Cox regression models using desparsified Lasso," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(3), pages 1068-1095, September.
    12. Victor Chernozhukov & Whitney K. Newey & Victor Quintas-Martinez & Vasilis Syrgkanis, 2021. "Automatic Debiased Machine Learning via Riesz Regression," Papers 2104.14737, arXiv.org, revised Mar 2024.
    13. Guo, Xu & Li, Runze & Liu, Jingyuan & Zeng, Mudong, 2023. "Statistical inference for linear mediation models with high-dimensional mediators and application to studying stock reaction to COVID-19 pandemic," Journal of Econometrics, Elsevier, vol. 235(1), pages 166-179.
    14. Saulius Jokubaitis & Remigijus Leipus, 2022. "Asymptotic Normality in Linear Regression with Approximately Sparse Structure," Mathematics, MDPI, vol. 10(10), pages 1-28, May.
    15. Stéphane Chrétien & Camille Giampiccolo & Wenjuan Sun & Jessica Talbott, 2021. "Fast Hyperparameter Calibration of Sparsity Enforcing Penalties in Total Generalised Variation Penalised Reconstruction Methods for XCT Using a Planted Virtual Reference Image," Mathematics, MDPI, vol. 9(22), pages 1-12, November.
    16. Toshio Honda, 2021. "The de-biased group Lasso estimation for varying coefficient models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(1), pages 3-29, February.
    17. Hansen, Christian & Liao, Yuan, 2019. "The Factor-Lasso And K-Step Bootstrap Approach For Inference In High-Dimensional Economic Applications," Econometric Theory, Cambridge University Press, vol. 35(3), pages 465-509, June.
    18. Celso Brunetti & Marc Joëts & Valérie Mignon, 2023. "Reasons Behind Words: OPEC Narratives and the Oil Market," Working Papers 2023-19, CEPII research center.
    19. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2013. "Uniform post selection inference for LAD regression and other z-estimation problems," CeMMAP working papers CWP74/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    20. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney K. Newey, 2016. "Double machine learning for treatment and causal parameters," CeMMAP working papers 49/16, Institute for Fiscal Studies.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:sankhb:v:83:y:2021:i:1:d:10.1007_s13571-019-00219-5. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.