IDEAS home Printed from https://ideas.repec.org/a/bpj/sagmbi/v15y2016i4p305-320n3.html
   My bibliography  Save this article

The use of vector bootstrapping to improve variable selection precision in Lasso models

Author

Listed:
  • Laurin Charles

    ()

  • Boomsma Dorret

    (Department of Biological Psychology, VU University Amsterdam, Amsterdam, 1081 HV, Netherlands)

  • Lubke Gitta

Abstract

The Lasso is a shrinkage regression method that is widely used for variable selection in statistical genetics. Commonly, K-fold cross-validation is used to fit a Lasso model. This is sometimes followed by using bootstrap confidence intervals to improve precision in the resulting variable selections. Nesting cross-validation within bootstrapping could provide further improvements in precision, but this has not been investigated systematically. We performed simulation studies of Lasso variable selection precision (VSP) with and without nesting cross-validation within bootstrapping. Data were simulated to represent genomic data under a polygenic model as well as under a model with effect sizes representative of typical GWAS results. We compared these approaches to each other as well as to software defaults for the Lasso. Nested cross-validation had the most precise variable selection at small effect sizes. At larger effect sizes, there was no advantage to nesting. We illustrated the nested approach with empirical data comprising SNPs and SNP-SNP interactions from the most significant SNPs in a GWAS of borderline personality symptoms. In the empirical example, we found that the default Lasso selected low-reliability SNPs and interactions which were excluded by bootstrapping.

Suggested Citation

  • Laurin Charles & Boomsma Dorret & Lubke Gitta, 2016. "The use of vector bootstrapping to improve variable selection precision in Lasso models," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(4), pages 305-320, August.
  • Handle: RePEc:bpj:sagmbi:v:15:y:2016:i:4:p:305-320:n:3
    as

    Download full text from publisher

    File URL: https://www.degruyter.com/view/j/sagmb.2016.15.issue-4/sagmb-2015-0043/sagmb-2015-0043.xml?format=INT
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Jianqing Fan & Shaojun Guo & Ning Hao, 2012. "Variance estimation using refitted cross‐validation in ultrahigh dimensional regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 74(1), pages 37-65, January.
    2. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    3. Robert Tibshirani, 2011. "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 73(3), pages 273-282, June.
    4. Pötscher, Benedikt M. & Leeb, Hannes, 2009. "On the distribution of penalized maximum likelihood estimators: The LASSO, SCAD, and thresholding," Journal of Multivariate Analysis, Elsevier, vol. 100(9), pages 2065-2082, October.
    5. Freedman, David & Lane, David, 1983. "A Nonstochastic Interpretation of Reported Significance Levels," Journal of Business & Economic Statistics, American Statistical Association, vol. 1(4), pages 292-298, October.
    6. Gareth M. James & Peter Radchenko, 2009. "A generalized Dantzig selector with shrinkage tuning," Biometrika, Biometrika Trust, vol. 96(2), pages 323-337.
    7. Chatterjee, A. & Lahiri, S. N., 2011. "Bootstrapping Lasso Estimators," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 608-625.
    Full references (including those not matched with items on IDEAS)

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bpj:sagmbi:v:15:y:2016:i:4:p:305-320:n:3. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: (Peter Golla). General contact details of provider: https://www.degruyter.com .

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis . RePEc uses bibliographic data supplied by the respective publishers.