IDEAS home Printed from https://ideas.repec.org/p/iza/izadps/dp12081.html
   My bibliography  Save this paper

Iassopack: Model Selection and Prediction with Regularized Regression in Stata

Author

Listed:
  • Ahrens, Achim

    () (Economic and Social Research Institute, Dublin)

  • Hansen, Christian B.

    () (University of Chicago)

  • Schaffer, Mark E

    () (Heriot-Watt University, Edinburgh)

Abstract

This article introduces lassopack, a suite of programs for regularized regression in Stata. lassopack implements lasso, square-root lasso, elastic net, ridge regression, adaptive lasso and post-estimation OLS. The methods are suitable for the high-dimensional setting where the number of predictors p may be large and possibly greater than the number of observations, n. We offer three different approaches for selecting the penalization ('tuning') parameters: information criteria (implemented in lasso2), K-fold cross-validation and h-step ahead rolling cross-validation for cross-section, panel and time-series data (cvlasso), and theory-driven ('rigorous') penalization for the lasso and square-root lasso for cross-section and panel data (rlasso). We discuss the theoretical framework and practical considerations for each approach. We also present Monte Carlo results to compare the performance of the penalization approaches.

Suggested Citation

  • Ahrens, Achim & Hansen, Christian B. & Schaffer, Mark E, 2019. "Iassopack: Model Selection and Prediction with Regularized Regression in Stata," IZA Discussion Papers 12081, Institute of Labor Economics (IZA).
  • Handle: RePEc:iza:izadps:dp12081
    as

    Download full text from publisher

    File URL: http://ftp.iza.org/dp12081.pdf
    Download Restriction: no

    Other versions of this item:

    References listed on IDEAS

    as
    1. Meinshausen, Nicolai & Meier, Lukas & Bühlmann, Peter, 2009. "p-Values for High-Dimensional Regression," Journal of the American Statistical Association, American Statistical Association, vol. 104(488), pages 1671-1681.
    2. repec:eee:csdana:v:120:y:2018:i:c:p:70-83 is not listed on IDEAS
    3. Jiahua Chen & Zehua Chen, 2008. "Extended Bayesian information criteria for model selection with large model spaces," Biometrika, Biometrika Trust, vol. 95(3), pages 759-771.
    4. Alexandre Belloni & Victor Chernozhukov & Christian Hansen & Damian Kozbur, 2016. "Inference in High-Dimensional Panel Models With an Application to Gun Control," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(4), pages 590-605, October.
    5. Carrasco, Marine, 2012. "A regularization approach to the many instruments problem," Journal of Econometrics, Elsevier, vol. 170(2), pages 383-398.
    6. Achim Ahrens & Christian B. Hansen & Mark E Schaffer, 2018. "PDSLASSO: Stata module for post-selection and post-regularization OLS or IV estimation and inference," Statistical Software Components S458459, Boston College Department of Economics, revised 24 Jan 2019.
    7. Yuhong Yang, 2005. "Can the strengths of AIC and BIC be shared? A conflict between model indentification and regression estimation," Biometrika, Biometrika Trust, vol. 92(4), pages 937-950, December.
    8. Hansen, Christian & Kozbur, Damian, 2014. "Instrumental variables estimation with many weak instruments using regularized JIVE," Journal of Econometrics, Elsevier, vol. 182(2), pages 290-308.
    9. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    10. Zhang, Yiyun & Li, Runze & Tsai, Chih-Ling, 2010. "Regularization Parameter Selections via Generalized Information Criterion," Journal of the American Statistical Association, American Statistical Association, vol. 105(489), pages 312-323.
    11. A. Belloni & D. Chen & V. Chernozhukov & C. Hansen, 2012. "Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain," Econometrica, Econometric Society, vol. 80(6), pages 2369-2429, November.
    12. Victor Chernozhukov & Christian Hansen & Martin Spindler, 2015. "Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments," American Economic Review, American Economic Association, vol. 105(5), pages 486-490, May.
    13. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "Inference on Treatment Effects after Selection among High-Dimensional Controlsâ€," Review of Economic Studies, Oxford University Press, vol. 81(2), pages 608-650.
    14. repec:oup:qjecon:v:133:y:2018:i:1:p:237-293. is not listed on IDEAS
    15. A. Belloni & V. Chernozhukov & L. Wang, 2011. "Square-root lasso: pivotal recovery of sparse signals via conic programming," Biometrika, Biometrika Trust, vol. 98(4), pages 791-806.
    16. repec:aea:jecper:v:31:y:2017:i:2:p:87-106 is not listed on IDEAS
    17. Victor Chernozhukov & Denis Chetverikov & Kengo Kato, 2012. "Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors," Papers 1212.6906, arXiv.org, revised Jan 2018.
    18. Jon Kleinberg & Himabindu Lakkaraju & Jure Leskovec & Jens Ludwig & Sendhil Mullainathan, 2018. "Human Decisions and Machine Predictions," The Quarterly Journal of Economics, Oxford University Press, vol. 133(1), pages 237-293.
    19. Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
    20. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    21. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    Full references (including those not matched with items on IDEAS)

    More about this item

    Keywords

    lasso2; cvlasso; rlasso; lasso; elastic net; square-root lasso; cross-validation;

    JEL classification:

    • C53 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Forecasting and Prediction Models; Simulation Methods
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • C87 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Econometric Software

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:iza:izadps:dp12081. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: (Holger Hinte). General contact details of provider: http://www.iza.org .

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis . RePEc uses bibliographic data supplied by the respective publishers.