IDEAS home Printed from https://ideas.repec.org/a/spr/stpapr/v63y2022i2d10.1007_s00362-021-01246-z.html
   My bibliography  Save this article

Regularization and variable selection in Heckman selection model

Author

Listed:
  • Emmanuel O. Ogundimu

    (Durham University)

Abstract

Sample selection arises when the outcome of interest is partially observed in a study. A common challenge is the requirement for exclusion restrictions. That is, some of the covariates affecting missingness mechanism do not affect the outcome. The drive to establish this requirement often leads to the inclusion of irrelevant variables in the model. A suboptimal solution is the use of classical variable selection criteria such as AIC and BIC, and traditional variable selection procedures such as stepwise selection. These methods are unstable when there is limited expert knowledge about the variables to include in the model. To address this, we propose the use of adaptive Lasso for variable selection and parameter estimation in both the selection and outcome submodels simultaneously in the absence of exclusion restrictions. By using the maximum likelihood estimator of the sample selection model, we constructed a loss function similar to the least squares regression problem up to a constant, and minimized its penalized version using an efficient algorithm. We show that the estimator, with proper choice of regularization parameter, is consistent and possesses the oracle properties. The method is compared to Lasso and adaptively weighted $$L_{1}$$ L 1 penalized Two-step method. We applied the methods to the well-known Ambulatory Expenditure Data.

Suggested Citation

  • Emmanuel O. Ogundimu, 2022. "Regularization and variable selection in Heckman selection model," Statistical Papers, Springer, vol. 63(2), pages 421-439, April.
  • Handle: RePEc:spr:stpapr:v:63:y:2022:i:2:d:10.1007_s00362-021-01246-z
    DOI: 10.1007/s00362-021-01246-z
    as

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-021-01246-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s00362-021-01246-z?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. James J. Heckman, 1976. "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," NBER Chapters, in: Annals of Economic and Social Measurement, Volume 5, number 4, pages 475-492, National Bureau of Economic Research, Inc.
    2. Stapleton, David C & Young, Douglas J, 1984. "Censored Normal Regression with Measurement Error on the Dependent Variable," Econometrica, Econometric Society, vol. 52(3), pages 737-760, May.
    3. John Antonakis & Samuel Bendahan & Philippe Jacquart & Rafael Lalive, 2010. "On making causal claims : A review and recommendations," Post-Print hal-02313119, HAL.
    4. J. B. Copas & H. G. Li, 1997. "Inference for Non‐random Samples," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 59(1), pages 55-95.
    5. Sartori, Anne E., 2003. "An Estimator for Some Binary-Outcome Selection Models Without Exclusion Restrictions," Political Analysis, Cambridge University Press, vol. 11(2), pages 111-138, April.
    6. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    7. Lee, Lung-Fei, 1983. "Generalized Econometric Models with Selectivity," Econometrica, Econometric Society, vol. 51(2), pages 507-512, March.
    8. Siu Fai Leung & Shihti Yu, 2000. "Collinearity and Two-Step Estimation of Sample Selection Models: Problems, Origins, and Remedies," Computational Economics, Springer;Society for Computational Economics, vol. 15(3), pages 173-199, June.
    9. Zhao, Jun & Kim, Hea-Jung & Kim, Hyoung-Moon, 2020. "New EM-type algorithms for the Heckman selection model," Computational Statistics & Data Analysis, Elsevier, vol. 146(C).
    10. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    11. Yulia V. Marchenko & Marc G. Genton, 2012. "A Heckman Selection- t Model," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(497), pages 304-317, March.
    12. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    13. Mikhail Zhelonkin & Marc G. Genton & Elvezio Ronchetti, 2016. "Robust inference in sample selection models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(4), pages 805-827, September.
    14. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    15. Leeb, Hannes & Pötscher, Benedikt M., 2005. "Model Selection And Inference: Facts And Fiction," Econometric Theory, Cambridge University Press, vol. 21(1), pages 21-59, February.
    16. Emmanuel O. Ogundimu & Jane L. Hutton, 2016. "A Sample Selection Model with Skew-normal Distribution," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(1), pages 172-190, March.
    17. A. Colin Cameron & Pravin K. Trivedi, 2010. "Microeconometrics Using Stata, Revised Edition," Stata Press books, StataCorp LP, number musr, March.
    18. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    19. Minna Genbäck & Elena Stanghellini & Xavier Luna, 2015. "Uncertainty intervals for regression parameters with non-ignorable missingness in the outcome," Statistical Papers, Springer, vol. 56(3), pages 829-847, August.
    20. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xianyi Wu & Xian Zhou, 2019. "On Hodges’ superefficiency and merits of oracle property in model selection," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(5), pages 1093-1119, October.
    2. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    3. Camila Epprecht & Dominique Guegan & Álvaro Veiga & Joel Correa da Rosa, 2017. "Variable selection and forecasting via automated methods for linear models: LASSO/adaLASSO and Autometrics," Post-Print halshs-00917797, HAL.
    4. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    5. Peter Martey Addo & Dominique Guegan & Bertrand Hassani, 2018. "Credit Risk Analysis Using Machine and Deep Learning Models," Risks, MDPI, vol. 6(2), pages 1-20, April.
    6. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    7. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    8. Zeyu Bian & Erica E. M. Moodie & Susan M. Shortreed & Sahir Bhatnagar, 2023. "Variable selection in regression‐based estimation of dynamic treatment regimes," Biometrics, The International Biometric Society, vol. 79(2), pages 988-999, June.
    9. Jingxuan Luo & Lili Yue & Gaorong Li, 2023. "Overview of High-Dimensional Measurement Error Regression Models," Mathematics, MDPI, vol. 11(14), pages 1-22, July.
    10. Martinez Josue G. & Carroll Raymond J & Muller Samuel & Sampson Joshua N. & Chatterjee Nilanjan, 2010. "A Note on the Effect on Power of Score Tests via Dimension Reduction by Penalized Regression under the Null," The International Journal of Biostatistics, De Gruyter, vol. 6(1), pages 1-14, March.
    11. Zanhua Yin, 2020. "Variable selection for sparse logistic regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(7), pages 821-836, October.
    12. Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.
    13. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    14. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    15. Zakariya Yahya Algamal & Muhammad Hisyam Lee, 2019. "A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 753-771, September.
    16. Ertefaie Ashkan & Asgharian Masoud & Stephens David A., 2018. "Variable Selection in Causal Inference using a Simultaneous Penalization Method," Journal of Causal Inference, De Gruyter, vol. 6(1), pages 1-16, March.
    17. Satre-Meloy, Aven, 2019. "Investigating structural and occupant drivers of annual residential electricity consumption using regularization in regression models," Energy, Elsevier, vol. 174(C), pages 148-168.
    18. Dai, Linlin & Chen, Kani & Sun, Zhihua & Liu, Zhenqiu & Li, Gang, 2018. "Broken adaptive ridge regression and its asymptotic properties," Journal of Multivariate Analysis, Elsevier, vol. 168(C), pages 334-351.
    19. Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
    20. Posch, Konstantin & Arbeiter, Maximilian & Pilz, Juergen, 2020. "A novel Bayesian approach for variable selection in linear regression models," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:stpapr:v:63:y:2022:i:2:d:10.1007_s00362-021-01246-z. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.