Printed from https://ideas.repec.org/p/trr/wpaper/201905.html

Penalized Small Area Models for the Combination of Unit- and Area-level Data

Authors
  • Jan Pablo Burgard
  • Joscha Krause
  • Ralf Münnich

Abstract

The joint use of unit- and area-level data for model-based small area estimation is investigated. Combining both levels within a single model raises a variety of methodological problems. Firstly, the additional model parameters that must be estimated critically reduce the degrees of freedom, which may destabilize model predictions when samples are small. Secondly, unit- and area-level data have different distributional characteristics in terms of dispersion patterns and correlation structure. Thirdly, unit- and area-level data are usually subject to different kinds of measurement error. We propose a multi-level model with level-specific penalization to overcome these issues and to use unit- and area-level data jointly for model-based small area estimation. An application to regional health measurement in Germany is provided: we combine unit-level health survey data with aggregated micro census records at the area level to estimate hypertension prevalence.
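The idea of level-specific penalization can be illustrated with a minimal sketch. It is not the authors' model: it assumes a linear model with Gaussian errors, omits random area effects, uses a squared L2 (ridge) penalty with one tuning parameter per data level, and solves the penalized normal equations in closed form rather than by the stochastic gradient descent named in the keywords. The function `level_penalized_fit` and the simulated data are hypothetical.

```python
import numpy as np


def level_penalized_fit(X_unit, X_area, area_idx, y, lam_unit, lam_area):
    """Hypothetical sketch: least squares with one ridge penalty per data level.

    Assumes a linear model without random area effects (not the paper's model).
    X_unit   : (n, p) unit-level covariates, one row per sampled unit
    X_area   : (m, q) area-level covariates, one row per area
    area_idx : (n,) integer area index of each sampled unit
    y        : (n,) unit-level response
    Returns (intercept, unit-level coefficients, area-level coefficients).
    """
    n, p = X_unit.shape
    q = X_area.shape[1]
    # Attach each area's aggregate covariates to every unit sampled in that area.
    X = np.hstack([np.ones((n, 1)), X_unit, X_area[area_idx]])
    # Level-specific penalty: intercept unpenalized, lam_unit on the p
    # unit-level coefficients, lam_area on the q area-level coefficients.
    weights = np.concatenate([[0.0], np.full(p, lam_unit), np.full(q, lam_area)])
    # Generalized ridge normal equations: (X'X + diag(weights)) beta = X'y
    beta = np.linalg.solve(X.T @ X + np.diag(weights), X.T @ y)
    return beta[0], beta[1:1 + p], beta[1 + p:]


# Hypothetical simulated data: 20 areas with 15 sampled units each.
rng = np.random.default_rng(0)
m, n_per = 20, 15
area_idx = np.repeat(np.arange(m), n_per)
X_unit = rng.normal(size=(m * n_per, 2))
X_area = rng.normal(size=(m, 1))
y = (1.0 + X_unit @ np.array([0.5, -0.3]) + 0.8 * X_area[area_idx, 0]
     + rng.normal(scale=0.5, size=m * n_per))

b0, b_unit, b_area = level_penalized_fit(X_unit, X_area, area_idx, y,
                                         lam_unit=1.0, lam_area=50.0)
print(b0, b_unit, b_area)
```

Keeping separate tuning parameters lets the area-level coefficients be shrunk more (or less) than the unit-level ones, which is one simple way to reflect the different dispersion and measurement-error properties of the two levels described above. In practice `lam_unit` and `lam_area` would have to be chosen by some data-driven criterion; the sketch does not claim to reproduce the paper's tuning procedure.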

Suggested Citation

  • Jan Pablo Burgard & Joscha Krause & Ralf Münnich, 2019. "Penalized Small Area Models for the Combination of Unit- and Area-level Data," Research Papers in Economics 2019-05, University of Trier, Department of Economics.
  • Handle: RePEc:trr:wpaper:201905

    Download full text from publisher

    File URL: http://www.uni-trier.de/fileadmin/fb4/prof/VWL/EWF/Research_Papers/2019-05.pdf
    File Function: First version, 2019
    Download Restriction: no

    References listed on IDEAS

    1. Bertsimas, Dimitris & Copenhaver, Martin S., 2018. "Characterization of the equivalence of robustification and regularization in linear and matrix regression," European Journal of Operational Research, Elsevier, vol. 270(3), pages 931-942.
    2. Joseph G. Ibrahim & Hongtu Zhu & Ramon I. Garcia & Ruixin Guo, 2011. "Fixed and Random Effects Selection in Mixed Effects Models," Biometrics, The International Biometric Society, vol. 67(2), pages 495-503, June.
    3. Jianqing Fan & Runze Li, 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    4. Twigg, Liz & Moon, Graham & Jones, Kelvyn, 2000. "Predicting small-area health-related behaviour: a comparison of smoking and drinking indicators," Social Science & Medicine, Elsevier, vol. 50(7-8), pages 1109-1120, April.
    5. Lynn M. R. Ybarra & Sharon L. Lohr, 2008. "Small area estimation when auxiliary information is measured with error," Biometrika, Biometrika Trust, vol. 95(4), pages 919-931.
    6. Yalian Li & Hu Yang, 2010. "A new stochastic mixed ridge estimator in linear regression model," Statistical Papers, Springer, vol. 51(2), pages 315-323, June.
    7. Melissa Eliot & Jane Ferguson & Muredach P. Reilly & Andrea S. Foulkes, 2011. "Ridge Regression for Longitudinal Biomarker Data," The International Journal of Biostatistics, De Gruyter, vol. 7(1), pages 1-11, September.
    8. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    9. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    10. Miguel Boubeta & María José Lombardía & Domingo Morales, 2016. "Empirical best prediction under area-level Poisson mixed models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(3), pages 548-569, September.
    11. Jan Pablo Burgard & Joscha Krause & Dennis Kreber, 2019. "Regularized Area-level Modelling for Robust Small Area Estimation in the Presence of Unknown Covariate Measurement Errors," Research Papers in Economics 2019-04, University of Trier, Department of Economics.
    12. Howard D. Bondell & Arun Krishna & Sujit K. Ghosh, 2010. "Joint Variable Selection for Fixed and Random Effects in Linear Mixed-Effects Models," Biometrics, The International Biometric Society, vol. 66(4), pages 1069-1077, December.
    13. Sonja Greven & Thomas Kneib, 2010. "On the behaviour of marginal and conditional AIC in linear mixed models," Biometrika, Biometrika Trust, vol. 97(4), pages 773-789.
    14. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    15. Malay Ghosh & Rebecca Steorts, 2013. "Two-stage benchmarking as applied to small area estimation," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 22(4), pages 670-687, November.
    16. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Simona Buscemi & Antonella Plaia, 2020. "Model selection in linear mixed-effect models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 104(4), pages 529-575, December.
    2. Mojtaba Ganjali & Taban Baghfalaki, 2018. "Application of Penalized Mixed Model in Identification of Genes in Yeast Cell-Cycle Gene Expression Data," Biostatistics and Biometrics Open Access Journal, Juniper Publishers Inc., vol. 6(2), pages 38-41, April.
    3. Jan Pablo Burgard & Joscha Krause & Dennis Kreber & Domingo Morales, 2021. "The generalized equivalence of regularization and min–max robustification in linear mixed models," Statistical Papers, Springer, vol. 62(6), pages 2857-2883, December.
    4. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    5. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    6. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    7. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    8. Zeyu Bian & Erica E. M. Moodie & Susan M. Shortreed & Sahir Bhatnagar, 2023. "Variable selection in regression‐based estimation of dynamic treatment regimes," Biometrics, The International Biometric Society, vol. 79(2), pages 988-999, June.
    9. Zanhua Yin, 2020. "Variable selection for sparse logistic regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(7), pages 821-836, October.
    10. Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
    11. Pei Wang & Shunjie Chen & Sijia Yang, 2022. "Recent Advances on Penalized Regression Models for Biological Data," Mathematics, MDPI, vol. 10(19), pages 1-24, October.
    12. Zakariya Yahya Algamal & Muhammad Hisyam Lee, 2019. "A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 753-771, September.
    13. Dai, Linlin & Chen, Kani & Sun, Zhihua & Liu, Zhenqiu & Li, Gang, 2018. "Broken adaptive ridge regression and its asymptotic properties," Journal of Multivariate Analysis, Elsevier, vol. 168(C), pages 334-351.
    14. Zhang, Yan-Qing & Tian, Guo-Liang & Tang, Nian-Sheng, 2016. "Latent variable selection in structural equation models," Journal of Multivariate Analysis, Elsevier, vol. 152(C), pages 190-205.
    15. Posch, Konstantin & Arbeiter, Maximilian & Pilz, Juergen, 2020. "A novel Bayesian approach for variable selection in linear regression models," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    16. Tanin Sirimongkolkasem & Reza Drikvandi, 2019. "On Regularisation Methods for Analysis of High Dimensional Data," Annals of Data Science, Springer, vol. 6(4), pages 737-763, December.
    17. Fang, Xiaolei & Paynabar, Kamran & Gebraeel, Nagi, 2017. "Multistream sensor fusion-based prognostics model for systems with single failure modes," Reliability Engineering and System Safety, Elsevier, vol. 159(C), pages 322-331.
    18. Hui Xiao & Yiguo Sun, 2019. "On Tuning Parameter Selection in Model Selection and Model Averaging: A Monte Carlo Study," JRFM, MDPI, vol. 12(3), pages 1-16, June.
    19. Daniel, Jeffrey & Horrocks, Julie & Umphrey, Gary J., 2018. "Penalized composite likelihoods for inhomogeneous Gibbs point process models," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 104-116.
    20. van Erp, Sara & Oberski, Daniel L. & Mulder, Joris, 2018. "Shrinkage priors for Bayesian penalized regression," OSF Preprints cg8fq, Center for Open Science.

    More about this item

    Keywords

    disease mapping; multi-level model; multi-source estimation; penalization; stochastic gradient descent;

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics


    Corrections


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Matthias Neuenkirch. General contact details of provider: https://edirc.repec.org/data/petride.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.