
Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

Authors

  • Yajuan Si

    (Department of Statistics, Columbia University, New York)

  • Jerome P. Reiter

    (Department of Statistical Science, Duke University, Durham)

Abstract

In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian, joint modeling approach to multiple imputation for categorical data based on Dirichlet process mixtures of multinomial distributions. The approach automatically models complex dependencies while being computationally expedient. The Dirichlet process prior distributions enable analysts to avoid fixing the number of mixture components at an arbitrary number. We illustrate repeated sampling properties of the approach using simulated data. We apply the methodology to impute missing background data in the 2007 Trends in International Mathematics and Science Study.
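The mixture idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the truncation of the Dirichlet process to a fixed number of components, and the toy parameters are all assumptions made for illustration. It shows (1) truncated stick-breaking weights for the mixture components and (2) imputing the missing items of one record, given mixture parameters, under the assumption that items are independent multinomials within a component. In a full sampler this imputation would be one step of a Gibbs iteration that also updates the weights and category probabilities; here they are taken as given.

```python
import random


def stick_breaking_weights(alpha, n_components, rng):
    """Truncated stick-breaking weights for a Dirichlet process prior.

    alpha is the DP concentration parameter; n_components is a
    truncation level chosen for illustration (an assumption).
    """
    weights, remaining = [], 1.0
    for _ in range(n_components - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component takes whatever is left
    return weights


def impute_record(record, weights, probs, rng):
    """Fill the None entries of one record, given mixture parameters.

    probs[h][j][c] = P(variable j takes category c | component h).
    """
    # Component membership is drawn using only the observed items,
    # since items are independent within a component.
    post = []
    for h, w in enumerate(weights):
        like = w
        for j, x in enumerate(record):
            if x is not None:
                like *= probs[h][j][x]
        post.append(like)
    h = rng.choices(range(len(weights)), weights=post)[0]

    # Draw each missing item from the chosen component's multinomial.
    completed = []
    for j, x in enumerate(record):
        if x is None:
            categories = range(len(probs[h][j]))
            x = rng.choices(categories, weights=probs[h][j])[0]
        completed.append(x)
    return completed


if __name__ == "__main__":
    rng = random.Random(0)
    w = stick_breaking_weights(alpha=1.0, n_components=5, rng=rng)
    print("weights:", [round(x, 3) for x in w])

    # Toy parameters: two components, two binary variables.
    toy_probs = [
        [[0.9, 0.1], [0.8, 0.2]],
        [[0.2, 0.8], [0.1, 0.9]],
    ]
    print(impute_record([1, None], [0.5, 0.5], toy_probs, rng))
```

Repeating the imputation step across several draws of the parameters would yield the multiple completed data sets that multiple imputation requires.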

Suggested Citation

  • Yajuan Si & Jerome P. Reiter, 2013. "Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys," Journal of Educational and Behavioral Statistics, vol. 38(5), pages 499-521, October.
  • Handle: RePEc:sae:jedbes:v:38:y:2013:i:5:p:499-521
    DOI: 10.3102/1076998613480394

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.3102/1076998613480394
    Download Restriction: no



    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Chenyang Gu & Roee Gutman, 2017. "Combining item response theory with multiple imputation to equate health assessment questionnaires," Biometrics, The International Biometric Society, vol. 73(3), pages 990-998, September.
    2. Kunihama, T. & Herring, A.H. & Halpern, C.T. & Dunson, D.B., 2016. "Nonparametric Bayes modeling with sample survey weights," Statistics & Probability Letters, Elsevier, vol. 113(C), pages 41-48.
    3. Humera Razzak & Christian Heumann, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
    4. Jared S. Murray & Jerome P. Reiter, 2016. "Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models With Local Dependence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1466-1479, October.
    5. Daniel Manrique‐Vallier, 2016. "Bayesian population size estimation using Dirichlet process mixtures," Biometrics, The International Biometric Society, vol. 72(4), pages 1246-1254, December.
    6. Daniel Manrique‐Vallier & Jingchen Hu, 2018. "Bayesian non‐parametric generation of fully synthetic multivariate categorical data in the presence of structural zeros," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(3), pages 635-647, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Humera Razzak & Christian Heumann, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
    2. Razzak Humera & Heumann Christian, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
    3. Jensen, Mark J. & Maheu, John M., 2010. "Bayesian semiparametric stochastic volatility modeling," Journal of Econometrics, Elsevier, vol. 157(2), pages 306-316, August.
    4. Laura Liu & Hyungsik Roger Moon & Frank Schorfheide, 2023. "Forecasting with a panel Tobit model," Quantitative Economics, Econometric Society, vol. 14(1), pages 117-159, January.
    5. Fisher, Mark & Jensen, Mark J., 2022. "Bayesian nonparametric learning of how skill is distributed across the mutual fund industry," Journal of Econometrics, Elsevier, vol. 230(1), pages 131-153.
    6. Federico Bassetti & Roberto Casarin & Marco Del Negro, 2022. "A Bayesian Approach to Inference on Probabilistic Surveys," Staff Reports 1025, Federal Reserve Bank of New York.
    7. Abel Rodriguez & Enrique ter Horst, 2008. "Measuring expectations in options markets: An application to the SP500 index," Papers 0901.0033, arXiv.org.
    8. Billio, Monica & Casarin, Roberto & Rossini, Luca, 2019. "Bayesian nonparametric sparse VAR models," Journal of Econometrics, Elsevier, vol. 212(1), pages 97-115.
    9. Federico Bassetti & Roberto Casarin & Francesco Ravazzolo, 2018. "Bayesian Nonparametric Calibration and Combination of Predictive Distributions," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(522), pages 675-685, April.
    10. Griffin, J. E. & Steel, M. F. J., 2004. "Semiparametric Bayesian inference for stochastic frontier models," Journal of Econometrics, Elsevier, vol. 123(1), pages 121-152, November.
    11. Fisher, Mark & Jensen, Mark J., 2019. "Bayesian inference and prediction of a multiple-change-point panel model with nonparametric priors," Journal of Econometrics, Elsevier, vol. 210(1), pages 187-202.
    12. Jared S. Murray & Jerome P. Reiter, 2016. "Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models With Local Dependence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1466-1479, October.
    13. Yulia V. Marchenko & Jerome P. Reiter, 2009. "Improved degrees of freedom for multivariate significance tests obtained from multiply imputed, small-sample data," Stata Journal, StataCorp LP, vol. 9(3), pages 388-397, September.
    14. Griffin, J.E. & Steel, M.F.J., 2011. "Stick-breaking autoregressive processes," Journal of Econometrics, Elsevier, vol. 162(2), pages 383-396, June.
    15. Mark J. Jensen, 2004. "Semiparametric Bayesian Inference of Long‐Memory Stochastic Volatility Models," Journal of Time Series Analysis, Wiley Blackwell, vol. 25(6), pages 895-922, November.
    16. Tong Li & Xiaoyong Zheng, 2008. "Semiparametric Bayesian inference for dynamic Tobit panel data models with unobserved heterogeneity," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 23(6), pages 699-728.
    17. Burda, Martin & Prokhorov, Artem, 2014. "Copula based factorization in Bayesian multivariate infinite mixture models," Journal of Multivariate Analysis, Elsevier, vol. 127(C), pages 200-213.
    18. Tong Li & Xiaoyong Zheng, 2009. "Entry and Competition Effects in First-Price Auctions: Theory and Evidence from Procurement Auctions," Review of Economic Studies, Oxford University Press, vol. 76(4), pages 1397-1429.
    19. Monica Billio & Roberto Casarin & Luca Rossini, 2016. "Bayesian nonparametric sparse seemingly unrelated regression model (SUR)," Working Papers 2016:20, Department of Economics, University of Venice "Ca' Foscari".
    20. Cem Çakmakli, 2012. "Bayesian Semiparametric Dynamic Nelson-Siegel Model," Working Paper series 59_12, Rimini Centre for Economic Analysis, revised Sep 2012.
