
Imputation with the R Package VIM

Authors

  • Kowarik, Alexander
  • Templ, Matthias

Abstract

The package VIM (Templ, Alfons, Kowarik, and Prantner 2016) is developed to explore and analyze the structure of missing values in data using visualization methods, to impute these missing values with the built-in imputation methods, to verify the imputation process using visualization tools, and to produce high-quality graphics for publications. This article focuses on the different imputation techniques available in the package. Four imputation methods are currently implemented in VIM: hot-deck imputation, k-nearest neighbor imputation, regression imputation, and iterative robust model-based imputation (Templ, Kowarik, and Filzmoser 2011). All of these methods are implemented in a flexible manner with many options for customization. Practical examples are provided to highlight the use of the implemented methods in real-world applications. In addition, the graphical user interface of VIM has been re-implemented from scratch, resulting in the package VIMGUI (Schopfhauser, Templ, Alfons, Kowarik, and Prantner 2016), which enables users without extensive R skills to access these imputation and visualization methods.
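
As a rough illustration of one of the methods named above, k-nearest neighbor imputation, the following is a minimal sketch in plain Python. It is not VIM's code and does not reproduce its interface: the function name `knn_impute`, the range-scaled absolute distance, and the use of the median of the donors' values are all assumptions made for this example, and it handles numeric columns only. It also assumes every column has at least one observed value.

```python
from statistics import median


def knn_impute(rows, k=3):
    """Fill None entries with the median of the k nearest donor rows.

    Hypothetical helper for illustration only; numeric columns only.
    """
    n_cols = len(rows[0])

    # Range of each column over its observed values, used to scale distances.
    ranges = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        span = max(observed) - min(observed)
        ranges.append(span if span > 0 else 1.0)

    def distance(a, b):
        # Mean range-scaled absolute difference over columns observed in both rows.
        shared = [j for j in range(n_cols) if a[j] is not None and b[j] is not None]
        if not shared:
            return float("inf")
        return sum(abs(a[j] - b[j]) / ranges[j] for j in shared) / len(shared)

    imputed = [list(r) for r in rows]  # work on a copy, leave the input untouched
    for i, row in enumerate(rows):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            # Donors are the other rows in which column j is observed.
            donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = median(r[j] for r in nearest)
    return imputed


data = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, None],   # the missing value to be imputed
    [4.0, 40.0],
    [10.0, 100.0],
]
result = knn_impute(data, k=2)
print(result[2])  # [3.0, 30.0]
```

With `k=2`, the two closest donors to the incomplete row are the rows with first value 2.0 and 4.0, so the missing entry is filled with the median of 20.0 and 40.0. Taking the median rather than the mean keeps a single extreme donor from dominating the imputed value.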

Suggested Citation

  • Kowarik, Alexander & Templ, Matthias, 2016. "Imputation with the R Package VIM," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i07).
  • Handle: RePEc:jss:jstsof:v:074:i07
    DOI: http://hdl.handle.net/10.18637/jss.v074.i07

    Download full text from publisher

    File URL: https://www.jstatsoft.org/index.php/jss/article/view/v074i07/v74i07.pdf
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v074i07/VIM_4.6.0.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v074i07/VIMGUI_0.10.0.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v074i07/v74i07.R
    Download Restriction: no


    References listed on IDEAS

    1. van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
    2. Alfons, Andreas & Templ, Matthias & Filzmoser, Peter, 2010. "An Object-Oriented Framework for Statistical Simulation: The R Package simFrame," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 37(i03).
    3. Daniel Schunk, 2008. "A Markov chain Monte Carlo algorithm for multiple imputation in large surveys," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 92(1), pages 101-114, February.
    4. Iacus, Stefano M. & Porro, Giuseppe, 2007. "Missing data imputation, matching and other applications of random recursive partitioning," Computational Statistics & Data Analysis, Elsevier, vol. 52(2), pages 773-789, October.
    5. Matthias Templ & Andreas Alfons & Peter Filzmoser, 2012. "Exploring incomplete data using visualization techniques," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 6(1), pages 29-47, April.
    6. Honaker, James & King, Gary & Blackwell, Matthew, 2011. "Amelia II: A Program for Missing Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i07).
    7. Alfons, Andreas & Templ, Matthias, 2013. "Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 54(i15).
    8. Crookston, Nicholas L. & Finley, Andrew O., 2008. "yaImpute: An R Package for kNN Imputation," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 23(i10).
    9. Josse, Julie & Husson, François, 2016. "missMDA: A Package for Handling Missing Values in Multivariate Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i01).
    10. Templ, Matthias & Kowarik, Alexander & Filzmoser, Peter, 2011. "Iterative stepwise regression imputation using standard and robust methods," Computational Statistics & Data Analysis, Elsevier, vol. 55(10), pages 2793-2806, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nicholas Tierney & Dianne Cook, 2018. "Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations," Monash Econometrics and Business Statistics Working Papers 14/18, Monash University, Department of Econometrics and Business Statistics.
    2. Alfons, Andreas & Templ, Matthias, 2013. "Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 54(i15).
    3. Maria Lucia Parrella & Giuseppina Albano & Michele La Rocca & Cira Perna, 2019. "Reconstructing missing data sequences in multivariate time series: an application to environmental data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 28(2), pages 359-383, June.
    4. Gerko Vink & Laurence E. Frank & Jeroen Pannekoek & Stef Buuren, 2014. "Predictive mean matching imputation of semicontinuous variables," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 68(1), pages 61-90, February.
    5. Lara Lopez & Fernando L. Vázquez & Ángela J. Torres & Patricia Otero & Vanessa Blanco & Olga Díaz & Mario Páramo, 2020. "Long-Term Effects of a Cognitive Behavioral Conference Call Intervention on Depression in Non-Professional Caregivers," IJERPH, MDPI, vol. 17(22), pages 1-24, November.
    6. Seiler, Christian & Heumann, Christian, 2013. "Microdata imputations and macrodata implications: Evidence from the Ifo Business Survey," Economic Modelling, Elsevier, vol. 35(C), pages 722-733.
    7. Schoemaker, Nikita K. & Juffer, Femmie & Rippe, Ralph C.A. & Vermeer, Harriet J. & Stoltenborgh, Marije & Jagersma, Gabrine J. & Maras, Athanasios & Alink, Lenneke R.A., 2020. "Positive parenting in foster care: Testing the effectiveness of a video-feedback intervention program on foster parents’ behavior and attitudes," Children and Youth Services Review, Elsevier, vol. 110(C).
    8. Nicklas Pettersson, 2013. "Bias reduction of finite population imputation by kernel methods," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 14(1), pages 139-160, March.
    9. Jiang, Wei & Josse, Julie & Lavielle, Marc, 2020. "Logistic regression with missing covariates—Parameter estimation, model selection and prediction within a joint-modeling framework," Computational Statistics & Data Analysis, Elsevier, vol. 145(C).
    10. Thelma Dede Baddoo & Zhijia Li & Samuel Nii Odai & Kenneth Rodolphe Chabi Boni & Isaac Kwesi Nooni & Samuel Ato Andam-Akorful, 2021. "Comparison of Missing Data Infilling Mechanisms for Recovering a Real-World Single Station Streamflow Observation," IJERPH, MDPI, vol. 18(16), pages 1-26, August.
    11. Christian Seiler, 2013. "Nonresponse in Business Tendency Surveys: Theoretical Discourse and Empirical Evidence," ifo Beiträge zur Wirtschaftsforschung, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, number 52.
    12. Cheng, Xiaoyue & Cook, Dianne & Hofmann, Heike, 2015. "Visually Exploring Missing Values in Multivariable Data Using a Graphical User Interface," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 68(i06).
    13. Humera Razzak & Christian Heumann, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
    14. Josse, Julie & Husson, François, 2016. "missMDA: A Package for Handling Missing Values in Multivariate Data Analysis," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i01).
    15. World Bank & Organisation for Economic Co-operation and Development, 2017. "A Step Ahead," World Bank Publications - Books, The World Bank Group, number 27527, December.
    16. Matthias Templ & Alexander Kowarik & Bernhard Meindl, 2014. "Development and Current Practice in Using R at Statistics Austria," Romanian Statistical Review, Romanian Statistical Review, vol. 62(2), pages 173-184, June.
    17. Roy, Arkaprava & Ghosal, Subhashis, 2022. "Optimal Bayesian smoothing of functional observations over a large graph," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    18. Matthias Templ, 2023. "Enhancing Precision in Large-Scale Data Analysis: An Innovative Robust Imputation Algorithm for Managing Outliers and Missing Values," Mathematics, MDPI, vol. 11(12), pages 1-22, June.
    19. Kreutzmann, Ann-Kristin & Pannier, Sören & Rojas-Perilla, Natalia & Schmid, Timo & Templ, Matthias & Tzavidis, Nikos, 2017. "The R package emdi for estimating and mapping regionally disaggregated indicators," Discussion Papers 2017/15, Free University Berlin, School of Business & Economics.
