
Integration and imputation of survey data in R: the StatMatch package

Author

  • Marcello D’Orazio (Italian National Institute of Statistics)

Abstract

Statistical matching methods make it possible to integrate two or more data sources in order to investigate the relationship between variables that are not jointly observed. These methods have recently received much attention as a valid alternative way to produce new statistical outputs. The paper provides an overview of the statistical matching methods implemented in the package StatMatch for the R environment, focusing on the most widespread methods and on how they were improved. Particular attention is devoted to hot deck matching methods, which are closely related to those developed for imputing missing values. The corresponding functions in StatMatch are powerful and flexible enough to be applied to impute missing values in a survey. The paper also tackles the problem of matching data from complex sample surveys, a very important topic in National Statistical Institutes. Finally, it describes the concept of uncertainty that characterizes the statistical matching framework and how this alternative approach can be exploited for different purposes.
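To make the hot deck idea concrete, here is a minimal Python sketch of unconstrained nearest-neighbour distance hot deck, one of the simplest hot deck matching schemes. It is not StatMatch code (StatMatch is an R package with its own functions); the function name `nnd_hotdeck`, the dict-based record layout and the variables `age`, `hours`, `income` and `expenditure` are hypothetical. File A (recipients) observes the matching variables X and a variable Y; file B (donors) observes the same X and a variable Z. Each recipient receives Z from the closest donor on X, producing a synthetic file in which X, Y and Z appear together.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nnd_hotdeck(recipients, donors, match_vars, donated_var):
    """Unconstrained nearest-neighbour distance hot deck (illustrative sketch).

    recipients  -- records (dicts) from file A: matching variables X plus Y
    donors      -- records (dicts) from file B: matching variables X plus Z
    match_vars  -- names of the common matching variables X
    donated_var -- name of the variable Z to copy from donor to recipient
    """
    fused = []
    for rec in recipients:
        rec_point = [rec[v] for v in match_vars]
        # Closest donor on X; min() keeps the first donor on ties, and the
        # same donor may be reused by several recipients (unconstrained).
        best = min(donors, key=lambda d: dist(rec_point, [d[v] for v in match_vars]))
        # Build a new record so the input file A is left unchanged.
        fused.append({**rec, donated_var: best[donated_var]})
    return fused


# Hypothetical toy files: A observes income (Y), B observes expenditure (Z).
file_a = [
    {"age": 25, "hours": 40, "income": 1800},
    {"age": 60, "hours": 20, "income": 2500},
]
file_b = [
    {"age": 24, "hours": 38, "expenditure": 1500},
    {"age": 58, "hours": 22, "expenditure": 2100},
    {"age": 40, "hours": 40, "expenditure": 1900},
]

for row in nnd_hotdeck(file_a, file_b, ["age", "hours"], "expenditure"):
    print(row)
```

With these toy files the first recipient receives an expenditure of 1500 and the second 2100. The sketch uses raw Euclidean distance for simplicity; in practice matching variables measured on different scales would normally be standardized or compared with another distance, and donors are often restricted to donation classes defined by categorical variables.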

Suggested Citation

  • Marcello D’Orazio, 2015. "Integration and imputation of survey data in R: the StatMatch package," Romanian Statistical Review, Romanian Statistical Review, vol. 63(2), pages 57-68, June.
  • Handle: RePEc:rsr:journl:v:63:y:2015:i:2:p:57-68

    Download full text from publisher

    File URL: http://www.revistadestatistica.ro/wp-content/uploads/2015/04/RRS2_2015_A06.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Donald B. Rubin, 1986. "Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations," Journal of Business & Economic Statistics, American Statistical Association, vol. 4(1), pages 87-94, January.
    2. Gabriella Donatiello & Marcello D’Orazio & Doriana Frattarola & Antony Rizzi & Mauro Scanu & Mattia Spaziani, 2014. "Statistical Matching of Income and Consumption Expenditures," International Journal of Economic Sciences, Prague University of Economics and Business, vol. 2014(3), pages 50-65.
    3. Marco Ballin & Marco Di Zio & Marcello D'Orazio & Mauro Scanu & Nicola Torelli, 2008. "File concatenation of survey data: a computer intensive approach to sampling weights estimation," Rivista di statistica ufficiale, ISTAT - Italian National Institute of Statistics - (Rome, ITALY), vol. 10(2), pages 5-12, October.
    4. Gabriella Donatiello & Marcello D'Orazio & Doriana Frattarola & Antony Rizzi & Mauro Scanu & Mattia Spaziani, 2014. "Statistical matching of income and consumption expenditures," Proceedings of International Academic Conferences 0100965, International Institute of Social and Economic Sciences.
    5. Rebecca R. Andridge & Roderick J. A. Little, 2010. "A Review of Hot Deck Imputation for Survey Non‐response," International Statistical Review, International Statistical Institute, vol. 78(1), pages 40-64, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Andrea Cutillo & Mauro Scanu, 2020. "A Mixed Approach for Data Fusion of HBS and SILC," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 150(2), pages 411-437, July.
    2. Baris Ucar & Gianni Betti, 2016. "Longitudinal statistical matching: transferring consumption expenditure from HBS to SILC panel survey," Department of Economics University of Siena 739, Department of Economics, University of Siena.
    3. Riccardo D'Alberto & Matteo Zavalloni & Meri Raggi & Davide Viaggi, 2021. "A Statistical Matching approach to reproduce the heterogeneity of willingness to pay in benefit transfer," Socio-Economic Planning Sciences, Elsevier, vol. 74(C).
    4. Elena Dalla Chiara & Martina Menon & Federico Perali, 2019. "An Integrated Database to Measure Living Standards," Journal of Official Statistics, Sciendo, vol. 35(3), pages 531-576, September.
    5. Cristina Cirillo & Lucia Imperioli & Marco Manzo, 2021. "The Value Added Tax Simulation Model: VATSIM-DF (II)," Working Papers wp2021-12, Ministry of Economy and Finance, Department of Finance.
    6. Jana Emmenegger & Ralf Münnich & Jannik Schaller, 2022. "Evaluating Data Fusion Methods to Improve Income Modelling," Research Papers in Economics 2022-03, University of Trier, Department of Economics.
    7. Chenyang Gu & Roee Gutman, 2017. "Combining item response theory with multiple imputation to equate health assessment questionnaires," Biometrics, The International Biometric Society, vol. 73(3), pages 990-998, September.
    8. Anika Rasner & Joachim R. Frick & Markus M. Grabka, 2013. "Statistical Matching of Administrative and Survey Data: An Application to Wealth Inequality Analysis," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 42(2), pages 192-224.
    9. Laura Esposito & Livia Fioroni & Alessio Guandalini, 2019. "Gross income projection in Labour Force Survey Data," RIEDS - Rivista Italiana di Economia, Demografia e Statistica - The Italian Journal of Economic, Demographic and Statistical Studies, SIEDS Societa' Italiana di Economia Demografia e Statistica, vol. 73(4), pages 41-52, October-D.
    10. Jonathan Gessendorfer & Jonas Beste & Jörg Drechsler & Joseph W. Sakshaug, 2018. "Statistical Matching as a Supplement to Record Linkage: A Valuable Method to Tackle Nonconsent Bias?," Journal of Official Statistics, Sciendo, vol. 34(4), pages 909-933, December.
    11. Chia-Ning Wang & Roderick Little & Bin Nan & Siobán D. Harlow, 2011. "A Hot-Deck Multiple Imputation Procedure for Gaps in Longitudinal Recurrent Event Histories," Biometrics, The International Biometric Society, vol. 67(4), pages 1573-1582, December.
    12. Riccardo D’Alberto & Matteo Zavalloni & Meri Raggi & Davide Viaggi, 2018. "AES Impact Evaluation With Integrated Farm Data: Combining Statistical Matching and Propensity Score Matching," Sustainability, MDPI, vol. 10(11), pages 1-24, November.
    13. Luca Gandullia & Lucia Leporatti, 2019. "Distributional effects of gambling taxes: empirical evidence from Italy," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 17(4), pages 565-590, December.
    14. Anika Rasner & Joachim R. Frick & Markus M. Grabka, 2013. "Statistical Matching of Administrative and Survey Data," Sociological Methods & Research, vol. 42(2), pages 192-224, May.
    15. Shu Yang & Jae Kwang Kim, 2020. "Asymptotic theory and inference of predictive mean matching imputation using a superpopulation model framework," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(3), pages 839-861, September.
    16. Raymundo M. Campos-Vázquez, 2013. "Efectos de los ingresos no reportados en el nivel y tendencia de la pobreza laboral en México," Ensayos Revista de Economia, Universidad Autonoma de Nuevo Leon, Facultad de Economia, vol. 0(2), pages 23-54, November.
    17. François Gardes, 2021. "On the value of time and human life," Documents de travail du Centre d'Economie de la Sorbonne 21023, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    18. François Gardes, 2021. "A Solution to the Estimation of an Enlarged GDP Including Domestic Production: An Estimation on Micro Data," Post-Print halshs-03325362, HAL.
    19. Paul T. von Hippel, 2013. "Should a Normal Imputation Model be Modified to Impute Skewed Variables?," Sociological Methods & Research, vol. 42(1), pages 105-138, February.
    20. Joost Ginkel & Pieter Kroonenberg, 2014. "Using Generalized Procrustes Analysis for Multiple Imputation in Principal Component Analysis," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 242-269, July.
