Printed from https://ideas.repec.org/a/spr/stpapr/v65y2024i4d10.1007_s00362-023-01468-3.html

Integrating rather than collecting: statistical matching in the data flood era

Author

Listed:
  • Riccardo D’Alberto

    (Alma Mater Studiorum – University of Bologna)

  • Meri Raggi

    (Alma Mater Studiorum – University of Bologna)

Abstract

Statistical matching is progressively emerging as a straightforward approach to data integration. This method, of increasing importance and interest, helps address the unsolved challenges posed by data shortage as well as the opportunities arising in the present data flood era. This paper offers an exhaustive review of the methodology from its early beginnings up to the most recent developments, also considering the most relevant applications. The links between statistical matching and other integration methods are discussed, analysing how a 50-year-old method has only recently been placed within a consistent but still incomplete framework. Strengths and weaknesses of statistical matching are compared across different data features and sample representativeness frameworks, and ideas for future research are given, always keeping an eye on uncertainty, the key problem that statistical matching tries to address.

Suggested Citation

  • Riccardo D’Alberto & Meri Raggi, 2024. "Integrating rather than collecting: statistical matching in the data flood era," Statistical Papers, Springer, vol. 65(4), pages 2135-2163, June.
  • Handle: RePEc:spr:stpapr:v:65:y:2024:i:4:d:10.1007_s00362-023-01468-3
    DOI: 10.1007/s00362-023-01468-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-023-01468-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Rodgers, Willard L, 1984. "An Evaluation of Statistical Matching," Journal of Business & Economic Statistics, American Statistical Association, vol. 2(1), pages 91-102, January.
    2. Holly Sutherland & Rebecca Taylor & Joanna Gomulka, 2002. "Combining Household Income and Expenditure Data in Policy Simulations," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 48(4), pages 517-536, December.
    3. Riccardo D’Alberto & Matteo Zavalloni & Meri Raggi & Davide Viaggi, 2018. "AES Impact Evaluation With Integrated Farm Data: Combining Statistical Matching and Propensity Score Matching," Sustainability, MDPI, vol. 10(11), pages 1-24, November.
    4. Pier Luigi Conti & Daniela Marella & Mauro Scanu, 2016. "Statistical Matching Analysis for Complex Survey Data With Applications," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1715-1725, October.
    5. Klevmarken, Anders, 1982. "Missing Variables and Two-Stage Least-Squares Estimation from More than One Data Set," Working Paper Series 62, Research Institute of Industrial Economics.
    6. Ahfock, Daniel & Pyne, Saumyadipta & Lee, Sharon X. & McLachlan, Geoffrey J., 2016. "Partial identification in the statistical matching problem," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 79-90.
    7. Roee Gutman & Christopher C. Afendulis & Alan M. Zaslavsky, 2013. "A Bayesian Procedure for File Linking to Analyze End-of-Life Medical Costs," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(501), pages 34-47, March.
    8. Rubin, Donald B, 1986. "Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations," Journal of Business & Economic Statistics, American Statistical Association, vol. 4(1), pages 87-94, January.
    9. Conti, Pier Luigi & Marella, Daniela & Scanu, Mauro, 2008. "Evaluation of matching noise for imputation techniques based on nonparametric local linear regression estimators," Computational Statistics & Data Analysis, Elsevier, vol. 53(2), pages 354-365, December.
    10. Eric Schulte Nordholt, 1998. "Imputation: Methods, Simulation Experiments and Practical Examples," International Statistical Review, International Statistical Institute, vol. 66(2), pages 157-180, August.
    11. Jae‐Kwang Kim & Siu‐Ming Tam, 2021. "Data Integration by Combining Big Data and Survey Sample Data for Finite Population Inference," International Statistical Review, International Statistical Institute, vol. 89(2), pages 382-401, August.
    12. Edward C. Budd, 1971. "The Creation Of A Microdata File For Estimating The Size Distribution Of Income," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 17(4), pages 317-333, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Andrea Cutillo & Mauro Scanu, 2020. "A Mixed Approach for Data Fusion of HBS and SILC," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 150(2), pages 411-437, July.
    2. Peter van de Ven & Anne Harrison & Barbara Fraumeni & Dennis Fixler & David Johnson & Andrew Craig & Kevin Furlong, 2017. "A Consistent Data Series to Evaluate Growth and Inequality in the National Accounts," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 63, pages 437-459, December.
    3. Zahra Rezaei Ghahroodi, 2023. "Statistical matching of sample survey data: application to integrate Iranian time use and labour force surveys," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(3), pages 1023-1051, September.
    4. Claramunt González, Juan & van Delden, Arnout & de Waal, Ton, 2023. "Assessment of the effect of constraints in a new multivariate mixed method for statistical matching," Computational Statistics & Data Analysis, Elsevier, vol. 177(C).
    5. Michael S. Rendall & Bonnie Ghosh-Dastidar & Margaret M. Weden & Zafar Nazarov, 2011. "Multiple Imputation for Combined-Survey Estimation With Incomplete Regressors In One But Not Both Surveys," Working Papers WR-887-1, RAND Corporation.
    6. Lamarche, Pierre, 2017. "Estimating consumption in the HFCS: Experimental results on the first wave of the HFCS," Statistics Paper Series 22, European Central Bank.
    7. Elena Dalla Chiara & Martina Menon & Federico Perali, 2019. "An Integrated Database to Measure Living Standards," Journal of Official Statistics, Sciendo, vol. 35(3), pages 531-576, September.
    8. Saeideh Kamgar & Florian Meinfelder & Ralf Münnich & Hamidreza Navvabpour, 2020. "Estimation within the new integrated system of household surveys in Germany," Statistical Papers, Springer, vol. 61(5), pages 2091-2117, October.
    9. Francesco D. d’Ovidio & Paola Perchinunno & Laura Antonucci, 2021. "Data Integration Techniques for the Identification of Poverty Profiles," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 156(2), pages 515-531, August.
    10. repec:ehu:biltok:5728 is not listed on IDEAS
    11. Ahfock, Daniel & Pyne, Saumyadipta & McLachlan, Geoffrey J., 2022. "Statistical file-matching of non-Gaussian data: A game theoretic approach," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    12. Clinton P. McCully, 2013. "Integration of Micro and Macro Data on Consumer Income and Expenditures," BEA Working Papers 0101, Bureau of Economic Analysis.
    13. Fumagalli, Laura & Sala, Emanuela, 2011. "The total survey error paradigm and pre-election polls: the case of the 2006 Italian general elections," ISER Working Paper Series 2011-29, Institute for Social and Economic Research.
    14. Jana Emmenegger & Ralf Münnich & Jannik Schaller, 2022. "Evaluating Data Fusion Methods to Improve Income Modelling," Research Papers in Economics 2022-03, University of Trier, Department of Economics.
    15. Cristina Borra & Almudena Sevilla & Jonathan Gershuny, 2013. "Calibrating Time-Use Estimates for the British Household Panel Survey," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 114(3), pages 1211-1224, December.
    16. Rasner, Anika & Frick, Joachim R. & Grabka, Markus M., 2013. "Statistical Matching of Administrative and Survey Data: An Application to Wealth Inequality Analysis," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 42(2), pages 192-224.
    17. Jonathan Gessendorfer & Jonas Beste & Jörg Drechsler & Joseph W. Sakshaug, 2018. "Statistical Matching as a Supplement to Record Linkage: A Valuable Method to Tackle Nonconsent Bias?," Journal of Official Statistics, Sciendo, vol. 34(4), pages 909-933, December.
    18. Pier Luigi Conti & Daniela Marella & Andrea Neri, 2017. "Statistical matching and uncertainty analysis in combining household income and expenditure data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 26(3), pages 485-505, August.
    19. Daniela Marella & Danny Pfeffermann, 2023. "Accounting for Non‐ignorable Sampling and Non‐response in Statistical Matching," International Statistical Review, International Statistical Institute, vol. 91(2), pages 269-293, August.
    20. Ahfock, Daniel & Pyne, Saumyadipta & Lee, Sharon X. & McLachlan, Geoffrey J., 2016. "Partial identification in the statistical matching problem," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 79-90.
    21. Annie Abello & Sharyn Lymer & Laurie Brown & Ann Harding & Ben Phillips, 2008. "Enhancing the Australian National Health Survey Data for Use in a Microsimulation Model of Pharmaceutical Drug Usage and Cost," Journal of Artificial Societies and Social Simulation, Journal of Artificial Societies and Social Simulation, vol. 11(3), pages 1-2.
