Printed from https://ideas.repec.org/p/iab/iabdpa/200615.html

How valid can data fusion be?

Author

Listed:
  • Kiesl, Hans
  • Rässler, Susanne

Abstract

"Data fusion techniques typically aim to achieve a complete data file from different sources which do not contain the same units. Traditionally, this is done on the basis of variables common to all files. It is well known that those approaches establish conditional independence of the specific variables given the common variables, although they may be conditionally dependent in reality. We discuss the objectives of data fusion in the light of their feasibility and distinguish four levels of validity that a fusion technique may achieve. For a rather general situation, we derive the feasible set of correlation matrices for the variables not jointly observed and suggest a new quality index for data fusion. Finally, we present a suitable and efficient multiple imputation procedure to make use of auxiliary information and to overcome the conditional independence assumption." (Author's abstract, IAB-Doku)
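The idea of a feasible set of correlations can be illustrated in the simplest case. The following is a minimal sketch, not the paper's own procedure: it assumes a single common variable X and two specific variables Y and Z that are never observed together, and it uses the standard requirement that the 3x3 correlation matrix be positive semidefinite. The function names and the example values 0.8 and 0.6 are hypothetical choices for illustration.

```python
import math


def feasible_corr_range(r_xy, r_xz):
    """Return the (low, high) bounds on corr(Y, Z) that keep the
    3x3 correlation matrix of (X, Y, Z) positive semidefinite.

    Assumption: one common variable X; Y and Z are never jointly observed.
    """
    center = r_xy * r_xz
    half_width = math.sqrt((1.0 - r_xy ** 2) * (1.0 - r_xz ** 2))
    return center - half_width, center + half_width


def conditional_independence_corr(r_xy, r_xz):
    """Value of corr(Y, Z) when the partial correlation of Y and Z
    given X is zero, which is what matching on X alone implies
    (hypothetical helper for this sketch)."""
    return r_xy * r_xz


# Hypothetical observed correlations with the common variable.
low, high = feasible_corr_range(0.8, 0.6)
midpoint = conditional_independence_corr(0.8, 0.6)
print(f"feasible corr(Y, Z): [{low:.2f}, {high:.2f}], "
      f"conditional independence value: {midpoint:.2f}")
```

In this sketch the conditional independence value is always the midpoint of the feasible interval, so a fusion based only on the common variable picks one point out of a range that the observed data cannot narrow further without auxiliary information.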

Suggested Citation

  • Kiesl, Hans & Rässler, Susanne, 2006. "How valid can data fusion be?," IAB-Discussion Paper 200615, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
  • Handle: RePEc:iab:iabdpa:200615

    Download full text from publisher

    File URL: https://doku.iab.de/discussionpapers/2006/dp1506.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Rodgers, Willard L, 1984. "An Evaluation of Statistical Matching," Journal of Business & Economic Statistics, American Statistical Association, vol. 2(1), pages 91-102, January.
    2. Moriarity, Chris & Scheuren, Fritz, 2003. "A Note on Rubin's Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations," Journal of Business & Economic Statistics, American Statistical Association, vol. 21(1), pages 65-73, January.
    3. Donald Rubin & Dorothy Thayer, 1978. "Relating tests given to different samples," Psychometrika, Springer;The Psychometric Society, vol. 43(1), pages 3-10, March.
    4. Ridder, Geert & Moffitt, Robert, 2007. "The Econometrics of Data Combination," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 75, Elsevier.
    5. Rubin, Donald B, 1986. "Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations," Journal of Business & Economic Statistics, American Statistical Association, vol. 4(1), pages 87-94, January.

    Citations

    Cited by:

    1. Hohendanner, Christian, 2007. "Verdrängen Ein-Euro-Jobs sozialversicherungspflichtige Beschäftigung in den Betrieben?," IAB-Discussion Paper 200708, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
    2. Kruppe, Thomas, 2006. "Die Förderung beruflicher Weiterbildung : eine mikroökonometrische Evaluation der Ergänzung durch das ESF-BA-Programm," IAB-Discussion Paper 200621, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
    3. Blien, Uwe & Kirchhof, Kai & Ludewig, Oliver, 2006. "Agglomeration effects on labour demand," IAB-Discussion Paper 200628, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
    4. Andrea Cutillo & Mauro Scanu, 2020. "A Mixed Approach for Data Fusion of HBS and SILC," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 150(2), pages 411-437, July.
    5. Zvi Gilula & Robert McCulloch, 2013. "Multi level categorical data fusion using partially fused data," Quantitative Marketing and Economics (QME), Springer, vol. 11(3), pages 353-377, September.
    6. Frethey-Bentham, Catherine, 2011. "Pseudo panels as an alternative study design," Australasian marketing journal, Elsevier, vol. 19(4), pages 281-292.
    7. Eckey, Hans-Friedrich & Schwengler, Barbara & Türck, Matthias, 2007. "Vergleich von deutschen Arbeitsmarktregionen," IAB-Discussion Paper 200703, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael S. Rendall & Bonnie Ghosh-Dastidar & Margaret M. Weden & Elizabeth H. Baker & Zafar Nazarov, 2013. "Multiple Imputation for Combined-survey Estimation With Incomplete Regressors in One but Not Both Surveys," Sociological Methods & Research, vol. 42(4), pages 483-530, November.
    2. Michael S. Rendall & Bonnie Ghosh-Dastidar & Margaret M. Weden & Zafar Nazarov, 2011. "Multiple Imputation for Combined-Survey Estimation With Incomplete Regressors In One But Not Both Surveys," Working Papers WR-887-1, RAND Corporation.
    3. Ahfock, Daniel & Pyne, Saumyadipta & Lee, Sharon X. & McLachlan, Geoffrey J., 2016. "Partial identification in the statistical matching problem," Computational Statistics & Data Analysis, Elsevier, vol. 104(C), pages 79-90.
    4. Peter van de Ven & Anne Harrison & Barbara Fraumeni & Dennis Fixler & David Johnson & Andrew Craig & Kevin Furlong, 2017. "A Consistent Data Series to Evaluate Growth and Inequality in the National Accounts," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 63, pages 437-459, December.
    5. Clinton P. McCully, 2013. "Integration of Micro and Macro Data on Consumer Income and Expenditures," BEA Working Papers 0101, Bureau of Economic Analysis.
    6. Fumagalli, Laura & Sala, Emanuela, 2011. "The total survey error paradigm and pre-election polls: the case of the 2006 Italian general elections," ISER Working Paper Series 2011-29, Institute for Social and Economic Research.
    7. Angelo Moretti & Natalie Shlomo, 2023. "Improving Probabilistic Record Linkage Using Statistical Prediction Models," International Statistical Review, International Statistical Institute, vol. 91(3), pages 368-394, December.
    8. Azizur Rahman & Ann Harding & Robert Tanton & Shuangzhe Liu, 2010. "Methodological Issues in Spatial Microsimulation Modelling for Small Area Estimation," International Journal of Microsimulation, International Microsimulation Association, vol. 3(2), pages 3-22.
    9. Saeideh Kamgar & Florian Meinfelder & Ralf Münnich & Hamidreza Navvabpour, 2020. "Estimation within the new integrated system of household surveys in Germany," Statistical Papers, Springer, vol. 61(5), pages 2091-2117, October.
    10. Jana Emmenegger & Ralf Münnich & Jannik Schaller, 2022. "Evaluating Data Fusion Methods to Improve Income Modelling," Research Papers in Economics 2022-03, University of Trier, Department of Economics.
    11. Okay Gunes, 2017. "Analysis of Households' Decision Using Full Demand Elasticity Estimates: an Estimation on Turkish Data," Documents de travail du Centre d'Economie de la Sorbonne 17017, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    12. Cristina Borra & Almudena Sevilla & Jonathan Gershuny, 2013. "Calibrating Time-Use Estimates for the British Household Panel Survey," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 114(3), pages 1211-1224, December.
    13. Susan Athey & Raj Chetty & Guido Imbens & Hyunseung Kang, 2016. "Estimating Treatment Effects using Multiple Surrogates: The Role of the Surrogate Score and the Surrogate Index," Papers 1603.09326, arXiv.org, revised Apr 2024.
    14. Rasner, Anika & Frick, Joachim R. & Grabka, Markus M., 2013. "Statistical Matching of Administrative and Survey Data: An Application to Wealth Inequality Analysis," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 42(2), pages 192-224.
    15. Okay Gunes, 2017. "Analysis of Households' Decision Using Full Demand Elasticity Estimates: an Estimation on Turkish Data," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-01491970, HAL.
    16. Pier Luigi Conti & Daniela Marella & Andrea Neri, 2017. "Statistical matching and uncertainty analysis in combining household income and expenditure data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 26(3), pages 485-505, August.
    17. Andrea Cutillo & Mauro Scanu, 2020. "A Mixed Approach for Data Fusion of HBS and SILC," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 150(2), pages 411-437, July.
    18. Okay Gunes, 2017. "Analysis of Households' Decision Using Full Demand Elasticity Estimates: an Estimation on Turkish Data," Post-Print halshs-01491970, HAL.
    19. Claramunt González, Juan & van Delden, Arnout & de Waal, Ton, 2023. "Assessment of the effect of constraints in a new multivariate mixed method for statistical matching," Computational Statistics & Data Analysis, Elsevier, vol. 177(C).
    20. Anika Rasner & Joachim R. Frick & Markus M. Grabka, 2013. "Statistical Matching of Administrative and Survey Data," Sociological Methods & Research, vol. 42(2), pages 192-224, May.

    More about this item

    Keywords

    data preparation; data quality; imputation methods; data fusion; correlation; mathematical statistics; applied statistics; validity

    JEL classification:

    • C11 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Bayesian Analysis: General
    • C15 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Statistical Simulation Methods: General
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access

