Evaluating Data Fusion Methods to Improve Income Modelling

Authors

  • Jana Emmenegger
  • Ralf Münnich
  • Jannik Schaller

Abstract

Income is an important economic indicator to measure living standards and individual well-being. In Germany, there exist different data sources that yield ambiguous evidence when analysing the income distribution. The Tax Statistics (TS), an income register recording the total population of more than 40 million taxpayers in Germany for the year 2014, contains the most reliable income information covering the full income distribution. However, it offers only a limited range of socio-demographic variables essential for income analysis. We tackle this challenge by enriching the tax data with information on education and working time from the Microcensus. For that purpose, we examine two types of data fusion methods that seem suited for the specific data fusion scenario of the Tax Statistics and the Microcensus: missing-data methods on the one hand and performant prediction models on the other hand. We conduct a simulation study and provide an empirical application comparing the proposed data fusion methods, and our results indicate that Multinomial Regression and Random Forest are the most suitable methods for our data fusion scenario.
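
To make the data fusion setting concrete, the following is a minimal sketch of one classical missing-data approach, a random hot deck within cells: a variable observed only in a donor file is drawn for each recipient record from donors that share the same values on the common variables. This is an illustration under stated assumptions, not the authors' implementation and not one of the two methods the abstract singles out. The function name, the variable names and the toy records are all hypothetical and invented for the example.

```python
import random
from collections import defaultdict


def fuse_by_cell_draw(donor, recipient, common, target, seed=0):
    """Random hot deck within cells (illustrative sketch only).

    donor     : list of dicts containing the common variables and `target`
    recipient : list of dicts containing the common variables but not `target`
    common    : names of the variables observed in both files
    target    : name of the variable to transfer from donor to recipient
    """
    rng = random.Random(seed)

    # Group the donor values of the target variable by cell of common variables.
    pools = defaultdict(list)
    for rec in donor:
        pools[tuple(rec[c] for c in common)].append(rec[target])

    # Fallback pool for recipient cells that have no donor at all.
    all_values = [rec[target] for rec in donor]

    fused = []
    for rec in recipient:
        key = tuple(rec[c] for c in common)
        pool = pools.get(key) or all_values
        # Copy the recipient record and attach a randomly drawn donor value.
        fused.append({**rec, target: rng.choice(pool)})
    return fused


# Invented toy records: the donor file stands in for a survey with education,
# the recipient file for a register with income but no education.
donor = [
    {"age_group": "30-44", "sex": "f", "education": "tertiary"},
    {"age_group": "30-44", "sex": "f", "education": "tertiary"},
    {"age_group": "45-59", "sex": "m", "education": "secondary"},
    {"age_group": "45-59", "sex": "m", "education": "tertiary"},
]
recipient = [
    {"age_group": "30-44", "sex": "f", "income": 52000},
    {"age_group": "45-59", "sex": "m", "income": 61000},
    {"age_group": "60+", "sex": "f", "income": 30000},
]

fused = fuse_by_cell_draw(donor, recipient, ["age_group", "sex"], "education")
for row in fused:
    print(row)
```

Drawing from the donor pool, rather than assigning the most frequent value, keeps some of the within-cell variation of the transferred variable. A prediction-model approach would replace the cell lookup with a fitted classifier, at the cost of more modelling choices.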

Suggested Citation

  • Jana Emmenegger & Ralf Münnich & Jannik Schaller, 2022. "Evaluating Data Fusion Methods to Improve Income Modelling," Research Papers in Economics 2022-03, University of Trier, Department of Economics.
  • Handle: RePEc:trr:wpaper:202203

    Download full text from publisher

    File URL: http://www.uni-trier.de/fileadmin/fb4/prof/VWL/EWF/Research_Papers/2022-03.pdf
    File Function: First version, 2022
    Download Restriction: no

    References listed on IDEAS

    1. Stefan Bach & Giacomo Corneo & Viktor Steiner, 2009. "From Bottom To Top: The Entire Income Distribution In Germany, 1992–2003," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 55(2), pages 303-330, June.
    2. Cowell, F.A., 2000. "Measurement of inequality," Handbook of Income Distribution, in: A.B. Atkinson & F. Bourguignon (ed.), Handbook of Income Distribution, edition 1, volume 1, chapter 2, pages 87-166, Elsevier.
    3. Little, Roderick J A, 1988. "Missing-Data Adjustments in Large Surveys," Journal of Business & Economic Statistics, American Statistical Association, vol. 6(3), pages 287-296, July.
    4. Ravallion, Martin & Chen, Shaohua, 1997. "What Can New Survey Data Tell Us about Recent Changes in Distribution and Poverty?," The World Bank Economic Review, World Bank, vol. 11(2), pages 357-382, May.
    5. Jacob Mincer, 1958. "Investment in Human Capital and Personal Income Distribution," Journal of Political Economy, University of Chicago Press, vol. 66(4), pages 281-302.
    6. Wright, Marvin N. & Ziegler, Andreas, 2017. "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i01).
    7. Thomas Blanchet & Juliette Fournier & Thomas Piketty, 2022. "Generalized Pareto Curves: Theory and Applications," Review of Income and Wealth, International Association for Research in Income and Wealth, vol. 68(1), pages 263-288, March.
    8. Stefan Angel & Franziska Disslbacher & Stefan Humer & Matthias Schnetzer, 2019. "What did you really earn last year?: explaining measurement error in survey income data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 182(4), pages 1411-1437, October.
    9. van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
    10. Brzeziński, Michał & Myck, Michal & Najsztub, Mateusz, 2019. "Reevaluating Distributional Consequences of the Transition to Market Economy in Poland: New Results from Combined Household Survey and Tax Return Data," IZA Discussion Papers 12734, Institute of Labor Economics (IZA).
    11. Thomas Piketty, 2015. "About Capital in the Twenty-First Century," American Economic Review, American Economic Association, vol. 105(5), pages 48-53, May.
    12. Anthony B Atkinson & François Bourguignon, 2014. "Handbook of Income Distribution," Post-Print halshs-02923231, HAL.
    13. Charlotte Bartels & Maria Metzing, 2019. "An integrated approach for a top-corrected income distribution," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 17(2), pages 125-143, June.
    14. Rubin, Donald B, 1986. "Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations," Journal of Business & Economic Statistics, American Statistical Association, vol. 4(1), pages 87-94, January.
    15. repec:hal:pseose:halshs-01157487 is not listed on IDEAS
    16. Jonathan Haughton & Shahidur R. Khandker, 2009. "Handbook on Poverty and Inequality," World Bank Publications - Books, The World Bank Group, number 11985.
    17. Rebecca R. Andridge & Roderick J. A. Little, 2010. "A Review of Hot Deck Imputation for Survey Non‐response," International Statistical Review, International Statistical Institute, vol. 78(1), pages 40-64, April.
    18. Benjamin Okner, 1972. "Constructing a New Data Base from Existing Microdata Sets: The 1966 Merge File," NBER Chapters, in: Annals of Economic and Social Measurement, Volume 1, number 3, pages 325-362, National Bureau of Economic Research, Inc.
    19. Little, Roderick J A, 1988. "Missing-Data Adjustments in Large Surveys: Reply," Journal of Business & Economic Statistics, American Statistical Association, vol. 6(3), pages 300-301, July.
    20. Rodgers, Willard L, 1984. "An Evaluation of Statistical Matching," Journal of Business & Economic Statistics, American Statistical Association, vol. 2(1), pages 91-102, January.

    Citations

    Cited by:

    1. Rude,Britta Laurin & Robayo,Monica, 2024. "Statistical Matching for Combining the European Survey on Income and Living Conditions and the Household Budget Surveys: An Evaluation of Energy Expenditures in Bulgaria," Policy Research Working Paper Series 10818, The World Bank.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Emmenegger Jana & Münnich Ralf, 2023. "Localising the Upper Tail: How Top Income Corrections Affect Measures of Regional Inequality," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 243(3-4), pages 285-317, June.
    2. Saeideh Kamgar & Florian Meinfelder & Ralf Münnich & Hamidreza Navvabpour, 2020. "Estimation within the new integrated system of household surveys in Germany," Statistical Papers, Springer, vol. 61(5), pages 2091-2117, October.
    3. Anika Rasner & Joachim R. Frick & Markus M. Grabka, 2013. "Statistical Matching of Administrative and Survey Data," Sociological Methods & Research, vol. 42(2), pages 192-224, May.
    4. Ferreira, Francisco H. G. & Brunori, Paolo, 2024. "Inherited inequality, meritocracy, and the purpose of economic growth," LSE Research Online Documents on Economics 126263, London School of Economics and Political Science, LSE Library.
    5. Ralf Münnich & Siegfried Gabler & Christian Bruch & Jan Pablo Burgard & Tobias Enderle & Jan-Philipp Kolb & Thomas Zimmermann, 2015. "Tabellenauswertungen im Zensus unter Berücksichtigung fehlender Werte," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 9(3), pages 269-304, December.
    6. Chenyang Gu & Roee Gutman, 2017. "Combining item response theory with multiple imputation to equate health assessment questionnaires," Biometrics, The International Biometric Society, vol. 73(3), pages 990-998, September.
    7. Rasner, Anika & Frick, Joachim R. & Grabka, Markus M., 2013. "Statistical Matching of Administrative and Survey Data: An Application to Wealth Inequality Analysis," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 42(2), pages 192-224.
    8. Chia-Ning Wang & Roderick Little & Bin Nan & Siobán D. Harlow, 2011. "A Hot-Deck Multiple Imputation Procedure for Gaps in Longitudinal Recurrent Event Histories," Biometrics, The International Biometric Society, vol. 67(4), pages 1573-1582, December.
    9. Adel Bosch & Steven F. Koch, 2021. "Individual and Household Debt: Does Imputation Choice Matter?," Working Papers 202141, University of Pretoria, Department of Economics.
    10. Katy Bergstrom & William Dodds & Nicholas Lacoste & Juan Rios, 2025. "Estimating the Welfare Cost of Labor Supply Frictions," Working Papers 2503, Tulane University, Department of Economics.
    11. Mingyang Cai & Gerko Vink, 2022. "A note on imputing squares via polynomial combination approach," Computational Statistics, Springer, vol. 37(5), pages 2185-2201, November.
    12. Shu Yang & Jae Kwang Kim, 2020. "Asymptotic theory and inference of predictive mean matching imputation using a superpopulation model framework," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(3), pages 839-861, September.
    13. Joost van Ginkel & Pieter Kroonenberg, 2014. "Using Generalized Procrustes Analysis for Multiple Imputation in Principal Component Analysis," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 242-269, July.
    14. Mathias Silva & Michel Lubrano, 2024. "Bayesian inference for income inequality using a Pareto II tail with an uncertain threshold: Combining EU-SILC and WID data," AMSE Working Papers 2429, Aix-Marseille School of Economics, France.
    15. Gerko Vink & Laurence E. Frank & Jeroen Pannekoek & Stef van Buuren, 2014. "Predictive mean matching imputation of semicontinuous variables," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 68(1), pages 61-90, February.
    16. Brownstone, David, 1997. "Multiple Imputation Methodology for Missing Data, Non-Random Response, and Panel Attrition," University of California Transportation Center, Working Papers qt2zd6w6hh, University of California Transportation Center.
    17. Westermeier, Christian & Grabka, Markus M., 2016. "Longitudinal Wealth Data and Multiple Imputation: An Evaluation Study," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 10(3), pages 237-252.
    18. Youngjoo Cho & Debashis Ghosh, 2021. "Quantile-Based Subgroup Identification for Randomized Clinical Trials," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(1), pages 90-128, April.
    19. Ahfock, Daniel & Pyne, Saumyadipta & McLachlan, Geoffrey J., 2022. "Statistical file-matching of non-Gaussian data: A game theoretic approach," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    20. Yanqing Sun & Li Qi & Fei Heng & Peter B. Gilbert, 2020. "A hybrid approach for the stratified mark‐specific proportional hazards model with missing covariates and missing marks, with application to vaccine efficacy trials," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(4), pages 791-814, August.
