Simulation of Synthetic Complex Data: The R Package simPop

Authors

  • Templ, Matthias
  • Meindl, Bernhard
  • Kowarik, Alexander
  • Dupriez, Olivier

Abstract

The production of synthetic datasets has been proposed as a statistical disclosure control solution to generate public use files out of protected data, and as a tool to create "augmented datasets" to serve as input for micro-simulation models. Synthetic data have become an important instrument for ex-ante assessments of policy impact. The performance and acceptability of such a tool rely heavily on the quality of the synthetic populations, i.e., on the statistical similarity between the synthetic and the true population of interest. Multiple approaches and tools have been developed to generate synthetic data. These approaches can be categorized into three main groups: synthetic reconstruction, combinatorial optimization, and model-based generation. In this paper, we provide a brief overview of these approaches and introduce simPop, an open source data synthesizer. simPop is a user-friendly R package based on a modular object-oriented concept. It provides a highly optimized S4 class implementation of various methods, including calibration by iterative proportional fitting and simulated annealing, and modeling or data fusion by logistic regression. We demonstrate the use of simPop by creating a synthetic population of Austria, and report on the utility of the resulting data. We conclude with suggestions for further development of the package.
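The abstract lists calibration by iterative proportional fitting (IPF) among the methods simPop provides. As a rough illustration of the general technique only (this is not simPop's implementation or API), the Python sketch below rakes a two-way table of weights by alternately rescaling rows and columns until both sets of sums match known population margins. The helper name `ipf`, its parameters, and the toy margins are hypothetical.

```python
def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Generic two-way iterative proportional fitting (hypothetical helper).

    Alternately rescales rows and columns of `seed` so that the table's
    row sums match `row_targets` and its column sums match `col_targets`.
    The two target vectors must share the same grand total to converge.
    """
    table = [list(map(float, row)) for row in seed]  # copy; leave seed untouched
    n_rows, n_cols = len(table), len(table[0])
    for _ in range(max_iter):
        # Row step: scale each row so it sums to its target.
        for i in range(n_rows):
            s = sum(table[i])
            if s > 0:
                table[i] = [x * row_targets[i] / s for x in table[i]]
        # Column step: scale each column so it sums to its target.
        for j in range(n_cols):
            s = sum(table[i][j] for i in range(n_rows))
            if s > 0:
                for i in range(n_rows):
                    table[i][j] *= col_targets[j] / s
        # Columns now match exactly; stop once the rows match as well.
        if max(abs(sum(r) - t) for r, t in zip(table, row_targets)) < tol:
            break
    return table


# Toy example: a uniform seed raked to hypothetical margins.
fitted = ipf([[1, 1], [1, 1]], row_targets=[30, 70], col_targets=[40, 60])
print([[round(x, 6) for x in row] for row in fitted])  # [[12.0, 18.0], [28.0, 42.0]]
```

With a uniform seed, each fitted cell equals row total × column total / grand total (e.g. 30 × 40 / 100 = 12). With a non-uniform seed, such as weights taken from a survey sample, IPF matches the margins while preserving the seed's odds ratios, which is why it is a common choice for calibrating synthetic populations to known totals.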

Suggested Citation

  • Templ, Matthias & Meindl, Bernhard & Kowarik, Alexander & Dupriez, Olivier, 2017. "Simulation of Synthetic Complex Data: The R Package simPop," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 79(i10).
  • Handle: RePEc:jss:jstsof:v:079:i10
    DOI: 10.18637/jss.v079.i10

    Download full text from publisher

    File URL: https://www.jstatsoft.org/index.php/jss/article/view/v079i10/v79i10.pdf (full text, PDF)
    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v079i10/simPop_1.0.0.tar.gz (supplementary file: simPop 1.0.0 source package)
    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v079i10/v79i10.R (supplementary file: R code)
    Download Restriction: no (all files)

    References listed on IDEAS

    1. Atkinson, Tony & Cantillon, Bea & Marlier, Eric & Nolan, Brian, 2002. "Social Indicators: The EU and Social Inclusion," OUP Catalogue, Oxford University Press, number 9780199253494.
    2. Beckman, Richard J. & Baggerly, Keith A. & McKay, Michael D., 1996. "Creating synthetic baseline populations," Transportation Research Part A: Policy and Practice, Elsevier, vol. 30(6), pages 415-429, November.
    3. Atkinson, Tony & Cantillon, Bea & Marlier, Eric & Nolan, Brian, 2002. "Indicators for Social Inclusion," Politica economica, Società editrice il Mulino, issue 1, pages 7-28.
    4. Brown, Laurie & Harding, Ann, 2002. "Social Modelling and Public Policy: Application of Microsimulation Modelling in Australia," Journal of Artificial Societies and Social Simulation, Journal of Artificial Societies and Social Simulation, vol. 5(4), pages 1-6.
    5. Alfons, Andreas & Templ, Matthias, 2013. "Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 54(i15).
    6. Alfons, Andreas & Kraft, Stefan & Templ, Matthias & Filzmoser, Peter, 2011. "Simulation of close-to-reality population data for household surveys with application to EU-SILC," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 20(3), pages 383-407, August.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Trond Husby & Olga Ivanova & Mark Thissen, 2018. "Simulating the Joint Distribution of Individuals, Households and Dwellings in Small Areas," International Journal of Microsimulation, International Microsimulation Association, vol. 11(2), pages 169-190.
    2. Wojciech Roszka, 2019. "Spatial Microsimulation of Personal Income in Poland at the Level of Subregions," Statistics in Transition New Series, Polish Statistical Association, vol. 20(3), pages 133-153, September.
    3. Nikos Tzavidis & Li‐Chun Zhang & Angela Luna & Timo Schmid & Natalia Rojas‐Perilla, 2018. "From start to finish: a framework for the production of small area official statistics," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(4), pages 927-979, October.
    4. Till Koebe & Alejandra Arias-Salazar & Timo Schmid, 2023. "Releasing survey microdata with exact cluster locations and additional privacy safeguards," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-13, December.
    5. Dana R. Thomson & Lieke Kools & Warren C. Jochem, 2018. "Linking Synthetic Populations to Household Geolocations: A Demonstration in Namibia," Data, MDPI, vol. 3(3), pages 1-19, August.
    6. Speidel, Matthias & Drechsler, Jörg & Jolani, Shahab, 2018. "R package hmi: a convenient tool for hierarchical multiple imputation and beyond," IAB-Discussion Paper 201816, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
    7. Antonio Arcos & Maria del Mar Rueda & Sara Pasadas-del-Amo, 2020. "Treating Nonresponse in Probability-Based Online Panels through Calibration: Empirical Evidence from a Survey of Political Decision-Making Procedures," Mathematics, MDPI, vol. 8(3), pages 1-16, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Beat Hulliger & Tobias Schoch, 2014. "Robust, distribution-free inference for income share ratios under complex sampling," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 98(1), pages 63-85, January.
    2. Ilari Ilmakunnas & Lauri Mäkinen, 2021. "Age Differences in Material Deprivation in Finland: How do Consensus and Prevalence-Based Weighting Approaches Change the Picture?," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 154(2), pages 393-412, April.
    3. Espinoza-Delgado, José & Silber, Jacques, 2018. "Multi-dimensional poverty among adults in Central America and gender differences in the three I’s of poverty: Applying inequality sensitive poverty measures with ordinal variables," MPRA Paper 88750, University Library of Munich, Germany.
    4. Anne-Catherine Guio & David Gordon & Eric Marlier & Hector Najera & Marco Pomati, 2018. "Towards an EU measure of child deprivation," Child Indicators Research, Springer;The International Society of Child Indicators (ISCI), vol. 11(3), pages 835-860, June.
    5. Olivier Bargain & Tim Callan, 2010. "Analysing the effects of tax-benefit reforms on income distribution: a decomposition approach," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 8(1), pages 1-21, March.
    6. Frick, Joachim R. & Grabka, Markus M. & Smeeding, Timothy M. & Tsakloglou, Panos, 2010. "Distributional Effects of Imputed Rents in Five European Countries," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 19(3), pages 167-179.
    7. Christos Koutsampelas & Panos Tsakloglou, 2013. "The distribution of full income in Greece," International Journal of Social Economics, Emerald Group Publishing Limited, vol. 40(4), pages 311-330, March.
    8. Bruce D. Meyer & James X. Sullivan, 2012. "Identifying the Disadvantaged: Official Poverty, Consumption Poverty, and the New Supplemental Poverty Measure," Journal of Economic Perspectives, American Economic Association, vol. 26(3), pages 111-136, Summer.
    9. Duclos, Jean-Yves & Araar, Abdelkrim & Giles, John, 2010. "Chronic and transient poverty: Measurement and estimation, with evidence from China," Journal of Development Economics, Elsevier, vol. 91(2), pages 266-277, March.
    10. Manos Matsaganis & Chrysa Leventi, 2011. "The distributional impact of the crisis in Greece," DEOS Working Papers 1124, Athens University of Economics and Business.
    11. Maite Blázquez & Elena Cottini & Ainhoa Herrarte, 2014. "The socioeconomic gradient in health: how important is material deprivation?," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 12(2), pages 239-264, June.
    12. Frank Cowell & Udo Ebert, 2004. "Complaints and inequality," Social Choice and Welfare, Springer;The Society for Social Choice and Welfare, vol. 23(1), pages 71-89, August.
    13. Rolf Aaberge & Audun Langørgen & Petter Lindgren, 2013. "The distributional impact of public services in," Discussion Papers 746, Statistics Norway, Research Department.
    14. Marco Grasso & Luciano Canova, 2008. "An Assessment of the Quality of Life in the European Union Based on the Social Indicators Approach," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 87(1), pages 1-25, May.
    15. Timothy Smeeding, 2005. "Government Programs and Social Outcomes: The United States in Comparative Perspective," LIS Working papers 426, LIS Cross-National Data Center in Luxembourg.
    16. Di Cataldo, Marco & Rodríguez-Pose, Andrés, 2016. "What drives employment growth and social inclusion in EU regions," LSE Research Online Documents on Economics 68510, London School of Economics and Political Science, LSE Library.
    17. Menon Martina & Perali Federico & Veronesi Marcella, 2017. "“Leaving No Child Behind:” Preferences for Social Inclusion and Altruism," The B.E. Journal of Economic Analysis & Policy, De Gruyter, vol. 17(3), pages 1-19, July.
    18. Sabina Alkire & Maria Emma Santos, 2010. "Acute Multidimensional Poverty: A New Index for Developing Countries," Human Development Research Papers (2009 to present) HDRP-2010-11, Human Development Report Office (HDRO), United Nations Development Programme (UNDP).
    19. Alkire, Sabina & Santos, Maria Emma, 2014. "Measuring Acute Poverty in the Developing World: Robustness and Scope of the Multidimensional Poverty Index," World Development, Elsevier, vol. 59(C), pages 251-274.
    20. Sommarat Chantarat & Christopher Barrett, 2012. "Social network capital, economic mobility and poverty traps," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 10(3), pages 299-342, September.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:jss:jstsof:v:079:i10.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Christopher F. Baum. General contact details of provider: http://www.jstatsoft.org/.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.