Printed from https://ideas.repec.org/a/pal/palcom/v10y2023i1d10.1057_s41599-023-01694-y.html

Releasing survey microdata with exact cluster locations and additional privacy safeguards

Authors

  • Till Koebe (Saarland University)
  • Alejandra Arias-Salazar (University of Costa Rica)
  • Timo Schmid (Otto Friedrich University Bamberg)

Abstract

Household survey programs around the world publish fine-granular georeferenced microdata to support research on the interdependence of human livelihoods and their surrounding environment. To safeguard the respondents’ privacy, micro-level survey data is usually (pseudo)-anonymized through deletion or perturbation procedures such as obfuscating the true location of data collection. This, however, poses a challenge to emerging approaches that augment survey data with auxiliary information on a local level. Here, we propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards through synthetically generated data using generative models. We back our proposal with experiments using data from the 2011 Costa Rican census and satellite-derived auxiliary information. Our strategy reduces the respondents’ re-identification risk for any number of disclosed attributes by 60–80% even under re-identification attempts.
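
To make the idea of re-identification risk growing with the number of disclosed attributes concrete, here is a minimal sketch of one simple proxy: the share of records that are unique on a given set of attributes. This is not the risk measure or the generative model used in the article. The function name `uniqueness_share`, the attribute names, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def uniqueness_share(records, attributes):
    """Return the share of records whose value combination on
    `attributes` occurs exactly once (a crude re-identification proxy)."""
    if not records:
        return 0.0
    # Build one key per record from the disclosed attributes.
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys)


# Hypothetical toy microdata (not taken from the Costa Rican census).
records = [
    {"age_group": "30-39", "sex": "f", "household_size": 4},
    {"age_group": "30-39", "sex": "f", "household_size": 4},
    {"age_group": "30-39", "sex": "m", "household_size": 2},
    {"age_group": "60-69", "sex": "f", "household_size": 1},
]

# Disclosing more attributes can only keep or raise the unique share.
risk_one_attribute = uniqueness_share(records, ["age_group"])
risk_two_attributes = uniqueness_share(records, ["age_group", "sex"])
print(risk_one_attribute, risk_two_attributes)
```

Under this proxy, a dissemination strategy would be assessed by comparing the unique share in the released data with that in the original data for each set of disclosed attributes. A real assessment would need a proper risk model rather than raw uniqueness counts.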

Suggested Citation

  • Till Koebe & Alejandra Arias-Salazar & Timo Schmid, 2023. "Releasing survey microdata with exact cluster locations and additional privacy safeguards," Palgrave Communications, Palgrave Macmillan, vol. 10(1), pages 1-13, December.
  • Handle: RePEc:pal:palcom:v:10:y:2023:i:1:d:10.1057_s41599-023-01694-y
    DOI: 10.1057/s41599-023-01694-y