
Generating Poisson‐distributed differentially private synthetic data

Author

Listed:
  • Harrison Quick

Abstract

The dissemination of synthetic data can be an effective means of making information from sensitive data publicly available with a reduced risk of disclosure. While mechanisms exist for synthesizing data that satisfy formal privacy guarantees, these mechanisms do not typically resemble the models an end‐user might use to analyse the data. More recently, the use of methods from the disease mapping literature has been proposed to generate spatially referenced synthetic data with high utility but without formal privacy guarantees. The objective for this paper is to help bridge the gap between the disease mapping and the differential privacy literatures. In particular, we generalize an approach for generating differentially private synthetic data currently used by the US Census Bureau to the case of Poisson‐distributed count data in a way that accommodates heterogeneity in population sizes and allows for the infusion of prior information regarding the underlying event rates. Following a pair of small simulation studies, we illustrate the utility of the synthetic data produced by this approach using publicly available, county‐level heart disease‐related death counts. This study demonstrates the benefits of the proposed approach’s flexibility with respect to heterogeneity in population sizes and event rates while motivating further research to improve its utility.
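As a rough illustration of the kind of model the abstract describes, the sketch below synthesizes counts from a Poisson-gamma posterior predictive distribution. This is an assumption made for illustration, not the paper's mechanism: the abstract does not spell out the model, and this sketch makes no attempt to calibrate the prior to meet any differential privacy guarantee. The names sample_poisson, synthesize_counts, prior_shape and prior_rate are hypothetical. It only shows how population sizes can differ across areas and how prior information on the event rates can enter through a gamma prior.

```python
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate by counting unit-rate exponential arrivals in [0, mean].

    Runs in time proportional to the mean, so it is only suitable for small means.
    """
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def synthesize_counts(counts, populations, prior_shape, prior_rate, seed=0):
    """Illustrative Poisson-gamma synthesizer (hypothetical, no privacy calibration).

    Assumed model: y_i ~ Poisson(n_i * lambda_i), lambda_i ~ Gamma(prior_shape, prior_rate).
    The posterior for lambda_i is then Gamma(y_i + prior_shape, n_i + prior_rate).
    """
    rng = random.Random(seed)
    synthetic = []
    for y, n in zip(counts, populations):
        shape = y + prior_shape
        rate = n + prior_rate
        # random.gammavariate takes a scale parameter, so pass 1 / rate.
        rate_draw = rng.gammavariate(shape, 1.0 / rate)
        # The synthetic count is a fresh Poisson draw at the sampled event rate.
        synthetic.append(sample_poisson(n * rate_draw, rng))
    return synthetic


if __name__ == "__main__":
    # Made-up counts and population sizes for three areas of very different size.
    print(synthesize_counts([3, 12, 0], [1000, 5000, 200],
                            prior_shape=1.0, prior_rate=500.0, seed=1))
```

In this sketch a larger prior_rate pulls each area's sampled rate more strongly toward the prior mean prior_shape / prior_rate, which matters most for areas with small populations.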

Suggested Citation

  • Harrison Quick, 2021. "Generating Poisson‐distributed differentially private synthetic data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(3), pages 1093-1108, July.
  • Handle: RePEc:bla:jorssa:v:184:y:2021:i:3:p:1093-1108
    DOI: 10.1111/rssa.12711

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssa.12711
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Daniell Toth, 2014. "Data Smearing: An Approach to Disclosure Limitation for Tabular Data," Journal of Official Statistics, Sciendo, vol. 30(4), pages 839-857, December.
    2. Vinícius Diniz Mayrink & Renato Valladares Panaro & Marcelo Azevedo Costa, 2021. "Structural equation modeling with time dependence: an application comparing Brazilian energy distributors," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 105(2), pages 353-383, June.
    3. Katie Wilson & Jon Wakefield, 2022. "A probabilistic model for analyzing summary birth history data," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 47(11), pages 291-344.
    4. Thomas C. McHale & Claudia M. Romero-Vivas & Claudio Fronterre & Pedro Arango-Padilla & Naomi R. Waterlow & Chad D. Nix & Andrew K. Falconar & Jorge Cano, 2019. "Spatiotemporal Heterogeneity in the Distribution of Chikungunya and Zika Virus Case Incidences during their 2014 to 2016 Epidemics in Barranquilla, Colombia," IJERPH, MDPI, vol. 16(10), pages 1-21, May.
    5. Peter Congdon, 2010. "A multiple indicator, multiple cause method for representing social capital with an application to psychological distress," Journal of Geographical Systems, Springer, vol. 12(1), pages 1-23, March.
    6. Renato Assunção & Carl Schmertmann & Joseph Potter & Suzana Cavenaghi, 2005. "Empirical bayes estimation of demographic schedules for small areas," Demography, Springer;Population Association of America (PAA), vol. 42(3), pages 537-558, August.
    7. Peter Congdon, 2014. "Estimating life expectancies for US small areas: a regression framework," Journal of Geographical Systems, Springer, vol. 16(1), pages 1-18, January.
    8. Shota Homma & Daisuke Murakami & Shinya Hosokawa & Koji Kanefuji, 2025. "Introduction risk of fire ants through container cargo in ports: Data integration approach considering a logistic network," PLOS ONE, Public Library of Science, vol. 20(2), pages 1-15, February.
    9. Eibich, Peter & Ziebarth, Nicolas, 2014. "Examining the Structure of Spatial Health Effects in Germany Using Hierarchical Bayes Models," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 49, pages 305-320.
    10. Chen, Yewen & Chang, Xiaohui & Luo, Fangzhi & Huang, Hui, 2023. "Additive dynamic models for correcting numerical model outputs," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    11. John M. Abowd & Ian M. Schmutte & William Sexton & Lars Vilhuber, 2019. "Suboptimal Provision of Privacy and Statistical Accuracy When They are Public Goods," Papers 1906.09353, arXiv.org.
    12. Dani Gamerman & Ajax R. B. Moreira, 2015. "Multivariate Spatial Regression Models," Discussion Papers 0116, Instituto de Pesquisa Econômica Aplicada - IPEA.
    13. Jamie M. Madden & Simon More & Conor Teljeur & Justin Gleeson & Cathal Walsh & Guy McGrath, 2021. "Population Mobility Trends, Deprivation Index and the Spatio-Temporal Spread of Coronavirus Disease 2019 in Ireland," IJERPH, MDPI, vol. 18(12), pages 1-16, June.
    14. Peter Congdon, 2020. "Geographical Aspects of Recent Trends in Drug-Related Deaths, with a Focus on Intra-National Contextual Variation," IJERPH, MDPI, vol. 17(21), pages 1-18, November.
    15. Maciej Beręsewicz & Dagmara Nikulin, 2018. "Informal employment in Poland: an empirical spatial analysis," Spatial Economic Analysis, Taylor & Francis Journals, vol. 13(3), pages 338-355, July.
    16. Zhu, Dongping & Huang, Xiaogang & Ding, Zhixia & Zhang, Wei, 2024. "Estimation of wind turbine responses with attention-based neural network incorporating environmental uncertainties," Reliability Engineering and System Safety, Elsevier, vol. 241(C).
    17. Miriam Marco & Enrique Gracia & Antonio López-Quílez & Marisol Lila, 2021. "The Spatial Overlap of Police Calls Reporting Street-Level and Behind-Closed-Doors Crime: A Bayesian Modeling Approach," IJERPH, MDPI, vol. 18(10), pages 1-14, May.
    18. Shreosi Sanyal & Thierry Rochereau & Cara Nichole Maesano & Laure Com-Ruelle & Isabella Annesi-Maesano, 2018. "Long-Term Effect of Outdoor Air Pollution on Mortality and Morbidity: A 12-Year Follow-Up Study for Metropolitan France," IJERPH, MDPI, vol. 15(11), pages 1-8, November.
    19. Mayer Alvo & Jingrui Mu, 2023. "COVID-19 Data Analysis Using Bayesian Models and Nonparametric Geostatistical Models," Mathematics, MDPI, vol. 11(6), pages 1-13, March.
    20. Ying C. MacNab, 2018. "Rejoinder on: Some recent work on multivariate Gaussian Markov random fields," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 27(3), pages 554-569, September.
