Generating Poisson‐distributed differentially private synthetic data

Author

Listed:
  • Harrison Quick

Abstract

The dissemination of synthetic data can be an effective means of making information from sensitive data publicly available with a reduced risk of disclosure. While mechanisms exist for synthesizing data that satisfy formal privacy guarantees, these mechanisms do not typically resemble the models an end‐user might use to analyse the data. More recently, the use of methods from the disease mapping literature has been proposed to generate spatially referenced synthetic data with high utility but without formal privacy guarantees. The objective for this paper is to help bridge the gap between the disease mapping and the differential privacy literatures. In particular, we generalize an approach for generating differentially private synthetic data currently used by the US Census Bureau to the case of Poisson‐distributed count data in a way that accommodates heterogeneity in population sizes and allows for the infusion of prior information regarding the underlying event rates. Following a pair of small simulation studies, we illustrate the utility of the synthetic data produced by this approach using publicly available, county‐level heart disease‐related death counts. This study demonstrates the benefits of the proposed approach’s flexibility with respect to heterogeneity in population sizes and event rates while motivating further research to improve its utility.
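The abstract does not spell out the mechanism, so the following is only a minimal sketch of the general idea of model-based count synthesis, not the paper's method. It assumes a conjugate Poisson-gamma model, y_i ~ Poisson(n_i * lambda_i) with lambda_i ~ Gamma(a_i, b_i), and draws synthetic counts from the posterior predictive distribution. The function names, the prior values, and the example counts and populations are all hypothetical, and the sketch makes no claim about which prior settings, if any, would satisfy a differential privacy guarantee.

```python
import random


def sample_poisson(mean, rng):
    """Draw a Poisson(mean) variate by counting unit-rate
    exponential arrivals that fall in the interval [0, mean]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def synthesize_counts(counts, populations, prior_shape, prior_rate, seed=0):
    """Posterior-predictive synthetic counts under the assumed model
    y_i ~ Poisson(n_i * lam_i), lam_i ~ Gamma(shape=a_i, rate=b_i)."""
    rng = random.Random(seed)
    synthetic = []
    for y, n, a, b in zip(counts, populations, prior_shape, prior_rate):
        # Conjugate update: lam_i | y_i ~ Gamma(shape=a + y, rate=b + n).
        post_shape = a + y
        post_rate = b + n
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        lam = rng.gammavariate(post_shape, 1.0 / post_rate)
        # Population size n_i scales the event rate, so areas of
        # different sizes are handled by the same model.
        synthetic.append(sample_poisson(n * lam, rng))
    return synthetic


# Hypothetical inputs: three areas with very different population sizes.
observed = [3, 12, 40]
population = [1000, 5000, 20000]
# Hypothetical prior: mean rate a / b = 0.002 events per person in each area.
shape = [2.0, 2.0, 2.0]
rate = [1000.0, 1000.0, 1000.0]

print(synthesize_counts(observed, population, shape, rate, seed=1))
```

In this sketch the prior parameters carry the prior information about event rates, while the population sizes enter through the posterior rate b_i + n_i, so a small area is pulled more strongly toward the prior than a large one.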

Suggested Citation

  • Harrison Quick, 2021. "Generating Poisson‐distributed differentially private synthetic data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(3), pages 1093-1108, July.
  • Handle: RePEc:bla:jorssa:v:184:y:2021:i:3:p:1093-1108
    DOI: 10.1111/rssa.12711

    Download full text from publisher

    File URL: https://doi.org/10.1111/rssa.12711
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Toth, Daniell, 2014. "Data Smearing: An Approach to Disclosure Limitation for Tabular Data," Journal of Official Statistics, Sciendo, vol. 30(4), pages 839-857, December.
    2. Katie Wilson & Jon Wakefield, 2022. "A probabilistic model for analyzing summary birth history data," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 47(11), pages 291-344.
    3. Eibich, Peter & Ziebarth, Nicolas, 2014. "Examining the Structure of Spatial Health Effects in Germany Using Hierarchical Bayes Models," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 49, pages 305-320.
    4. John M. Abowd & Ian M. Schmutte & William Sexton & Lars Vilhuber, 2019. "Suboptimal Provision of Privacy and Statistical Accuracy When They are Public Goods," Papers 1906.09353, arXiv.org.
    5. Mayer Alvo & Jingrui Mu, 2023. "COVID-19 Data Analysis Using Bayesian Models and Nonparametric Geostatistical Models," Mathematics, MDPI, vol. 11(6), pages 1-13, March.
    6. Zhengyi Zhou & David S. Matteson & Dawn B. Woodard & Shane G. Henderson & Athanasios C. Micheas, 2015. "A Spatio-Temporal Point Process Model for Ambulance Demand," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(509), pages 6-15, March.
    7. Eric C. Tassone & Marie Lynn Miranda & Alan E. Gelfand, 2010. "Disaggregated spatial modelling for areal unit categorical data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 59(1), pages 175-190, January.
    8. Junming Li & Xiulan Han & Xiao Li & Jianping Yang & Xuejiao Li, 2018. "Spatiotemporal Patterns of Ground Monitored PM 2.5 Concentrations in China in Recent Years," IJERPH, MDPI, vol. 15(1), pages 1-15, January.
    9. Massimo Bilancia & Giacomo Demarinis, 2014. "Bayesian scanning of spatial disease rates with integrated nested Laplace approximation (INLA)," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 23(1), pages 71-94, March.
    10. Douglas R. M. Azevedo & Marcos O. Prates & Dipankar Bandyopadhyay, 2021. "MSPOCK: Alleviating Spatial Confounding in Multivariate Disease Mapping Models," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(3), pages 464-491, September.
    11. Bondo, Kristin J. & Rosenberry, Christopher S. & Stainbrook, David & Walter, W. David, 2024. "Comparing risk of chronic wasting disease occurrence using Bayesian hierarchical spatial models and different surveillance types," Ecological Modelling, Elsevier, vol. 493(C).
    12. Jonathan Wakefield & Taylor Okonek & Jon Pedersen, 2020. "Small Area Estimation for Disease Prevalence Mapping," International Statistical Review, International Statistical Institute, vol. 88(2), pages 398-418, August.
    13. Ron S. Jarmin & John M. Abowd & Robert Ashmead & Ryan Cumings-Menon & Nathan Goldschlag & Michael B. Hawes & Sallie Ann Keller & Daniel Kifer & Philip Leclerc & Jerome P. Reiter & Rolando A. Rodrígue, 2023. "An in-depth examination of requirements for disclosure risk assessment," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 120(43), pages 2220558120-, October.
    14. Francisca Corpas-Burgos & Miguel A. Martinez-Beneito, 2021. "An Autoregressive Disease Mapping Model for Spatio-Temporal Forecasting," Mathematics, MDPI, vol. 9(4), pages 1-17, February.
    15. Li Xu & Qingshan Jiang & David R. Lairson, 2019. "Spatio-Temporal Variation of Gender-Specific Hypertension Risk: Evidence from China," IJERPH, MDPI, vol. 16(22), pages 1-26, November.
    16. Isabel Martínez-Pérez & Verónica González-Iglesias & Valentín Rodríguez Suárez & Ana Fernández-Somoano, 2021. "Spatial Distribution of Hospitalizations for Ischemic Heart Diseases in the Central Region of Asturias, Spain," IJERPH, MDPI, vol. 18(23), pages 1-10, November.
    17. Cao, Zilong & Wu, Shisong & Li, Xuanang & Zhang, Hai, 2025. "Differentially private histogram with valid statistics," Statistics & Probability Letters, Elsevier, vol. 219(C).
    18. Johnson, Blair T. & Sisti, Anthony & Bernstein, Mary & Chen, Kun & Hennessy, Emily A. & Acabchuk, Rebecca L. & Matos, Michaela, 2021. "Community-level factors and incidence of gun violence in the United States, 2014–2017," Social Science & Medicine, Elsevier, vol. 280(C).
    19. F. Corpas-Burgos & P. Botella-Rocamora & M. A. Martinez-Beneito, 2019. "On the convenience of heteroscedasticity in highly multivariate disease mapping," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(4), pages 1229-1250, December.
    20. Alexandra Schmidt & Ajax Moreira & Steven Helfand & Thais Fonseca, 2009. "Spatial stochastic frontier models: accounting for unobserved local determinants of inefficiency," Journal of Productivity Analysis, Springer, vol. 31(2), pages 101-112, April.
