IDEAS home Printed from https://ideas.repec.org/a/spr/metron/v79y2021i1d10.1007_s40300-021-00201-0.html
   My bibliography  Save this article

Differentially private data release via statistical election to partition sequentially

Author

Listed:
  • Claire McKay Bowen

    (Urban Institute)

  • Fang Liu

    (University of Notre Dame)

  • Bingyue Su

    (University of Notre Dame)

Abstract

Differential Privacy (DP) formalizes privacy in mathematical terms and provides a robust concept for privacy protection. DIfferentially Private Data Synthesis (DIPS) techniques produce and release synthetic individual-level data in the DP framework. One key challenge to develop DIPS methods is the preservation of the statistical utility of synthetic data, especially in high-dimensional settings. We propose a new DIPS approach, STatistical Election to Partition Sequentially (STEPS) that partitions data by attributes according to their importance ranks according to either a practical or statistical importance measure. STEPS aims to achieve better original information preservation for the attributes with higher importance ranks and produce thus more useful synthetic data overall. We present an algorithm to implement the STEPS procedure and employ the privacy budget composability to ensure the overall privacy cost is controlled at the pre-specified value. We apply the STEPS procedure to both simulated data and the 2000–2012 Current Population Survey youth voter data. The results suggest STEPS can better preserve the population-level information and the original information for some analyses compared to PrivBayes, a modified Uniform histogram approach, and the flat Laplace sanitizer.

Suggested Citation

  • Claire McKay Bowen & Fang Liu & Bingyue Su, 2021. "Differentially private data release via statistical election to partition sequentially," METRON, Springer;Sapienza Università di Roma, vol. 79(1), pages 1-31, April.
  • Handle: RePEc:spr:metron:v:79:y:2021:i:1:d:10.1007_s40300-021-00201-0
    DOI: 10.1007/s40300-021-00201-0
    as

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s40300-021-00201-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s40300-021-00201-0?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. John B. Holbein & D. Sunshine Hillygus, 2016. "Making Young Voters: The Impact of Preregistration on Youth Turnout," American Journal of Political Science, John Wiley & Sons, vol. 60(2), pages 364-382, April.
    2. Jerome P. Reiter, 2009. "Using Multiple Imputation to Integrate and Disseminate Confidential Microdata," International Statistical Review, International Statistical Institute, vol. 77(2), pages 179-195, August.
    3. Wasserman, Larry & Zhou, Shuheng, 2010. "A Statistical Framework for Differential Privacy," Journal of the American Statistical Association, American Statistical Association, vol. 105(489), pages 375-389.
    4. Karr, A.F. & Kohnen, C.N. & Oganian, A. & Reiter, J.P. & Sanil, A.P., 2006. "A Framework for Evaluating the Utility of Data Altered to Protect Confidentiality," The American Statistician, American Statistical Association, vol. 60, pages 224-232, August.
    5. Joshua Snoke & Gillian M. Raab & Beata Nowok & Chris Dibben & Aleksandra Slavkovic, 2018. "General and specific utility measures for synthetic data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(3), pages 663-688, June.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. James Jackson & Robin Mitra & Brian Francis & Iain Dove, 2022. "Using saturated count models for user‐friendly synthesis of large confidential administrative databases," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1613-1643, October.
    2. Soumya Mukherjee & Aratrika Mustafi & Aleksandra Slavkovi'c & Lars Vilhuber, 2023. "Assessing Utility of Differential Privacy for RCTs," Papers 2309.14581, arXiv.org.
    3. John M. Abowd & Ian M. Schmutte & William Sexton & Lars Vilhuber, 2019. "Suboptimal Provision of Privacy and Statistical Accuracy When They are Public Goods," Papers 1906.09353, arXiv.org.
    4. Jennifer Oser & Marc Hooghe & Zsuzsa Bakk & Roberto Mari, 2023. "Changing citizenship norms among adolescents, 1999-2009-2016: A two-step latent class approach with measurement equivalence testing," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(5), pages 4915-4933, October.
    5. Erica Espinosa & Alvaro Figueira, 2023. "On the Quality of Synthetic Generated Tabular Data," Mathematics, MDPI, vol. 11(15), pages 1-18, July.
    6. Ron S. Jarmin & John M. Abowd & Robert Ashmead & Ryan Cumings-Menon & Nathan Goldschlag & Michael B. Hawes & Sallie Ann Keller & Daniel Kifer & Philip Leclerc & Jerome P. Reiter & Rolando A. Rodrígue, 2023. "An in-depth examination of requirements for disclosure risk assessment," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 120(43), pages 2220558120-, October.
    7. Piatak Jaclyn, 2023. "Do Sociocultural Factors Drive Civic Engagement? An Examination of Political Interest and Religious Attendance," Nonprofit Policy Forum, De Gruyter, vol. 14(2), pages 185-204, April.
    8. Templ Matthias, 2015. "Quality Indicators for Statistical Disclosure Methods: A Case Study on the Structure of Earnings Survey," Journal of Official Statistics, Sciendo, vol. 31(4), pages 737-761, December.
    9. Niklas Potrafke & Felix Roesel, 2020. "Opening hours of polling stations and voter turnout: Evidence from a natural experiment," The Review of International Organizations, Springer, vol. 15(1), pages 133-163, January.
    10. Tatiana Komarova & Denis Nekipelov & Evgeny Yakovlev, 2018. "Identification, data combination, and the risk of disclosure," Quantitative Economics, Econometric Society, vol. 9(1), pages 395-440, March.
    11. Raj Chetty & John N. Friedman, 2019. "A Practical Method to Reduce Privacy Loss When Disclosing Statistics Based on Small Samples," AEA Papers and Proceedings, American Economic Association, vol. 109, pages 414-420, May.
    12. John M. Abowd & Robert Ashmead & Ryan Cumings-Menon & Simson Garfinkel & Micah Heineck & Christine Heiss & Robert Johns & Daniel Kifer & Philip Leclerc & Ashwin Machanavajjhala & Brett Moran & William, 2022. "The 2020 Census Disclosure Avoidance System TopDown Algorithm," Papers 2204.08986, arXiv.org.
    13. Katherine B. Coffman & Lucas C. Coffman & Keith M. Marzilli Ericson, 2017. "The Size of the LGBT Population and the Magnitude of Antigay Sentiment Are Substantially Underestimated," Management Science, INFORMS, vol. 63(10), pages 3168-3186, October.
    14. Ori Heffetz & Katrina Ligett, 2014. "Privacy and Data-Based Research," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 75-98, Spring.
    15. repec:crs:wpidms:m2016-07 is not listed on IDEAS
    16. Daiho Uhm & Sunghae Jun, 2022. "Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples," Future Internet, MDPI, vol. 14(7), pages 1-11, July.
    17. Joseph W. Sakshaug & Trivellore E. Raghunathan, 2014. "Generating synthetic microdata to estimate small area statistics in the American Community Survey," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 15(3), pages 341-368, June.
    18. Tong Wang & Cynthia Rudin, 2022. "Causal Rule Sets for Identifying Subgroups with Enhanced Treatment Effects," INFORMS Journal on Computing, INFORMS, vol. 34(3), pages 1626-1643, May.
    19. Joshua Snoke & Gillian M. Raab & Beata Nowok & Chris Dibben & Aleksandra Slavkovic, 2018. "General and specific utility measures for synthetic data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(3), pages 663-688, June.
    20. Toth Daniell, 2014. "Data Smearing: An Approach to Disclosure Limitation for Tabular Data," Journal of Official Statistics, Sciendo, vol. 30(4), pages 1-19, December.
    21. Jinshuo Dong & Aaron Roth & Weijie J. Su, 2022. "Gaussian differential privacy," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(1), pages 3-37, February.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:metron:v:79:y:2021:i:1:d:10.1007_s40300-021-00201-0. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.