Printed from https://ideas.repec.org/a/eee/econom/v235y2023i2p444-453.html

Distribution-invariant differential privacy

Author

Listed:
  • Bi, Xuan
  • Shen, Xiaotong

Abstract

Differential privacy is becoming one gold standard for protecting the privacy of publicly shared data. It has been widely used in social science, data science, public health, information technology, and the U.S. decennial census. Nevertheless, to guarantee differential privacy, existing methods may unavoidably alter the conclusion of original data analysis, as privatization often changes the sample distribution. This phenomenon is known as the trade-off between privacy protection and statistical accuracy. In this work, we mitigate this trade-off by developing a distribution-invariant privatization (DIP) method to reconcile both high statistical accuracy and strict differential privacy. As a result, any downstream statistical or machine learning task yields essentially the same conclusion as if one used the original data. Numerically, under the same strictness of privacy protection, DIP achieves superior statistical accuracy in a wide range of simulation studies and real-world benchmarks.
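
Illustrative note (not part of the published abstract): the claim that privatization changes the sample distribution can be seen in a minimal sketch. The code below applies the standard Laplace mechanism to each record. This is not the authors' DIP method, whose construction is not described on this page. The data (uniform on [0, 1]), the privacy budget epsilon = 1, the sensitivity of 1, and the function names are all assumptions chosen for illustration.

```python
import random
import statistics


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values, epsilon, sensitivity, rng):
    # Standard Laplace mechanism applied record by record
    # (hypothetical helper, not the DIP method from the paper).
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed toy data: 10,000 draws from Uniform(0, 1), so sensitivity is 1.
original = [rng.random() for _ in range(10_000)]
private = privatize(original, epsilon=1.0, sensitivity=1.0, rng=rng)

orig_var = statistics.pvariance(original)
priv_var = statistics.pvariance(private)

# The mean is roughly preserved, but the variance is inflated by about
# 2 * scale**2, so the privatized sample no longer follows Uniform(0, 1).
print("mean (original, private):", statistics.fmean(original), statistics.fmean(private))
print("variance (original, private):", orig_var, priv_var)
```

Under these assumptions the added noise has theoretical variance 2, far larger than the 1/12 variance of the original data, so any analysis that depends on the spread or shape of the distribution would be affected. This is the trade-off that the abstract says DIP is designed to mitigate.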

Suggested Citation

  • Bi, Xuan & Shen, Xiaotong, 2023. "Distribution-invariant differential privacy," Journal of Econometrics, Elsevier, vol. 235(2), pages 444-453.
  • Handle: RePEc:eee:econom:v:235:y:2023:i:2:p:444-453
    DOI: 10.1016/j.jeconom.2022.05.004

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S030440762200121X
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.jeconom.2022.05.004?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Benjamin Haibe-Kains & George Alexandru Adam & Ahmed Hosny & Farnoosh Khodakarami & Levi Waldron & Bo Wang & Chris McIntosh & Anna Goldenberg & Anshul Kundaje & Casey S. Greene & Tamara Broderick & Mi, 2020. "Transparency and reproducibility in artificial intelligence," Nature, Nature, vol. 586(7829), pages 14-16, October.
    2. John C. Duchi & Michael I. Jordan & Martin J. Wainwright, 2018. "Minimax Optimal Procedures for Locally Private Estimation," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(521), pages 182-201, January.
    3. Marco Avella-Medina, 2021. "Privacy-Preserving Parametric Inference: A Case for Robust Statistics," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(534), pages 969-983, April.
    4. Srinivasan Venkatramanan & Adam Sadilek & Arindam Fadikar & Christopher L. Barrett & Matthew Biggerstaff & Jiangzhuo Chen & Xerxes Dotiwalla & Paul Eastham & Bryant Gipson & Dave Higdon & Onur Kucuktu, 2021. "Forecasting influenza activity using machine-learned mobility map," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    5. Alexis R. Santos-Lozada & Jeffrey T. Howard & Ashton M. Verdery, 2020. "How differential privacy will affect our understanding of health disparities in the United States," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 117(24), pages 13405-13412, June.
    6. Wasserman, Larry & Zhou, Shuheng, 2010. "A Statistical Framework for Differential Privacy," Journal of the American Statistical Association, American Statistical Association, vol. 105(489), pages 375-389.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jinshuo Dong & Aaron Roth & Weijie J. Su, 2022. "Gaussian differential privacy," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(1), pages 3-37, February.
    2. John M. Abowd & Ian M. Schmutte & William Sexton & Lars Vilhuber, 2019. "Suboptimal Provision of Privacy and Statistical Accuracy When They are Public Goods," Papers 1906.09353, arXiv.org.
    3. Kroll, Martin, 2022. "On the universal consistency of histograms anonymised by a randomised response technique," Statistics & Probability Letters, Elsevier, vol. 185(C).
    4. Ron S. Jarmin & John M. Abowd & Robert Ashmead & Ryan Cumings-Menon & Nathan Goldschlag & Michael B. Hawes & Sallie Ann Keller & Daniel Kifer & Philip Leclerc & Jerome P. Reiter & Rolando A. Rodrígue, 2023. "An in-depth examination of requirements for disclosure risk assessment," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 120(43), pages 2220558120-, October.
    5. Sigurd Dyrting & Abraham Flaxman & Ethan Sharygin, 2022. "Reconstruction of age distributions from differentially private census data," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 41(6), pages 2311-2329, December.
    6. Raj Chetty & John N. Friedman, 2019. "A Practical Method to Reduce Privacy Loss When Disclosing Statistics Based on Small Samples," AEA Papers and Proceedings, American Economic Association, vol. 109, pages 414-420, May.
    7. John M. Abowd & Robert Ashmead & Ryan Cumings-Menon & Simson Garfinkel & Micah Heineck & Christine Heiss & Robert Johns & Daniel Kifer & Philip Leclerc & Ashwin Machanavajjhala & Brett Moran & William, 2022. "The 2020 Census Disclosure Avoidance System TopDown Algorithm," Papers 2204.08986, arXiv.org.
    8. Matthew J. Schneider & Dawn Iacobucci, 2020. "Protecting survey data on a consumer level," Journal of Marketing Analytics, Palgrave Macmillan, vol. 8(1), pages 3-17, March.
    9. Igna, Ioana & Venturini, Francesco, 2023. "The determinants of AI innovation across European firms," Research Policy, Elsevier, vol. 52(2).
    10. Ori Heffetz & Katrina Ligett, 2014. "Privacy and Data-Based Research," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 75-98, Spring.
    11. Toth Daniell, 2014. "Data Smearing: An Approach to Disclosure Limitation for Tabular Data," Journal of Official Statistics, Sciendo, vol. 30(4), pages 1-19, December.
    12. Rostami-Tabar, Bahman & Ali, Mohammad M. & Hong, Tao & Hyndman, Rob J. & Porter, Michael D. & Syntetos, Aris, 2022. "Forecasting for social good," International Journal of Forecasting, Elsevier, vol. 38(3), pages 1245-1257.
    13. Soumya Mukherjee & Aratrika Mustafi & Aleksandra Slavković & Lars Vilhuber, 2023. "Assessing Utility of Differential Privacy for RCTs," Papers 2309.14581, arXiv.org.
    14. Katherine B. Coffman & Lucas C. Coffman & Keith M. Marzilli Ericson, 2017. "The Size of the LGBT Population and the Magnitude of Antigay Sentiment Are Substantially Underestimated," Management Science, INFORMS, vol. 63(10), pages 3168-3186, October.
    15. Chongliang Luo & Md. Nazmul Islam & Natalie E. Sheils & John Buresh & Jenna Reps & Martijn J. Schuemie & Patrick B. Ryan & Mackenzie Edmondson & Rui Duan & Jiayi Tong & Arielle Marks-Anglin & Jiang Bi, 2022. "DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    16. Heng Xu & Nan Zhang, 2022. "Implications of Data Anonymization on the Statistical Evidence of Disparity," Management Science, INFORMS, vol. 68(4), pages 2600-2618, April.
    17. Juan Carlos Henao & Liliana López-Jiménez, 2021. "Disrupción tecnológica, transformación digital y sociedad. Tomo IV, Aires de revolución : nuevos desafíos tecnológicos a las instituciones económicas, financieras y organizacionales de nuestros tiempo," Books, Universidad Externado de Colombia, Facultad de Derecho, number 1283, October.
    18. Mathieu Chevrier & Brice Corgnet & Eric Guerci & Julie Rosaz, 2024. "Algorithm Credulity: Human and Algorithmic Advice in Prediction Experiments," GREDEG Working Papers 2024-03, Groupe de REcherche en Droit, Economie, Gestion (GREDEG CNRS), Université Côte d'Azur, France.
    19. Lalanne, Clément & Gadat, Sébastien, 2024. "Privately Learning Smooth Distributions on the Hypercube by Projections," TSE Working Papers 24-1505, Toulouse School of Economics (TSE).
    20. Plantinga, Paul, 2022. "Digital discretion and public administration in Africa: Implications for the use of artificial intelligence," SocArXiv 2r98w, Center for Open Science.

    More about this item

    Keywords

    Privacy protection; Distribution preservation; Data sharing; Data perturbation; Randomized mechanism;

    JEL classification:

    • C10 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - General

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.