IDEAS home Printed from https://ideas.repec.org/a/inm/ormnsc/v68y2022i4p2600-2618.html

Implications of Data Anonymization on the Statistical Evidence of Disparity

Author

Listed:
  • Heng Xu

    (Kogod School of Business, American University, Washington, District of Columbia 20016)

  • Nan Zhang

    (Kogod School of Business, American University, Washington, District of Columbia 20016)

Abstract

Research and practical development of data-anonymization techniques have proliferated in recent years. Yet limited attention has been paid to examining the potentially disparate impact of privacy protection on underprivileged subpopulations. This study is one of the first attempts to examine the extent to which data anonymization could mask the gross statistical disparities between subpopulations in the data. We first describe two common mechanisms of data anonymization and two prevalent types of statistical evidence for disparity. Then, we develop a conceptual foundation and mathematical formalism demonstrating that the two data-anonymization mechanisms have distinctive impacts on the identifiability of disparity, which also varies based on its statistical operationalization. After validating our findings with empirical evidence, we discuss the business and policy implications, highlighting the need for firms and policy makers to balance the protection of privacy against the recognition and rectification of disparate impact.

Suggested Citation

  • Heng Xu & Nan Zhang, 2022. "Implications of Data Anonymization on the Statistical Evidence of Disparity," Management Science, INFORMS, vol. 68(4), pages 2600-2618, April.
  • Handle: RePEc:inm:ormnsc:v:68:y:2022:i:4:p:2600-2618
    DOI: 10.1287/mnsc.2021.4028

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2021.4028
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.
    2. Ron S. Jarmin & John M. Abowd & Robert Ashmead & Ryan Cumings-Menon & Nathan Goldschlag & Michael B. Hawes & Sallie Ann Keller & Daniel Kifer & Philip Leclerc & Jerome P. Reiter & Rolando A. Rodrígue, 2023. "An in-depth examination of requirements for disclosure risk assessment," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 120(43), pages 2220558120-, October.
    3. Sigurd Dyrting & Abraham Flaxman & Ethan Sharygin, 2022. "Reconstruction of age distributions from differentially private census data," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 41(6), pages 2311-2329, December.
    4. Rathindra Sarathy & Krishnamurty Muralidhar & Rahul Parsa, 2002. "Perturbing Nonnormal Confidential Attributes: The Copula Approach," Management Science, INFORMS, vol. 48(12), pages 1613-1627, December.
    5. Rathindra Sarathy & Krishnamurty Muralidhar, 2002. "The Security of Confidential Numerical Data in Databases," Information Systems Research, INFORMS, vol. 13(4), pages 389-403, December.
    6. Rehse, Dominik & Tremöhlen, Felix, 2020. "Fostering participation in digital public health interventions: The case of digital contact tracing," ZEW Discussion Papers 20-076, ZEW - Leibniz Centre for European Economic Research.
    7. Yi Qian & Hui Xie, 2013. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," NBER Working Papers 19586, National Bureau of Economic Research, Inc.
    8. Manuel A. Nunez & Robert S. Garfinkel & Ram D. Gopal, 2007. "Stochastic Protection of Confidential Information in Databases: A Hybrid of Data Perturbation and Query Restriction," Operations Research, INFORMS, vol. 55(5), pages 890-908, October.
    9. J. Tom Mueller & Alexis R. Santos-Lozada, 2022. "The 2020 US Census Differential Privacy Method Introduces Disproportionate Discrepancies for Rural and Non-White Populations," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 41(4), pages 1417-1430, August.
    10. Chu, Amanda M.Y. & Ip, Chun Yin & Lam, Benson S.Y. & So, Mike K.P., 2022. "Vine copula statistical disclosure control for mixed-type data," Computational Statistics & Data Analysis, Elsevier, vol. 176(C).
    11. Seokho Lee & Marc G. Genton & Reinaldo B. Arellano-Valle, 2010. "Perturbation of Numerical Confidential Data via Skew-t Distributions," Management Science, INFORMS, vol. 56(2), pages 318-333, February.
    12. Steven Ruggles & David Riper, 2022. "The Role of Chance in the Census Bureau Database Reconstruction Experiment," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 41(3), pages 781-788, June.
    13. Yi Qian & Hui Xie, 2015. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," Management Science, INFORMS, vol. 61(3), pages 520-541, March.
    14. Francesco Capozza & Ingar Haaland & Christopher Roth & Johannes Wohlfart, 2021. "Studying Information Acquisition in the Field: A Practical Guide and Review," CEBI working paper series 21-15, University of Copenhagen. Department of Economics. The Center for Economic Behavior and Inequality (CEBI).
    15. Höglinger, Marc & Diekmann, Andreas, 2017. "Uncovering a Blind Spot in Sensitive Question Research: False Positives Undermine the Crosswise-Model RRT," Political Analysis, Cambridge University Press, vol. 25(1), pages 131-137, January.
    16. Xiao-Bai Li & Jialun Qin, 2017. "Anonymizing and Sharing Medical Text Records," Information Systems Research, INFORMS, vol. 28(2), pages 332-352, June.
    17. Justin J. P. Jansen & Gerard George & Frans A. J. Van den Bosch & Henk W. Volberda, 2008. "Senior Team Attributes and Organizational Ambidexterity: The Moderating Role of Transformational Leadership," Journal of Management Studies, Wiley Blackwell, vol. 45(5), pages 982-1007, July.
    18. John R. J. Thompson & Longlong Feng & R. Mark Reesor & Chuck Grace, 2021. "Know Your Clients’ Behaviours: A Cluster Analysis of Financial Transactions," JRFM, MDPI, vol. 14(2), pages 1-29, January.
    19. John M. Abowd & Ian M. Schmutte & William Sexton & Lars Vilhuber, 2019. "Suboptimal Provision of Privacy and Statistical Accuracy When They are Public Goods," Papers 1906.09353, arXiv.org.
    20. Aksoy, Billur & Chadd, Ian & Koh, Boon Han, 2023. "Sexual identity, gender, and anticipated discrimination in prosocial behavior," European Economic Review, Elsevier, vol. 154(C).
