
A Data Disclosure Policy for Count Data Based on the COM-Poisson Distribution

Author

Listed:
  • Joseph B. Kadane

    (Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213)

  • Ramayya Krishnan

    (The Heinz School of Public Policy and Management, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213)

  • Galit Shmueli

    (Department of Decision and Information Technologies, Smith School of Business, University of Maryland, College Park, Maryland 20742)

Abstract

Count data arise in various organizational settings. When the release of such data is sensitive, organizations need information-disclosure policies that protect data confidentiality while still providing data access. In contrast to extant disclosure policies, we describe a new policy for count tables that is based on disclosing only the sufficient statistics of a flexible discrete distribution. This distribution, the COM-Poisson, approximates well not only Poisson counts but also under- and overdispersed counts. The sufficient statistics mask the exact cell counts and often also the table size. Under the scenario of a data-holding agency and a data snooper, we show that this policy has low disclosure risk with no loss of data utility: usually, many count tables correspond to the disclosed sufficient statistics, and these tables are equally likely to be the undisclosed table. Finding these solutions requires solving a system of linear equations, which is underdetermined for tables with more than three cells and can be computationally prohibitive even for small tables. We also consider cell-specific interval bounds, a commonly used disclosure limitation policy, and compare them to our policy. We describe several types of snooper knowledge, their integration with the disclosed statistics, and the implications. Applying this policy to three real data sets, we illustrate the low associated disclosure risk.
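
The idea of disclosing only sufficient statistics can be illustrated with a minimal sketch. It assumes the usual COM-Poisson form P(X = x) ∝ λ^x / (x!)^ν, for which a sample's sufficient statistics are the sample size n, the sum of the counts, and the sum of log(x!). For a frequency table f_0, ..., f_J (f_j = number of observations equal to j), all three are linear in the f_j. The function names, the example table, and the assumption that the snooper knows the number of cells are hypothetical and are not taken from the article. To keep comparisons exact, the sketch carries the product of (j!)^f_j, whose logarithm is the third statistic, instead of a floating-point log sum.

```python
from math import factorial


def sufficient_stats(freq):
    """Return (n, sum of counts, product of (j!)**f_j) for a frequency table.

    freq[j] is the number of observations equal to j. The log of the third
    value equals the sum of log(x_i!) over the sample.
    """
    n = sum(freq)
    s1 = sum(j * f for j, f in enumerate(freq))
    p = 1
    for j, f in enumerate(freq):
        p *= factorial(j) ** f
    return n, s1, p


def consistent_tables(n, s1, p, num_cells):
    """Enumerate every non-negative integer table matching the statistics.

    Brute-force search with pruning; an illustrative assumption, only
    practical for small tables.
    """
    solutions = []

    def extend(j, prefix, n_left, s1_left, p_left):
        if j == num_cells:
            if n_left == 0 and s1_left == 0 and p_left == 1:
                solutions.append(tuple(prefix))
            return
        fj = factorial(j)
        f, q = 0, p_left  # q is p_left divided by (j!)**f
        while f <= n_left and j * f <= s1_left:
            extend(j + 1, prefix + [f], n_left - f, s1_left - j * f, q)
            if q % fj != 0:
                break  # a larger f_j cannot divide the remaining product
            q //= fj
            f += 1

    extend(0, [], n, s1, p)
    return solutions


# Hypothetical undisclosed table: frequencies of the counts 0, 1, 2, 3, 4.
true_table = [3, 5, 4, 2, 1]
stats = sufficient_stats(true_table)
tables = consistent_tables(*stats, num_cells=len(true_table))
for t in tables:
    print(t)
```

In this toy case four different tables share the same statistics, including the undisclosed one, so the disclosed values alone do not pin down the cell counts. A snooper who does not know the number of cells would face a larger candidate set.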

Suggested Citation

  • Joseph B. Kadane & Ramayya Krishnan & Galit Shmueli, 2006. "A Data Disclosure Policy for Count Data Based on the COM-Poisson Distribution," Management Science, INFORMS, vol. 52(10), pages 1610-1617, October.
  • Handle: RePEc:inm:ormnsc:v:52:y:2006:i:10:p:1610-1617
    DOI: 10.1287/mnsc.1060.0562


    Citations


    Cited by:

    1. Kimberly F. Sellers & Andrew W. Swift & Kimberly S. Weems, 2017. "A flexible distribution class for count data," Journal of Statistical Distributions and Applications, Springer, vol. 4(1), pages 1-21, December.
    2. Xiao-Bai Li & Sumit Sarkar, 2013. "Class-Restricted Clustering and Microperturbation for Data Privacy," Management Science, INFORMS, vol. 59(4), pages 796-812, April.
    3. Sunisa Junnumtuam & Sa-Aat Niwitpong & Suparat Niwitpong, 2022. "A Zero-and-One Inflated Cosine Geometric Distribution and Its Application," Mathematics, MDPI, vol. 10(21), pages 1-22, October.
    4. Amalia R. Miller & Catherine Tucker, 2009. "Privacy Protection and Technology Diffusion: The Case of Electronic Medical Records," Management Science, INFORMS, vol. 55(7), pages 1077-1093, July.
    5. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Syam Menon & Sumit Sarkar & Shibnath Mukherjee, 2005. "Maximizing Accuracy of Shared Databases when Concealing Sensitive Patterns," Information Systems Research, INFORMS, vol. 16(3), pages 256-270, September.
    2. Syam Menon & Sumit Sarkar, 2007. "Minimizing Information Loss and Preserving Privacy," Management Science, INFORMS, vol. 53(1), pages 101-116, January.
    3. Xiao-Bai Li & Sumit Sarkar, 2006. "Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data," Information Systems Research, INFORMS, vol. 17(3), pages 254-270, September.
    4. Amalia R. Miller & Catherine Tucker, 2009. "Privacy Protection and Technology Diffusion: The Case of Electronic Medical Records," Management Science, INFORMS, vol. 55(7), pages 1077-1093, July.
    5. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.
    6. Xiao-Bai Li & Sumit Sarkar, 2013. "Class-Restricted Clustering and Microperturbation for Data Privacy," Management Science, INFORMS, vol. 59(4), pages 796-812, April.
    7. Robert Garfinkel & Ram Gopal & Steven Thompson, 2007. "Releasing Individually Identifiable Microdata with Privacy Protection Against Stochastic Threat: An Application to Health Information," Information Systems Research, INFORMS, vol. 18(1), pages 23-41, March.
    8. Gauss Cordeiro & Josemar Rodrigues & Mário Castro, 2012. "The exponential COM-Poisson distribution," Statistical Papers, Springer, vol. 53(3), pages 653-664, August.
    9. Mevin B. Hooten & Michael R. Schwob & Devin S. Johnson & Jacob S. Ivan, 2023. "Multistage hierarchical capture–recapture models," Environmetrics, John Wiley & Sons, Ltd., vol. 34(6), September.
    10. Can Zhou & Yan Jiao & Joan Browder, 2019. "How much do we know about seabird bycatch in pelagic longline fisheries? A simulation study on the potential bias caused by the usually unobserved portion of seabird bycatch," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-19, August.
    11. Darcy Steeg Morris & Kimberly F. Sellers, 2022. "A Flexible Mixed Model for Clustered Count Data," Stats, MDPI, vol. 5(1), pages 1-18, January.
    12. P. Daniel Wright & Matthew J. Liberatore & Robert L. Nydick, 2006. "A Survey of Operations Research Models and Applications in Homeland Security," Interfaces, INFORMS, vol. 36(6), pages 514-529, December.
    13. Imelda Trejo & Nicolas W Hengartner, 2022. "A modified Susceptible-Infected-Recovered model for observed under-reported incidence data," PLOS ONE, Public Library of Science, vol. 17(2), pages 1-23, February.
    14. Fernando Bonassi & Rafael Stern & Cláudia Peixoto & Sergio Wechsler, 2015. "Exchangeability and the law of maturity," Theory and Decision, Springer, vol. 78(4), pages 603-615, April.
    15. Lord, Dominique & Mannering, Fred, 2010. "The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives," Transportation Research Part A: Policy and Practice, Elsevier, vol. 44(5), pages 291-305, June.
    16. Dexter Cahoy & Elvira Di Nardo & Federico Polito, 2021. "Flexible models for overdispersed and underdispersed count data," Statistical Papers, Springer, vol. 62(6), pages 2969-2990, December.
    17. Krivitsky, Pavel N., 2017. "Using contrastive divergence to seed Monte Carlo MLE for exponential-family random graph models," Computational Statistics & Data Analysis, Elsevier, vol. 107(C), pages 149-161.
    18. Robert E. Gaunt & Satish Iyengar & Adri B. Olde Daalhuis & Burcin Simsek, 2019. "An asymptotic expansion for the normalizing constant of the Conway–Maxwell–Poisson distribution," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(1), pages 163-180, February.
    19. Subrata Chakraborty & S. H. Ong, 2017. "Mittag - Leffler function distribution - a new generalization of hyper-Poisson distribution," Journal of Statistical Distributions and Applications, Springer, vol. 4(1), pages 1-17, December.
    20. Sellers, Kimberly F. & Raim, Andrew, 2016. "A flexible zero-inflated model to address data dispersion," Computational Statistics & Data Analysis, Elsevier, vol. 99(C), pages 68-80.
