IDEAS home Printed from https://ideas.repec.org/a/inm/ormnsc/v53y2007i1p101-116.html
   My bibliography  Save this article

Minimizing Information Loss and Preserving Privacy

Author

Listed:
  • Syam Menon

    (School of Management, The University of Texas at Dallas, Richardson, Texas 75083)

  • Sumit Sarkar

    (School of Management, The University of Texas at Dallas, Richardson, Texas 75083)

Abstract

The need to hide sensitive information before sharing databases has long been recognized. In the context of data mining, sensitive information often takes the form of itemsets that need to be suppressed before the data is released. This paper considers the problem of minimizing the number of nonsensitive itemsets lost while concealing sensitive ones. It is shown to be an intractably large version of an NP-hard problem. Consequently, a two-phased procedure that involves the solution of two smaller NP-hard problems is proposed as a practical and effective alternative. In the first phase, a procedure to solve a sanitization problem identifies how the support for sensitive itemsets could be eliminated from a specific transaction by removing the fewest number of items from it. This leads to a modified frequent itemset hiding problem, where transactions to be sanitized are selected such that the number of nonsensitive itemsets lost, while concealing sensitive ones, is minimized. Heuristic procedures are developed for these problems using intuition derived from their integer programming formulations. Results from computational experiments conducted on a publicly available retail data set and three large data sets generated using IBM's synthetic data generator indicate that these approaches are very effective, solving problems involving up to 10 million transactions in a short period of time. The results also show that the process of sanitization has considerable bearing on the quality of solutions obtained.

Suggested Citation

  • Syam Menon & Sumit Sarkar, 2007. "Minimizing Information Loss and Preserving Privacy," Management Science, INFORMS, vol. 53(1), pages 101-116, January.
  • Handle: RePEc:inm:ormnsc:v:53:y:2007:i:1:p:101-116
    DOI: 10.1287/mnsc.1060.0603
    as

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.1060.0603
    Download Restriction: no

    File URL: https://libkey.io/10.1287/mnsc.1060.0603?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Sumit Dutta Chowdhury & George T. Duncan & Ramayya Krishnan & Stephen F. Roehrig & Sumitra Mukherjee, 1999. "Disclosure Detection in Multivariate Categorical Databases: Auditing Confidentiality Protection Through Two New Matrix Operators," Management Science, INFORMS, vol. 45(12), pages 1710-1723, December.
    2. Rathindra Sarathy & Krishnamurty Muralidhar, 2002. "The Security of Confidential Numerical Data in Databases," Information Systems Research, INFORMS, vol. 13(4), pages 389-403, December.
    3. Syam Menon & Sumit Sarkar & Shibnath Mukherjee, 2005. "Maximizing Accuracy of Shared Databases when Concealing Sensitive Patterns," Information Systems Research, INFORMS, vol. 16(3), pages 256-270, September.
    4. Alper Atamtürk & Martin Savelsbergh, 2005. "Integer-Programming Software Systems," Annals of Operations Research, Springer, vol. 140(1), pages 67-124, November.
    5. Robert Garfinkel & Ram Gopal & Paulo Goes, 2002. "Privacy Protection of Binary Confidential Data Against Deterministic, Stochastic, and Insider Threat," Management Science, INFORMS, vol. 48(6), pages 749-764, June.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Peng Cheng & Chun-Wei Lin & Jeng-Shyang Pan, 2015. "Use HypE to Hide Association Rules by Adding Items," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-19, June.
    2. Syam Menon & Abhijeet Ghoshal & Sumit Sarkar, 2022. "Modifying Transactional Databases to Hide Sensitive Association Rules," Information Systems Research, INFORMS, vol. 33(1), pages 152-178, March.
    3. Yi Qian & Hui Xie, 2013. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," NBER Working Papers 19586, National Bureau of Economic Research, Inc.
    4. Yiting Xing & Ling Li & Zhuming Bi & Marzena Wilamowska‐Korsak & Li Zhang, 2013. "Operations Research (OR) in Service Industries: A Comprehensive Review," Systems Research and Behavioral Science, Wiley Blackwell, vol. 30(3), pages 300-353, May.
    5. Zike Cao & Kai-Lung Hui & Hong Xu, 2018. "An Economic Analysis of Peer Disclosure in Online Social Communities," Information Systems Research, INFORMS, vol. 29(3), pages 546-566, September.
    6. Zhiqiang (Eric) Zheng & Peter Fader & Balaji Padmanabhan, 2012. "From Business Intelligence to Competitive Intelligence: Inferring Competitive Measures Using Augmented Site-Centric Data," Information Systems Research, INFORMS, vol. 23(3-part-1), pages 698-720, September.
    7. Xue Bai & Ram Gopal & Manuel Nunez & Dmitry Zhdanov, 2012. "On the Prevention of Fraud and Privacy Exposure in Process Information Flow," INFORMS Journal on Computing, INFORMS, vol. 24(3), pages 416-432, August.
    8. Abhijeet Ghoshal & Jing Hao & Syam Menon & Sumit Sarkar, 2020. "Hiding Sensitive Information when Sharing Distributed Transactional Data," Information Systems Research, INFORMS, vol. 31(2), pages 473-490, June.
    9. Yi Qian & Hui Xie, 2015. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," Management Science, INFORMS, vol. 61(3), pages 520-541, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Joseph B. Kadane & Ramayya Krishnan & Galit Shmueli, 2006. "A Data Disclosure Policy for Count Data Based on the COM-Poisson Distribution," Management Science, INFORMS, vol. 52(10), pages 1610-1617, October.
    2. Syam Menon & Sumit Sarkar & Shibnath Mukherjee, 2005. "Maximizing Accuracy of Shared Databases when Concealing Sensitive Patterns," Information Systems Research, INFORMS, vol. 16(3), pages 256-270, September.
    3. Xiao-Bai Li & Sumit Sarkar, 2006. "Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data," Information Systems Research, INFORMS, vol. 17(3), pages 254-270, September.
    4. Robert Garfinkel & Ram Gopal & Steven Thompson, 2007. "Releasing Individually Identifiable Microdata with Privacy Protection Against Stochastic Threat: An Application to Health Information," Information Systems Research, INFORMS, vol. 18(1), pages 23-41, March.
    5. Syam Menon & Abhijeet Ghoshal & Sumit Sarkar, 2022. "Modifying Transactional Databases to Hide Sensitive Association Rules," Information Systems Research, INFORMS, vol. 33(1), pages 152-178, March.
    6. Amalia R. Miller & Catherine Tucker, 2009. "Privacy Protection and Technology Diffusion: The Case of Electronic Medical Records," Management Science, INFORMS, vol. 55(7), pages 1077-1093, July.
    7. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.
    8. Xiao-Bai Li & Sumit Sarkar, 2013. "Class-Restricted Clustering and Microperturbation for Data Privacy," Management Science, INFORMS, vol. 59(4), pages 796-812, April.
    9. Xiao-Bai Li & Jialun Qin, 2017. "Anonymizing and Sharing Medical Text Records," Information Systems Research, INFORMS, vol. 28(2), pages 332-352, June.
    10. P. Daniel Wright & Matthew J. Liberatore & Robert L. Nydick, 2006. "A Survey of Operations Research Models and Applications in Homeland Security," Interfaces, INFORMS, vol. 36(6), pages 514-529, December.
    11. David R. Morrison & Jason J. Sauppe & Wenda Zhang & Sheldon H. Jacobson & Edward C. Sewell, 2017. "Cyclic best first search: Using contours to guide branch‐and‐bound algorithms," Naval Research Logistics (NRL), John Wiley & Sons, vol. 64(1), pages 64-82, February.
    12. Irfan Ertugrul & Aysegul Tus Isik, 2009. "Production Planning for a Winery With Mixed Integer Programming Model," Ege Academic Review, Ege University Faculty of Economics and Administrative Sciences, vol. 9(2), pages 375-387.
    13. Rathindra Sarathy & Krishnamurty Muralidhar, 2002. "The Security of Confidential Numerical Data in Databases," Information Systems Research, INFORMS, vol. 13(4), pages 389-403, December.
    14. Domenech, B & Lusa, A, 2016. "A MILP model for the teacher assignment problem considering teachers’ preferences," European Journal of Operational Research, Elsevier, vol. 249(3), pages 1153-1160.
    15. B. Domenech & L. Ferrer-Martí & R. Pastor, 2022. "Multicriteria analysis of renewable-based electrification projects in developing countries," Annals of Operations Research, Springer, vol. 312(2), pages 1375-1401, May.
    16. Urbinati, Andrea & Bogers, Marcel & Chiesa, Vittorio & Frattini, Federico, 2019. "Creating and capturing value from Big Data: A multiple-case study analysis of provider companies," Technovation, Elsevier, vol. 84, pages 21-36.
    17. Wang, Yihui & Tang, Tao & Ning, Bin & Meng, Lingyun, 2017. "Integrated optimization of regular train schedule and train circulation plan for urban rail transit lines," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 105(C), pages 83-104.
    18. Patrick Gemander & Wei-Kun Chen & Dieter Weninger & Leona Gottwald & Ambros Gleixner & Alexander Martin, 2020. "Two-row and two-column mixed-integer presolve using hashing-based pairing methods," EURO Journal on Computational Optimization, Springer;EURO - The Association of European Operational Research Societies, vol. 8(3), pages 205-240, October.
    19. Roger Rocha & Ignacio Grossmann & Marcus Poggi de Aragão, 2013. "Cascading Knapsack Inequalities: reformulation of a crude oil distribution problem," Annals of Operations Research, Springer, vol. 203(1), pages 217-234, March.
    20. Stadtler, Hartmut, 2011. "Multi-level single machine lot-sizing and scheduling with zero lead times," European Journal of Operational Research, Elsevier, vol. 209(3), pages 241-252, March.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:ormnsc:v:53:y:2007:i:1:p:101-116. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.