Printed from https://ideas.repec.org/a/inm/orisre/v22y2011i4p774-789.html

Protecting Privacy Against Record Linkage Disclosure: A Bounded Swapping Approach for Numeric Data

Authors
  • Xiao-Bai Li

    (Department of Operations and Information Systems, University of Massachusetts Lowell, Lowell, Massachusetts 01854)

  • Sumit Sarkar

    (School of Management, University of Texas at Dallas, Richardson, Texas 75080)

Abstract

Record linkage techniques have been widely used in areas such as antiterrorism, crime analysis, epidemiologic research, and database marketing. On the other hand, such techniques are also being increasingly used for identity matching that leads to the disclosure of private information. These techniques can be used to effectively reidentify records even in deidentified data. Consequently, the use of such techniques can lead to individual privacy being severely eroded. Our study addresses this important issue and provides a solution to resolve the conflict between privacy protection and data utility. We propose a data-masking method for protecting private information against record linkage disclosure that preserves the statistical properties of the data for legitimate analysis. Our method recursively partitions a data set into smaller subsets such that data records within each subset are more homogeneous after each partition. The partition is made orthogonal to the maximum variance dimension represented by the first principal component in each partitioned set. The attribute values of a record in a subset are then masked using a double-bounded swapping method. The proposed method, which we call multivariate swapping trees, is nonparametric in nature and does not require any assumptions about statistical distributions of the original data. Experiments conducted on real-world data sets demonstrate that the proposed approach significantly outperforms existing methods in terms of both preventing identity disclosure and preserving data quality.
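The abstract only outlines the procedure, so the following is a minimal sketch under stated assumptions, not the authors' multivariate swapping trees algorithm. It computes each subset's first principal component by power iteration on the covariance matrix, splits the subset at the median of the records' projections onto that component (a cut by a hyperplane orthogonal to it), and stops when a subset could not yield two children of at least a hypothetical `min_size` records. The abstract does not state the two bounds of the paper's double-bounded swapping, so in its place the sketch applies plain rank swapping inside each leaf, with a hypothetical rank window `max_gap`. All function and parameter names are illustrative.

```python
import random
from typing import List, Sequence

Row = List[float]


def first_principal_component(rows: Sequence[Row], iters: int = 200) -> List[float]:
    """Approximate the leading eigenvector of the sample covariance by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / max(n - 1, 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 + k for k in range(d)]  # generic, non-symmetric start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # zero variance: every direction is equivalent
            return v
        v = [x / norm for x in w]
    return v


def partition(indices: List[int], data: Sequence[Row], min_size: int) -> List[List[int]]:
    """Recursively split records at the median of their first-PC projection."""
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    if len(indices) < 2 * min_size:  # too small to split into two valid children
        return [indices]
    rows = [data[i] for i in indices]
    pc = first_principal_component(rows)
    d = len(pc)
    means = [sum(r[k] for r in rows) / len(rows) for k in range(d)]
    proj = {i: sum((data[i][k] - means[k]) * pc[k] for k in range(d)) for i in indices}
    ordered = sorted(indices, key=lambda i: proj[i])
    mid = len(ordered) // 2  # median cut, orthogonal to the first PC
    return partition(ordered[:mid], data, min_size) + partition(ordered[mid:], data, min_size)


def rank_swap(values: List[float], max_gap: int, rng: random.Random) -> List[float]:
    """Plain rank swapping: pair each value with an unswapped one at most max_gap ranks away."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = list(values)
    swapped = [False] * len(order)
    for r in range(len(order)):
        if swapped[r]:
            continue
        last = min(r + max_gap, len(order) - 1)
        partners = [s for s in range(r + 1, last + 1) if not swapped[s]]
        if partners:
            s = rng.choice(partners)
            i, j = order[r], order[s]
            out[i], out[j] = values[j], values[i]
            swapped[r] = swapped[s] = True
    return out


def swapping_tree_mask(data: Sequence[Row], min_size: int = 3, max_gap: int = 2,
                       seed: int = 0) -> List[Row]:
    """Mask each attribute by rank swapping within the leaves of a PCA partition tree."""
    rng = random.Random(seed)
    masked = [list(r) for r in data]
    for leaf in partition(list(range(len(data))), data, min_size):
        for k in range(len(data[0])):
            new_vals = rank_swap([data[i][k] for i in leaf], max_gap, rng)
            for i, v in zip(leaf, new_vals):
                masked[i][k] = v
    return masked


if __name__ == "__main__":
    data = [[float(i), float((i * i) % 7)] for i in range(12)]
    for before, after in zip(data, swapping_tree_mask(data)):
        print(before, "->", after)
```

Because values move only between records of the same leaf and the same attribute, each leaf keeps its per-attribute multiset of values, so leaf-level and overall means are unchanged. Median splits keep the tree balanced, which is one simple way to guarantee that every leaf holds at least `min_size` records.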

Suggested Citation

  • Xiao-Bai Li & Sumit Sarkar, 2011. "Protecting Privacy Against Record Linkage Disclosure: A Bounded Swapping Approach for Numeric Data," Information Systems Research, INFORMS, vol. 22(4), pages 774-789, December.
  • Handle: RePEc:inm:orisre:v:22:y:2011:i:4:p:774-789
    DOI: 10.1287/isre.1100.0289

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/isre.1100.0289
    Download Restriction: no

    File URL: https://libkey.io/10.1287/isre.1100.0289?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. P. S. Bradley & Usama M. Fayyad & O. L. Mangasarian, 1999. "Mathematical Programming for Data Mining: Formulations and Challenges," INFORMS Journal on Computing, INFORMS, vol. 11(3), pages 217-238, August.
    2. Robert Garfinkel & Ram Gopal & Steven Thompson, 2007. "Releasing Individually Identifiable Microdata with Privacy Protection Against Stochastic Threat: An Application to Health Information," Information Systems Research, INFORMS, vol. 18(1), pages 23-41, March.
    3. Xiao-Bai Li & Sumit Sarkar, 2006. "Privacy Protection in Data Mining: A Perturbation Approach for Categorical Data," Information Systems Research, INFORMS, vol. 17(3), pages 254-270, September.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Xiao-Bai Li & Jialun Qin, 2017. "Anonymizing and Sharing Medical Text Records," Information Systems Research, INFORMS, vol. 28(2), pages 332-352, June.
    2. Damangir, Sina & Du, Rex Yuxing & Hu, Ye, 2018. "Uncovering Patterns of Product Co-consideration: A Case Study of Online Vehicle Price Quote Request Data," Journal of Interactive Marketing, Elsevier, vol. 42(C), pages 1-17.
    3. Yi Qian & Hui Xie, 2013. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," NBER Working Papers 19586, National Bureau of Economic Research, Inc.
    4. Shaobo Li & Matthew J. Schneider & Yan Yu & Sachin Gupta, 2023. "Reidentification Risk in Panel Data: Protecting for k-Anonymity," Information Systems Research, INFORMS, vol. 34(3), pages 1066-1088, September.
    5. Fan Zhou & Kunpeng Zhang & Shuying Xie & Xucheng Luo, 2020. "Learning to Correlate Accounts Across Online Social Networks: An Embedding-Based Approach," INFORMS Journal on Computing, INFORMS, vol. 32(3), pages 714-729, July.
    6. Morlok, Tina & Matt, Christian & Hess, Thomas, 2017. "Privatheitsforschung in den Wirtschaftswissenschaften: Entwicklung, Stand und Perspektiven," Working Papers 1/2017, University of Munich, Munich School of Management, Institute for Information Systems and New Media.
    7. Yi Qian & Hui Xie, 2015. "Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases," Management Science, INFORMS, vol. 61(3), pages 520-541, March.
    8. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Weiyin Hong & Frank K. Y. Chan & James Y. L. Thong, 2021. "Drivers and Inhibitors of Internet Privacy Concern: A Multidimensional Development Theory Perspective," Journal of Business Ethics, Springer, vol. 168(3), pages 539-564, January.
    2. Xiao-Bai Li & Sumit Sarkar, 2013. "Class-Restricted Clustering and Microperturbation for Data Privacy," Management Science, INFORMS, vol. 59(4), pages 796-812, April.
    3. Nigel Melville & Michael McQuaid, 2012. "Research Note ---Generating Shareable Statistical Databases for Business Value: Multiple Imputation with Multimodal Perturbation," Information Systems Research, INFORMS, vol. 23(2), pages 559-574, June.
    4. Xiao-Bai Li & Jialun Qin, 2017. "Anonymizing and Sharing Medical Text Records," Information Systems Research, INFORMS, vol. 28(2), pages 332-352, June.
    5. Aardal, Karen & van den Berg, Pieter L. & Gijswijt, Dion & Li, Shanfei, 2015. "Approximation algorithms for hard capacitated k-facility location problems," European Journal of Operational Research, Elsevier, vol. 242(2), pages 358-368.
    6. Yonghua Ji & Subodha Kumar & Vijay Mookerjee, 2016. "When Being Hot Is Not Cool: Monitoring Hot Lists for Information Security," Information Systems Research, INFORMS, vol. 27(4), pages 897-918, December.
    7. Brandner, Hubertus & Lessmann, Stefan & Voß, Stefan, 2013. "A memetic approach to construct transductive discrete support vector machines," European Journal of Operational Research, Elsevier, vol. 230(3), pages 581-595.
    8. W. Art Chaovalitwongse & Ya-Ju Fan & Rajesh C. Sachdeo, 2008. "Novel Optimization Models for Abnormal Brain Activity Classification," Operations Research, INFORMS, vol. 56(6), pages 1450-1460, December.
    9. Saïd Hanafi & Nicola Yanev, 2011. "Tabu search approaches for solving the two-group classification problem," Annals of Operations Research, Springer, vol. 183(1), pages 25-46, March.
    10. Heydari Majeed & Yousefli Amir, 2017. "A new optimization model for market basket analysis with allocation considerations: A genetic algorithm solution approach," Management & Marketing, Sciendo, vol. 12(1), pages 1-11, March.
    11. Xiao-Bai Li & James Sweigart & James Teng & Joan Donohue & Lori Thombs, 2001. "A Dynamic Programming Based Pruning Method for Decision Trees," INFORMS Journal on Computing, INFORMS, vol. 13(4), pages 332-344, November.
    12. Shaobo Li & Matthew J. Schneider & Yan Yu & Sachin Gupta, 2023. "Reidentification Risk in Panel Data: Protecting for k-Anonymity," Information Systems Research, INFORMS, vol. 34(3), pages 1066-1088, September.
    13. Saglam, Burcu & Salman, F. Sibel & Sayin, Serpil & Turkay, Metin, 2006. "A mixed-integer programming approach to the clustering problem with an application in customer segmentation," European Journal of Operational Research, Elsevier, vol. 173(3), pages 866-879, September.
    14. Matthew J. Schneider & Shawn Mankad, 2021. "A Two-Stage Authorship Attribution Method Using Text and Structured Data for De-Anonymizing User-Generated Content," Customer Needs and Solutions, Springer;Institute for Sustainable Innovation and Growth (iSIG), vol. 8(3), pages 66-83, September.
    15. Calvino, José J. & López-Haro, Miguel & Muñoz-Ocaña, Juan M. & Puerto, Justo & Rodríguez-Chía, Antonio M., 2022. "Segmentation of scanning-transmission electron microscopy images using the ordered median problem," European Journal of Operational Research, Elsevier, vol. 302(2), pages 671-687.
    16. Boginski, Vladimir & Butenko, Sergiy & Pardalos, Panos M., 2005. "Statistical analysis of financial networks," Computational Statistics & Data Analysis, Elsevier, vol. 48(2), pages 431-443, February.
    17. Yun-Bin Zhao & Zhi-Quan Luo, 2017. "Constructing New Weighted ℓ1-Algorithms for the Sparsest Points of Polyhedral Sets," Mathematics of Operations Research, INFORMS, vol. 42(1), pages 57-76, January.
    18. B Baesens & C Mues & D Martens & J Vanthienen, 2009. "50 years of data mining and OR: upcoming trends and challenges," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 60(1), pages 16-23, May.
    19. Qiu-Hong Wang & Kai-Lung Hui, 2017. "Technology Mergers and Acquisitions in the Presence of an Installed Base: A Strategic Analysis," Information Systems Research, INFORMS, vol. 28(1), pages 46-63, March.
    20. Zike Cao & Kai-Lung Hui & Hong Xu, 2018. "An Economic Analysis of Peer Disclosure in Online Social Communities," Information Systems Research, INFORMS, vol. 29(3), pages 546-566, September.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.