
Reidentification Risk in Panel Data: Protecting for k-Anonymity

Author

Listed:
  • Shaobo Li

    (School of Business, University of Kansas, Lawrence, Kansas 66045)

  • Matthew J. Schneider

    (LeBow College of Business, Drexel University, Philadelphia, Pennsylvania 19104)

  • Yan Yu

    (Carl H. Lindner College of Business, University of Cincinnati, Cincinnati, Ohio 45221)

  • Sachin Gupta

    (SC Johnson College of Business, Cornell University, Ithaca, New York 14853)

Abstract

We consider the risk of reidentification of panelists in marketing research data that are widely used to obtain insights into buyer behavior and to develop marketing strategy. We find that 17%–94% of the panelists in 15 frequently bought consumer goods categories are subject to high risk of reidentification through a potential record linkage attack based on their unique purchasing histories even when their identities are anonymized. We first demonstrate that the risk of reidentification is vastly understated by unicity, the conventional measure. Instead, we propose a new measure of reidentification risk, termed sno-unicity, which accounts for the longitudinal nature of panel data, and show that it is much larger than unicity. To protect the privacy of panelists, we consider the well-known privacy notion of k-anonymity and develop a new approach called graph-based minimum movement k-anonymization (k-MM) that is designed especially for panel data. The proposed k-MM approach can be formulated as an optimization problem in which the objective is to minimally distort variables in the original data based on weights that users prespecify corresponding to their use case. We further show how our approach can be extended to achieve l-diversity. We apply the k-MM approach to two different panel data sets that are widely used in marketing research. To achieve a given privacy level, compared with several benchmark protection methods, the protected data from our method result in the least distortion in inferences about key marketing metrics, such as brand market shares, share of category requirements, brand switching rates, and marketing-mix parameters estimated from a hierarchical Bayesian brand choice model.
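
As a rough illustration of the privacy notions named in the abstract, the following minimal Python sketch computes a simple form of unicity and checks k-anonymity and l-diversity on a toy panel. It is not the authors' k-MM method and does not implement sno-unicity, whose definition the abstract does not give. The sketch assumes that each panelist is summarized by one tuple of purchases, that unicity is the share of panelists whose tuple is unique in the data, and that all function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def unicity(records):
    """Share of records whose full tuple appears exactly once (assumed simple definition)."""
    counts = Counter(records)
    unique = sum(1 for r in records if counts[r] == 1)
    return unique / len(records)


def is_k_anonymous(records, k):
    """True if every distinct tuple is shared by at least k records."""
    return all(size >= k for size in Counter(records).values())


def is_l_diverse(records, sensitive, l):
    """True if every group of identical tuples has at least l distinct sensitive values."""
    groups = defaultdict(set)
    for record, value in zip(records, sensitive):
        groups[record].add(value)
    return all(len(values) >= l for values in groups.values())


# Hypothetical toy panel: one tuple of brand purchases per panelist.
panel = [
    ("A", "A", "B"),
    ("A", "A", "B"),
    ("A", "B", "B"),
    ("C", "A", "B"),
]

print(unicity(panel))            # 0.5: two of the four histories are unique
print(is_k_anonymous(panel, 2))  # False: two histories appear only once
```

In this toy example, half of the panelists have a purchase history that no one else shares, so the data are not 2-anonymous. A protection method would have to alter some histories until every history is shared by at least k panelists.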

Suggested Citation

  • Shaobo Li & Matthew J. Schneider & Yan Yu & Sachin Gupta, 2023. "Reidentification Risk in Panel Data: Protecting for k-Anonymity," Information Systems Research, INFORMS, vol. 34(3), pages 1066-1088, September.
  • Handle: RePEc:inm:orisre:v:34:y:2023:i:3:p:1066-1088
    DOI: 10.1287/isre.2022.1169

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/isre.2022.1169
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Matthew J. Schneider & Shawn Mankad, 2021. "A Two-Stage Authorship Attribution Method Using Text and Structured Data for De-Anonymizing User-Generated Content," Customer Needs and Solutions, Springer;Institute for Sustainable Innovation and Growth (iSIG), vol. 8(3), pages 66-83, September.
    2. Wieringa, Jaap & Kannan, P.K. & Ma, Xiao & Reutterer, Thomas & Risselada, Hans & Skiera, Bernd, 2021. "Data analytics in a privacy-concerned world," Journal of Business Research, Elsevier, vol. 122(C), pages 915-925.
    3. Xiao-Bai Li & Jialun Qin, 2017. "Anonymizing and Sharing Medical Text Records," Information Systems Research, INFORMS, vol. 28(2), pages 332-352, June.
    4. Cheah, Jun-Hwa & Lim, Xin-Jean & Ting, Hiram & Liu, Yide & Quach, Sara, 2022. "Are privacy concerns still relevant? Revisiting consumer behaviour in omnichannel retailing," Journal of Retailing and Consumer Services, Elsevier, vol. 65(C).
    5. Bleier, Alexander & Goldfarb, Avi & Tucker, Catherine, 2020. "Consumer privacy and the future of data-based innovation and marketing," International Journal of Research in Marketing, Elsevier, vol. 37(3), pages 466-480.
    6. Morlok, Tina & Matt, Christian & Hess, Thomas, 2017. "Privatheitsforschung in den Wirtschaftswissenschaften: Entwicklung, Stand und Perspektiven," Working Papers 1/2017, University of Munich, Munich School of Management, Institute for Information Systems and New Media.
    7. Ruwan Bandara & Mario Fernando & Shahriar Akter, 2020. "Privacy concerns in E-commerce: A taxonomy and a future research agenda," Electronic Markets, Springer;IIM University of St. Gallen, vol. 30(3), pages 629-647, September.
    8. Haibing Lu & Jaideep Vaidya & Vijayalakshmi Atluri & Yingjiu Li, 2015. "Statistical Database Auditing Without Query Denial Threat," INFORMS Journal on Computing, INFORMS, vol. 27(1), pages 20-34, February.
    9. Slepchuk, Alec N. & Milne, George R. & Swani, Kunal, 2022. "Overcoming privacy concerns in consumers’ use of health information technologies: A justice framework," Journal of Business Research, Elsevier, vol. 141(C), pages 782-793.
    10. Jaspers, Esther D.T. & Pearson, Erika, 2022. "Consumers’ acceptance of domestic Internet-of-Things: The role of trust and privacy concerns," Journal of Business Research, Elsevier, vol. 142(C), pages 255-265.
    11. Potoglou, Dimitris & Palacios, Juan & Feijoo, Claudio & Gómez Barroso, Jose-Luis, 2015. "The supply of personal information: A study on the determinants of information provision in e-commerce scenarios," 26th European Regional ITS Conference, Madrid 2015 127174, International Telecommunications Society (ITS).
    12. Dongling Huang & Christian Rojas & Frank Bass, 2008. "What Happens When Demand Is Estimated With A Misspecified Model?," Journal of Industrial Economics, Wiley Blackwell, vol. 56(4), pages 809-839, December.
    13. K. Sudhir, 2001. "Structural Analysis of Manufacturer Pricing in the Presence of a Strategic Retailer," Marketing Science, INFORMS, vol. 20(3), pages 244-264, October.
    14. Chan, Tat Y. & Narasimhan, Chakravarthi & Yoon, Yeujun, 2017. "Advertising and price competition in a manufacturer-retailer channel," International Journal of Research in Marketing, Elsevier, vol. 34(3), pages 694-716.
    15. Grace Fox & Lisa van der Werff & Pierangelo Rosati & Patricia Takako Endo & Theo Lynn, 2022. "Examining the determinants of acceptance and use of mobile contact tracing applications in Brazil: An extended privacy calculus perspective," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 73(7), pages 944-967, July.
    16. Attié, Elodie & Meyer-Waarden, Lars, 2022. "The acceptance and usage of smart connected objects according to adoption stages: an enhanced technology acceptance model integrating the diffusion of innovation, uses and gratification and privacy ca," Technological Forecasting and Social Change, Elsevier, vol. 176(C).
    17. Natalie Shlomo & Chris Skinner, 2022. "Measuring risk of re‐identification in microdata: State‐of‐the art and new directions," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 185(4), pages 1644-1662, October.
    18. Tat Chan & Naser Hamdi & Xiang Hui & Zhenling Jiang, 2022. "The Value of Verified Employment Data for Consumer Lending: Evidence from Equifax," Marketing Science, INFORMS, vol. 41(4), pages 795-814, July.
    19. Mwesiumo, Deodat & Halpern, Nigel & Budd, Thomas & Suau-Sanchez, Pere & Bråthen, Svein, 2021. "An exploratory and confirmatory composite analysis of a scale for measuring privacy concerns," Journal of Business Research, Elsevier, vol. 136(C), pages 63-75.
    20. Suresh Divakar & Brian T. Ratchford & Venkatesh Shankar, 2005. "Practice Prize Article—: A Multichannel, Multiregion Sales Forecasting Model and Decision Support System for Consumer Packaged Goods," Marketing Science, INFORMS, vol. 24(3), pages 334-350, July.
