Identification, data combination and the risk of disclosure
Businesses routinely rely on econometric models to analyze and predict consumer behavior. Estimating such models may require combining a firm's internal data with external datasets to account for sample selection, missing observations, omitted variables and measurement errors in the existing data source. In this paper we point out that, under mild assumptions on the data distribution, these data problems can be addressed by estimating econometric models from combined data using data mining techniques. However, data combination poses serious threats to the security of consumer data: we demonstrate that point identification of an econometric model from combined data is incompatible with restrictions on the risk of individual disclosure. Consequently, if a consumer model is point identified, the firm would (implicitly or explicitly) reveal the identity of at least some of the consumers in its internal data. More importantly, we argue that unless the firm restricts the individual disclosure risk when combining data, an adversary or a competitor can gather confidential information about some individuals from the estimated model, even if the raw combined dataset is never shared with a third party.
Date of creation: Dec 2011
Contact details of provider:
Phone: (+44) 020 7291 4800
Fax: (+44) 020 7323 4780
Web page: http://cemmap.ifs.org.uk
Order information: Postal: The Institute for Fiscal Studies, 7 Ridgmount Street, London WC1E 7AE
Handle: RePEc:ifs:cemmap:38/11