Identification, data combination and the risk of disclosure

Author Info

  • Tatiana Komarova
  • Denis Nekipelov (Institute for Fiscal Studies and Berkeley)
  • Evgeny Yakovlev

Abstract

Businesses routinely rely on econometric models to analyze and predict consumer behavior. Estimating such models may require combining a firm's internal data with external datasets to account for sample selection, missing observations, omitted variables and measurement errors in the existing data source. In this paper we point out that these data problems can be addressed when estimating econometric models from combined data using data mining techniques, under mild assumptions on the data distribution. However, data combination poses serious threats to the security of consumer data: we demonstrate that point identification of an econometric model from combined data is incompatible with restrictions on the risk of individual disclosure. Consequently, if a consumer model is point identified, the firm would (implicitly or explicitly) reveal the identity of at least some of the consumers in its internal data. More importantly, we argue that unless the firm restricts the individual disclosure risk when combining data, an adversary or a competitor can gather confidential information about some individuals from the estimated model, even if the raw combined dataset is not shared with a third party.

Download Info

File URL: http://cemmap.ifs.org.uk/wps/cwp3811.pdf
Download Restriction: no

Bibliographic Info

Paper provided by Centre for Microdata Methods and Practice, Institute for Fiscal Studies in its series CeMMAP working papers with number CWP38/11.

Date of creation: Dec 2011
Handle: RePEc:ifs:cemmap:38/11

Contact details of provider:
Postal: The Institute for Fiscal Studies 7 Ridgmount Street LONDON WC1E 7AE
Phone: (+44) 020 7291 4800
Fax: (+44) 020 7323 4780
Web page: http://cemmap.ifs.org.uk

Citations

Cited by:
  1. David Pacini, 2012. "Least Square Linear Prediction with Two-Sample Data," Bristol Economics Discussion Papers 12/631, Department of Economics, University of Bristol, UK.
