Identification, data combination, and the risk of disclosure

Author

Tatiana Komarova
Denis Nekipelov
Evgeny Yakovlev

Abstract

It is commonplace that the data needed for econometric inference are not contained in a single source. In this paper we analyze the problem of parametric inference from combined individual‐level data when data combination is based on personal and demographic identifiers such as name, age, or address. Our main question is the identification of the econometric model based on the combined data when the data do not contain exact individual identifiers and no parametric assumptions are imposed on the joint distribution of information that is common across the combined data set. We demonstrate the conditions on the observable marginal distributions of data in individual data sets that can and cannot guarantee identification of the parameters of interest. We also note that the data combination procedure is essential in a semiparametric setting such as ours. Provided that the (nonparametric) data combination procedure can only be defined in finite samples, we introduce a new notion of identification based on the concept of limits of statistical experiments. Our results apply to the setting where the individual data used for inferences are sensitive and their combination may lead to a substantial increase in the data sensitivity or lead to a “de‐anonymization” of the previously “anonymized” information. We demonstrate that the point identification of an econometric model from combined data is incompatible with restrictions on the risk of individual disclosure. If the data combination procedure guarantees a bound on the risk of individual disclosure, then the information available from the combined data set allows one to identify the parameter of interest only partially, and the size of the identification region is inversely related to the upper bound guarantee for the disclosure risk. 
This result is new in the context of data combination as we notice that the quality of links that need to be used in the combined data to assure point identification may be much higher than the average link quality in the entire data set, and thus point inference requires the use of the most sensitive subset of the data. Our results provide important insights into the ongoing discourse on the empirical analysis of merged administrative records as well as discussions on the “disclosive” nature of policies implemented by the data‐driven companies (such as internet services companies and medical companies using individual patient records for policy decisions).
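
As a rough illustration only (this is not the authors' procedure), the tension between link quality and usable sample size can be sketched with a toy record-linkage rule. The records, the `match_score` function, and the thresholds below are all hypothetical. Two sources are linked on name, age, and address, and a pair is kept only if its agreement score meets a threshold.

```python
from itertools import product

# Hypothetical toy records: shared identifiers plus one variable per source.
survey = [
    {"name": "ann lee", "age": 34, "address": "12 oak st", "y": 1.0},
    {"name": "bob ray", "age": 51, "address": "9 elm rd", "y": 0.0},
    {"name": "cy dunn", "age": 27, "address": "3 pine ave", "y": 1.0},
]
registry = [
    {"name": "ann lee", "age": 34, "address": "12 oak st", "x": 2.5},
    {"name": "bob ray", "age": 52, "address": "9 elm rd", "x": 1.0},
    {"name": "sy dunn", "age": 27, "address": "4 pine ave", "x": 0.5},
]

IDENTIFIERS = ("name", "age", "address")


def match_score(a, b):
    """Share of identifiers on which two records agree exactly (an assumed rule)."""
    return sum(a[k] == b[k] for k in IDENTIFIERS) / len(IDENTIFIERS)


def link(left, right, threshold):
    """Keep every cross-source pair whose score meets the threshold."""
    return [(a, b) for a, b in product(left, right) if match_score(a, b) >= threshold]


# Strict rule: only exact agreement on all identifiers counts as a link.
strict = link(survey, registry, threshold=1.0)
# Loose rule: agreement on a single identifier is enough.
loose = link(survey, registry, threshold=0.3)

print(len(strict), len(loose))
```

In this toy setup the strict rule keeps only the pair that agrees on every identifier, which is also the pair most exposed to re-identification, while the loose rule keeps more pairs at the cost of weaker links.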

Suggested Citation

Tatiana Komarova & Denis Nekipelov & Evgeny Yakovlev, 2018. "Identification, data combination, and the risk of disclosure," Quantitative Economics, Econometric Society, vol. 9(1), pages 395-440, March.

Handle: RePEc:wly:quante:v:9:y:2018:i:1:p:395-440
DOI: 10.3982/QE568

Other versions of this item:

Komarova, Tatiana & Nekipelov, Denis & Yakovlev, Evgeny, 2018. "Identification, data combination and the risk of disclosure," LSE Research Online Documents on Economics 79384, London School of Economics and Political Science, LSE Library.
Tatiana V. Komarova & Denis Nekipelov & Evgeny Yakovlev, 2011. "Identification, data combination and the risk of disclosure," CeMMAP working papers CWP38/11, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.

References listed on IDEAS

Thierry Magnac & Eric Maurin, 2008. "Partial Identification in Monotone Binary Models: Discrete Regressors and Interval Data," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 75(3), pages 835-864.
- Thierry Magnac & Eric Maurin, 2004. "Partial Identification in Monotone Binary Models : Discrete Regressors and Interval Data," Working Papers 2004-11, Center for Research in Economics and Statistics.
- Thierry Magnac & Eric Maurin, 2008. "Partial Identification in Monotone Binary Models: Discrete Regressors and Interval Data," PSE-Ecole d'économie de Paris (Postprint) halshs-00754272, HAL.
- Thierry Magnac & Eric Maurin, 2008. "Partial Identification in Monotone Binary Models: Discrete Regressors and Interval Data," Post-Print halshs-00754272, HAL.
- Magnac, Thierry & Maurin, Eric, 2004. "Partial Identification in Monotone Binary Models: Discrete Regressors and Interval Data," IDEI Working Papers 280, Institut d'Économie Industrielle (IDEI), Toulouse, revised Jan 2005.
Manuel A. Domínguez & Ignacio N. Lobato, 2004. "Consistent Estimation of Models Defined by Conditional Moment Restrictions," Econometrica, Econometric Society, vol. 72(5), pages 1601-1615, September.
Goldfarb, Avi & Greenstein, Shane M. & Tucker, Catherine E. (ed.), 2015. "Economic Analysis of the Digital Economy," National Bureau of Economic Research Books, University of Chicago Press, number 9780226206981, December.
Charles F. Manski & Elie Tamer, 2002. "Inference on Regressions with Interval Data on a Regressor or Outcome," Econometrica, Econometric Society, vol. 70(2), pages 519-546, March.
Ridder, Geert & Moffitt, Robert, 2007. "The Econometrics of Data Combination," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 75, Elsevier.
Avi Goldfarb & Catherine Tucker, 2011. "Online Display Advertising: Targeting and Obtrusiveness," Marketing Science, INFORMS, vol. 30(3), pages 389-404, 05-06.
Calzolari, Giacomo & Pavan, Alessandro, 2006. "On the optimality of privacy in sequential contracting," Journal of Economic Theory, Elsevier, vol. 130(1), pages 168-204, September.
- Alessandro Pavan, 2004. "On the Optimality of Privacy in Sequential Contracting," Theory workshop papers 658612000000000067, UCLA Department of Economics.
- Giacomo Calzolari & Alessandro Pavan, 2005. "On the Optimality of Privacy in Sequential Contracting," Discussion Papers 1404, Northwestern University, Center for Mathematical Studies in Economics and Management Science.
- Giacomo Calzolari & Alessandro Pavan, 2004. "On the Optimality of Privacy in Sequential Contracting," Discussion Papers 1394, Northwestern University, Center for Mathematical Studies in Economics and Management Science.
P. Lahiri & Michael D. Larsen, 2005. "Regression Analysis With Linked Data," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 222-230, March.
Horowitz, Joel L & Manski, Charles F, 1995. "Identification and Robustness with Contaminated and Corrupted Data," Econometrica, Econometric Society, vol. 63(2), pages 281-302, March.
Alessandro Acquisti & Hal R. Varian, 2005. "Conditioning Prices on Purchase History," Marketing Science, INFORMS, vol. 24(3), pages 367-381, May.
- Alessandro Acquisti & Hal R. Varian, 2002. "Conditioning Prices on Purchase History," Microeconomics 0210001, University Library of Munich, Germany.
Amalia R. Miller & Catherine Tucker, 2009. "Privacy Protection and Technology Diffusion: The Case of Electronic Medical Records," Management Science, INFORMS, vol. 55(7), pages 1077-1093, July.
- Catherine Tucker & Amalia Miller, 2007. "Privacy Protection and Technology Diffusion: The Case of Electronic Medical Records," Working Papers 07-16, NET Institute, revised Sep 2007.
Horowitz, Joel L. & Manski, Charles F., 2006. "Identification and estimation of statistical functionals using incomplete data," Journal of Econometrics, Elsevier, vol. 132(2), pages 445-459, June.
Curtis R. Taylor, 2004. "Consumer Privacy and the Market for Customer Information," RAND Journal of Economics, The RAND Corporation, vol. 35(4), pages 631-650, Winter.
Satkartar K. Kinney & Jerome P. Reiter & Arnold P. Reznek & Javier Miranda & Ron S. Jarmin & John M. Abowd, 2011. "Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database," International Statistical Review, International Statistical Institute, vol. 79(3), pages 362-384, December.
- Satkartar K. Kinney & Jerome P. Reiter & Arnold P. Reznek & Javier Miranda & Ron S. Jarmin & John M. Abowd, 2011. "Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database," Working Papers 11-04, Center for Economic Studies, U.S. Census Bureau.
Tatiana Komarova & Denis Nekipelov & Evgeny Yakovlev, 2015. "Estimation of Treatment Effects from Combined Data: Identification versus Data Security," NBER Chapters, in: Economic Analysis of the Digital Economy, pages 279-308, National Bureau of Economic Research, Inc.
Molinari, Francesca, 2008. "Partial identification of probability distributions with misclassified data," Journal of Econometrics, Elsevier, vol. 144(1), pages 81-117, May.
- Molinari, Francesca, 2005. "Partial Identification of Probability Distributions with Misclassified Data," Working Papers 05-10, Cornell University, Center for Analytic Economics.
Philip J. Cross & Charles F. Manski, 2002. "Regressions, Short and Long," Econometrica, Econometric Society, vol. 70(1), pages 357-368, January.
- Philip Cross & Charles F. Manski, 2000. "Regressions, Short and Long," Econometric Society World Congress 2000 Contributed Papers 0385, Econometric Society.
Kim, Gunky & Chambers, Raymond, 2012. "Regression analysis under incomplete linkage," Computational Statistics & Data Analysis, Elsevier, vol. 56(9), pages 2756-2770.
Karr, A.F. & Kohnen, C.N. & Oganian, A. & Reiter, J.P. & Sanil, A.P., 2006. "A Framework for Evaluating the Utility of Data Altered to Protect Confidentiality," The American Statistician, American Statistical Association, vol. 60, pages 224-232, August.

Citations

Cited by:

Komarova, Tatiana & Nekipelov, Denis & Al Rafi, Ahnaf & Yakovlev, Evgeny, 2017. "K-anonymity: A note on the trade-off between data utility and data security," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 48, pages 44-62.
- Komarova, Tatiana & Nekipelov, Denis & Al Rafi, Ahnaf & Yakovlev, Evgeny, 2017. "K-anonymity: a note on the trade-off between data utility and data security," LSE Research Online Documents on Economics 85923, London School of Economics and Political Science, LSE Library.
Tatiana Komarova & Denis Nekipelov, 2020. "Identification and Formal Privacy Guarantees," Papers 2006.14732, arXiv.org, revised May 2021.
David Pacini, 2012. "Least Square Linear Prediction with Two-Sample Data," Bristol Economics Discussion Papers 12/631, School of Economics, University of Bristol, UK.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Francesca Molinari, 2020. "Microeconometrics with Partial Identification," CeMMAP working papers CWP15/20, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Francesca Molinari, 2020. "Microeconometrics with Partial Identification," Papers 2004.11751, arXiv.org.
Francesca Molinari, 2019. "Econometrics with Partial Identification," CeMMAP working papers CWP25/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Morlok, Tina & Matt, Christian & Hess, Thomas, 2017. "Privatheitsforschung in den Wirtschaftswissenschaften: Entwicklung, Stand und Perspektiven," Working Papers 1/2017, University of Munich, Munich School of Management, Institute for Information Systems and New Media.
Christian Bontemps & Thierry Magnac & Eric Maurin, 2012. "Set Identified Linear Models," Econometrica, Econometric Society, vol. 80(3), pages 1129-1155, May.
- Bontemps, Christian & Magnac, Thierry & Maurin, Eric, 2007. "Set Identified Linear Models," IDEI Working Papers 494, Institut d'Économie Industrielle (IDEI), Toulouse.
- Bontemps, Christian & Magnac, Thierry & Maurin, Eric, 2009. "Set Identified Linear Models," TSE Working Papers 09-090, Toulouse School of Economics (TSE).
- Christian Bontemps & Thierry Magnac & Eric Maurin, 2012. "Set Identified Linear Models," Post-Print halshs-00754590, HAL.
- Christian Bontemps & Thierry Magnac & Eric Maurin, 2012. "Set Identified Linear Models," PSE-Ecole d'économie de Paris (Postprint) halshs-00754590, HAL.
- Christian Bontemps & Thierry Magnac & Eric Maurin, 2011. "Set identified linear models," CeMMAP working papers CWP13/11, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Jin-Hyuk Kim & Liad Wagman, 2015. "Screening incentives and privacy protection in financial markets: a theoretical and empirical analysis," RAND Journal of Economics, RAND Corporation, vol. 46(1), pages 1-22, March.
Magnac, Thierry, 2013. "Identification partielle : méthodes et conséquences pour les applications empiriques," L'Actualité Economique, Société Canadienne de Science Economique, vol. 89(4), pages 233-258, Décembre.
- Magnac, Thierry, 2014. "Identification partielle: méthodes et conséquences pour les applications empiriques," IDEI Working Papers 814, Institut d'Économie Industrielle (IDEI), Toulouse.
- Magnac, Thierry, 2014. "Identification partielle: méthodes et conséquences pour les applications empiriques," TSE Working Papers 14-458, Toulouse School of Economics (TSE).
Alessandro Acquisti & Curtis Taylor & Liad Wagman, 2016. "The Economics of Privacy," Journal of Economic Literature, American Economic Association, vol. 54(2), pages 442-492, June.
Potoglou, Dimitris & Palacios, Juan & Feijoo, Claudio & Gómez Barroso, Jose-Luis, 2015. "The supply of personal information: A study on the determinants of information provision in e-commerce scenarios," 26th European Regional ITS Conference, Madrid 2015 127174, International Telecommunications Society (ITS).
Dengler, Sebastian & Prüfer, Jens, 2021. "Consumers' privacy choices in the era of big data," Games and Economic Behavior, Elsevier, vol. 130(C), pages 499-520.
- Dengler, Sebastian & Prüfer, Jens, 2018. "Consumers' Privacy Choices in the Era of Big Data," Other publications TiSEM 809f6834-9e85-4449-b21a-6, Tilburg University, School of Economics and Management.
- Dengler, Sebastian & Prüfer, Jens, 2018. "Consumers' Privacy Choices in the Era of Big Data," Discussion Paper 2018-014, Tilburg University, Tilburg Law and Economic Center.
- Prüfer, Jens & Dengler, Sebastian, 2018. "Consumers' Privacy Choices in the Era of Big Data," Other publications TiSEM 3fac3011-dc4d-4b81-8f66-3, Tilburg University, School of Economics and Management.
- Prüfer, Jens & Dengler, Sebastian, 2018. "Consumers' Privacy Choices in the Era of Big Data," Discussion Paper 2018-012, Tilburg University, Center for Economic Research.
Bouckaert, J.M.C. & Degryse, H.A., 2006. "Opt In versus Opt Out : A Free-Entry Analysis of Privacy Policies," Other publications TiSEM 17393c5d-1ed2-47ec-bc96-9, Tilburg University, School of Economics and Management.
- BOUCKAERT, Jan & DEGRYSE, Hans, 2007. "Opt in versus opt out: A free-entry analysis of privacy policies," Working Papers 2007025, University of Antwerp, Faculty of Business and Economics.
- Bouckaert, J.M.C. & Degryse, H.A., 2006. "Opt In versus Opt Out : A Free-Entry Analysis of Privacy Policies," Discussion Paper 2006-96, Tilburg University, Center for Economic Research.
- Jan Bouckaert & Hans Degryse, 2006. "Opt In Versus Opt Out: A Free-Entry Analysis of Privacy Policies," CESifo Working Paper Series 1831, CESifo.
- J Bouckaert & Hans Degryse, 2006. "Opt in versus Opt out: a free-entry analysis of privacy policies," Working Papers Department of Accountancy, Finance and Insurance (AFI), Leuven 500204, KU Leuven, Faculty of Economics and Business (FEB), Department of Accountancy, Finance and Insurance (AFI), Leuven.
Piccolo, Salvatore & Pagnozzi, Marco, 2013. "Information sharing between vertical hierarchies," Games and Economic Behavior, Elsevier, vol. 79(C), pages 201-222.
- Marco Pagnozzi & Salvatore Piccolo, 2012. "Information Sharing between Vertical Hierarchies," CSEF Working Papers 322, Centre for Studies in Economics and Finance (CSEF), University of Naples, Italy.
Guido W. Imbens & Charles F. Manski, 2004. "Confidence Intervals for Partially Identified Parameters," Econometrica, Econometric Society, vol. 72(6), pages 1845-1857, November.
- Guido Imbens & Charles F. Manski, 2003. "Confidence intervals for partially identified parameters," CeMMAP working papers CWP09/03, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Guido Imbens & Charles F. Manski, 2003. "Confidence intervals for partially identified parameters," CeMMAP working papers 09/03, Institute for Fiscal Studies.
Daron Acemoglu & Ali Makhdoumi & Azarakhsh Malekian & Asu Ozdaglar, 2022. "Too Much Data: Prices and Inefficiencies in Data Markets," American Economic Journal: Microeconomics, American Economic Association, vol. 14(4), pages 218-256, November.
- Daron Acemoglu & Ali Makhdoumi & Azarakhsh Malekian & Asuman Ozdaglar, 2019. "Too Much Data: Prices and Inefficiencies in Data Markets," NBER Working Papers 26296, National Bureau of Economic Research, Inc.
- Acemoglu, Daron & Makhdoumi, Ali & Ozdaglar, Asuman & Malekian, Azarakhsh, 2019. "Too Much Data: Prices and Inefficiencies in Data Markets," CEPR Discussion Papers 14225, C.E.P.R. Discussion Papers.
Arie Beresteanu & Francesca Molinari, 2008. "Asymptotic Properties for a Class of Partially Identified Models," Econometrica, Econometric Society, vol. 76(4), pages 763-814, July.
- Beresteanu, Arie & Molinari, Francesca, 2006. "Asymptotic Properties for a Class of Partially Identified Models," Working Papers 06-07, Cornell University, Center for Analytic Economics.
- Beresteanu, Arie & Molinari, Francesca, 2006. "Asymptotic Properties for a Class of Partially Identified Models," Working Papers 06-04, Duke University, Department of Economics.
- Arie Beresteanu & Francesca Molinari, 2006. "Asymptotic properties for a class of partially identified models," CeMMAP working papers CWP10/06, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Byung‐Cheol Kim & Jay Pil Choi, 2010. "Customer Information Sharing: Strategic Incentives and New Implications," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 19(2), pages 403-433, June.
- Byung-Cheol Kim & Jay Pil Choi, 2007. "Customer Information Sharing: Strategic Incentives and New Implications," Working Papers 07-27, NET Institute, revised Sep 2007.
Lagerlöf, Johan N.M., 2023. "Surfing incognito: Welfare effects of anonymous shopping," International Journal of Industrial Organization, Elsevier, vol. 87(C).
Avi Goldfarb, 2014. "What is Different About Online Advertising?," Review of Industrial Organization, Springer;The Industrial Organization Society, vol. 44(2), pages 115-129, March.
S. Nageeb Ali & Gregory Lewis & Shoshana Vasserman, 2019. "Voluntary Disclosure and Personalized Pricing," NBER Working Papers 26592, National Bureau of Economic Research, Inc.
- Nageeb Ali, S. & Lewis, Greg & Vasserman, Shoshana, 2022. "Voluntary Disclosure and Personalized Pricing," Research Papers 3890, Stanford University, Graduate School of Business.
- S. Nageeb Ali & Greg Lewis & Shoshana Vasserman, 2019. "Voluntary Disclosure and Personalized Pricing," Papers 1912.04774, arXiv.org, revised Aug 2020.
Xiaohong Chen & Yingyao Hu, 2006. "Identification and Inference of Nonlinear Models Using Two Samples with Arbitrary Measurement Errors," Cowles Foundation Discussion Papers 1590, Cowles Foundation for Research in Economics, Yale University.
Juan Carlos Escanciano & Lin Zhu, 2013. "Set inferences and sensitivity analysis in semiparametric conditionally identified models," CeMMAP working papers CWP55/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.

More about this item

JEL classification:

C13 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Estimation: General
C14 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Semiparametric and Nonparametric Methods: General
C25 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions; Probabilities
C35 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions
