This file is part of IDEAS, which uses RePEc data



Distribution-Preserving Statistical Disclosure Limitation

Author Info
Woodcock, Simon
Benedetto, Gary

Abstract

One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate them. We present two practical methods of generating synthetic values when the imputer has only limited information about the true data generating process. One is applicable when the true likelihood is known up to a monotone transformation. The second requires only limited knowledge of the true likelihood, but nevertheless preserves the conditional distribution of the confidential data, up to sampling error, on arbitrary subdomains. Our method maximizes data utility and minimizes incremental disclosure risk up to posterior uncertainty in the imputation model and sampling error in the estimated transformation. We validate the approach with a simulation and application to a large linked employer-employee database.
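
To make the idea of synthesizing values through a monotone transformation concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it is a generic rank-based illustration, and every function name, the log-normal toy data, and the simple linear imputation model are assumptions made for this example. It also plugs in point estimates of the model parameters rather than drawing them from a posterior, which a proper multiple-imputation procedure would do.

```python
import math
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal, used for the transform and its inverse


def normal_scores(values):
    """Map each value to a standard-normal score through its rank (a monotone transform)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def fit_line(x, z):
    """Least-squares fit of z on x; returns intercept, slope, residual std. deviation."""
    mx, mz = mean(x), mean(z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    intercept = mz - slope * mx
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    return intercept, slope, math.sqrt(rss / (len(x) - 2))


def empirical_quantile(sorted_values, p):
    """Return the observed order statistic closest to probability level p."""
    n = len(sorted_values)
    return sorted_values[min(n - 1, max(0, int(p * n)))]


def synthesize(x, y, m=3, seed=0):
    """Produce m synthetic replicates of the confidential variable y given public x."""
    rng = random.Random(seed)
    z = normal_scores(y)                 # transform confidential values
    a, b, s = fit_line(x, z)             # imputation model on the transformed scale
    y_sorted = sorted(y)
    replicates = []
    for _ in range(m):
        z_star = [rng.gauss(a + b * xi, s) for xi in x]       # synthetic scores
        replicates.append(                                     # back-transform
            [empirical_quantile(y_sorted, STD_NORMAL.cdf(zi)) for zi in z_star]
        )
    return replicates


# Toy data (hypothetical): x is public, y is a skewed confidential variable.
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) for _ in range(500)]
y = [math.exp(0.5 * xi + data_rng.gauss(0, 0.5)) for xi in x]
synthetic = synthesize(x, y, m=3, seed=1)
```

Because the back-transformation in this sketch returns observed order statistics, every synthetic value coincides with some respondent's actual value; a smoothed quantile function would avoid that and is left out here only for brevity.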

Download Info
File URL: http://mpra.ub.uni-muenchen.de/155/
Download Restriction: no

Publisher Info
Paper provided by University Library of Munich, Germany in its series MPRA Paper with number 155.

Date of creation: Sep 2006
Handle: RePEc:pra:mprapa:155

Contact details of provider:
Postal: Schackstr. 4, D-80539 Munich, Germany
Phone: +49-(0)89-2180-2219
Fax: +49-(0)89-2180-3900
Web page: http://mpra.ub.uni-muenchen.de

For technical questions regarding this item, or to correct its listing, contact Ekkehart Schlicht.

Related research
Keywords: statistical disclosure limitation; confidentiality; privacy; multiple imputation; partially synthetic data

Find related papers by JEL classification:
C4 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics
C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Microeconomic Data

This page was last updated on 2008-11-17.

