This file is part of IDEAS, which uses RePEc data



Distribution-preserving statistical disclosure limitation

Author Info
Woodcock, Simon D.
Benedetto, Gary


Abstract

One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A misspecified imputation model can invalidate inferences based on the partially synthetic data, because the imputation model determines the distribution of synthetic values. We present a practical method to generate synthetic values when the imputer has only limited information about the true data-generating process. We combine a simple imputation model (such as regression) with density-based transformations that preserve the distribution of the confidential data, up to sampling error, on specified subdomains. We demonstrate through simulations and a large-scale application that our approach preserves important statistical properties of the confidential data, including higher moments, with low disclosure risk.
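
The two-stage idea in the abstract (a simple imputation model followed by a distribution-restoring transformation on each subdomain) can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract describes density-based transformations, whereas the sketch below swaps in a plain empirical-quantile mapping, and the function names `quantile_transform` and `synthesize`, the normal-error regression, and the simulated data are all assumptions made for illustration. Because empirical quantiles sit close to actual confidential values, a real application would presumably need a smoothed density estimate and a separate disclosure-risk assessment.

```python
import numpy as np


def quantile_transform(synthetic, confidential):
    """Map synthetic draws onto the empirical distribution of the confidential values.

    Hypothetical helper: each synthetic value keeps its rank, but its magnitude is
    replaced by the matching empirical quantile of the confidential data.
    """
    synthetic = np.asarray(synthetic, dtype=float)
    confidential = np.asarray(confidential, dtype=float)
    n = synthetic.size
    # Rank of each synthetic value (0..n-1), converted to a probability in (0, 1).
    ranks = np.argsort(np.argsort(synthetic))
    probs = (ranks + 0.5) / n
    # Inverse empirical CDF of the confidential data at those probabilities.
    return np.quantile(confidential, probs)


def synthesize(x, y, domain, rng):
    """Draw one partially synthetic copy of y (illustrative only)."""
    # Stage 1: simple imputation model, here OLS of y on x with normal errors.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma = (y - X @ beta).std(ddof=2)
    draw = X @ beta + rng.normal(0.0, sigma, size=y.size)
    # Stage 2: within each subdomain, reshape the draws to match the
    # distribution of the confidential values in that subdomain.
    out = np.empty_like(draw)
    for d in np.unique(domain):
        mask = domain == d
        out[mask] = quantile_transform(draw[mask], y[mask])
    return out


# Simulated, skewed "confidential" variable; the linear model is deliberately misspecified.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = np.exp(0.5 * x + rng.normal(scale=0.5, size=1000))
domain = np.where(x > 0, "high", "low")
y_syn = synthesize(x, y, domain, rng)
```

Repeating the call to `synthesize` with fresh random draws would give several synthetic copies, in the spirit of multiple imputation. The design choice to transform within subdomains mirrors the abstract's statement that the distribution is preserved "on specified subdomains"; relationships that cut across subdomains still depend on the stage-one model.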

Download Info

File URL: http://www.sciencedirect.com/science/article/B6V8V-4WDGCKT-2/2/456cce67b8239c19fcaad92513a7cbc3
Download Restriction: Full text for ScienceDirect subscribers only


Publisher Info
Article provided by Elsevier in its journal Computational Statistics & Data Analysis.

Volume (Year): 53 (2009)
Issue (Month): 12 (October)
Pages: 4228-4242
Handle: RePEc:eee:csdana:v:53:y:2009:i:12:p:4228-4242

Contact details of provider:
Web page: http://www.elsevier.com/locate/csda

For technical questions regarding this item, or to correct its listing, contact: Heidi Boesdal.

Related research
References listed on IDEAS
  1. Reiter, Jerome P., 2005. "Estimating Risks of Identification Disclosure in Microdata," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 1103-1112, December.
  2. John J. Abowd & John Haltiwanger & Julia Lane, 2004. "Integrated Longitudinal Employer-Employee Data for the United States," American Economic Review, American Economic Association, vol. 94(2), pages 224-229, May.
  3. John Abowd & Bryce Stephens & Lars Vilhuber, 2006. "The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators," Technical Papers 2006-01, Longitudinal Employer-Household Dynamics, Center for Economic Studies, U.S. Census Bureau.


This page was last updated on 2009-12-3.


This information is provided to you by IDEAS at the Department of Economics, College of Liberal Arts and Sciences, University of Connecticut using RePEc data on a server sponsored by the Society for Economic Dynamics.