One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate them. We present two practical methods of generating synthetic values when the imputer has only limited information about the true data generating process. One is applicable when the true likelihood is known up to a monotone transformation. The second requires only limited knowledge of the true likelihood, but nevertheless preserves the conditional distribution of the confidential data, up to sampling error, on arbitrary subdomains. Our method maximizes data utility and minimizes incremental disclosure risk up to posterior uncertainty in the imputation model and sampling error in the estimated transformation. We validate the approach with a simulation and application to a large linked employer-employee database.
Download Info
To download:
If you experience problems downloading a file, check if you have the
proper application to
view it first. Information about this may be contained
in the File-Format links below. In case of further problems read
the IDEAS help
file. Note that these files are not on the IDEAS
site. Please be patient as the files may be large.
Publisher Info
Paper provided by University Library of Munich, Germany in its series MPRA Paper with number
155.
Find related papers by JEL classification: C4 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Microeconomic Data
This paper has been announced in the following NEP Reports:
References listed on IDEAS Please report citation or reference errors to , or , if you are the registered author of the cited work, log in to your RePEc Author Service profile, click on "citations" and make appropriate adjustments.: