Output-based disclosure control for regressions
Many recent developments in social science research, especially economics, have arisen from the increased availability of confidential government microdata, often in controlled environments. Output-based statistical disclosure control is increasingly important for making effective use of this resource. A central topic is whether the most common analytical tool, multiple regression, is ‘safe’ for release. This is a relatively unexplored field: only a handful of papers have been produced over the last decade and the main reference for practitioners is an unreviewed internal document. This paper analyses the disclosure risks of linear regressions, and demonstrates that, even in the best-case scenario for an intruder, regression results are fundamentally non-disclosive and so come within the class of ‘safe statistics’. It shows that conflicting results in papers reflect institutional perceptions, not statistical matters. It notes that simple rules can both guarantee confidentiality and provide measures of the best approximation to confidential data. It discusses a number of statistical concerns that are shown to be misguided. Finally, it summarises these results to produce formal guidelines for data owners managing controlled environments.
Date of creation: 09 Jan 2012
Contact details of provider:
Phone: 0117 328 3610
Web page: http://www1.uwe.ac.uk/bl/research/bristoleconomics.aspx
Handle: RePEc:uwe:wpaper:20121209
Contact for corrections to this item: Felix Ritchie