Printed from https://ideas.repec.org/p/oec/stdaaa/2011-2-en.html

A Multiplicative Masking Method for Preserving the Skewness of the Original Micro-records

Author

Listed:
  • Nicolas Ruiz

    (OECD)

Abstract

Masking methods for the safe dissemination of microdata distort the original data while preserving a pre-defined set of its statistical properties. For continuous variables, available methodologies rely essentially on matrix masking, and in particular on adding noise to the original values, using more or less refined procedures depending on how much information one seeks to preserve. Almost all of these methods rest on the critical assumption that the original data, the noise, or both follow a normal distribution. This assumption is restrictive, however, because few variables empirically follow a Gaussian pattern: the distribution of household income, for example, is positively skewed, and this skewness is essential information that has to be considered and preserved. This paper addresses these issues by presenting a simple multiplicative masking method that preserves the skewness of the original data while offering a sufficient level of disclosure risk control. Numerical examples are provided, suggesting that this method could be well suited to the dissemination of a broad range of microdata, including those based on administrative and business records.
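The contrast the abstract draws between additive and multiplicative noise can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not give the masking formula, so the log-normal noise with mean 1, the noise levels, the simulated "income" data, and every function name below are assumptions made purely for illustration. The sketch only shows that multiplying positive, right-skewed records by positive noise keeps them positive and right-skewed, whereas adding Gaussian noise pulls the skewness toward zero. It does not reproduce the calibration the paper uses to preserve skewness.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness: mean of ((x - mean) / sd) ** 3."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values, mu=mean)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def mask_multiplicative(values, noise_sigma, rng):
    """Multiply each record by log-normal noise (an assumed noise family).

    Setting mu = -sigma^2 / 2 gives the noise an expected value of 1,
    so the mean of the data is preserved in expectation.
    """
    mu = -0.5 * noise_sigma ** 2
    return [v * rng.lognormvariate(mu, noise_sigma) for v in values]


def mask_additive(values, noise_sd, rng):
    """Add zero-mean Gaussian noise to each record, for comparison."""
    return [v + rng.gauss(0.0, noise_sd) for v in values]


rng = random.Random(0)

# Simulated, positively skewed "incomes" (hypothetical data, not from the paper).
incomes = [rng.lognormvariate(10.0, 0.5) for _ in range(20000)]

masked_mult = mask_multiplicative(incomes, 0.1, rng)
masked_add = mask_additive(incomes, statistics.pstdev(incomes), rng)

print("skewness, original:      ", round(skewness(incomes), 3))
print("skewness, multiplicative:", round(skewness(masked_mult), 3))
print("skewness, additive:      ", round(skewness(masked_add), 3))
print("all multiplicative values positive:", min(masked_mult) > 0)
```

With independent Gaussian noise whose variance equals that of the data, the population skewness of the additively masked variable is the original skewness divided by 2^1.5, which is why the additive figure printed above should come out well below the original.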

Suggested Citation

  • Nicolas Ruiz, 2011. "A Multiplicative Masking Method for Preserving the Skewness of the Original Micro-records," OECD Statistics Working Papers 2011/2, OECD Publishing.
  • Handle: RePEc:oec:stdaaa:2011/2-en

    Download full text from publisher

    File URL: http://dx.doi.org/10.1787/5kgg95pb2tbr-en
    Download Restriction: no

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics


    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis . RePEc uses bibliographic data supplied by the respective publishers.