IDEAS home Printed from
MyIDEAS: Log in (now much improved!) to save this paper

A Multiplicative Masking Method for Preserving the Skewness of the Original Micro-records

Listed author(s):
  • Nicolas Ruiz


Registered author(s):

    Masking methods for the safe dissemination of microdata consist of distorting the original data while preserving a pre-defined set of statistical properties in the microdata. For continuous variables, available methodologies rely essentially on matrix masking and in particular on adding noise to the original values, using more or less refined procedures depending on the extent of information that one seeks to preserve. Almost all of these methods make use of the critical assumption that the original datasets follow a normal distribution and/or that the noise has such a distribution. This assumption is, however, restrictive in the sense that few variables follow empirically a Gaussian pattern: the distribution of household income, for example, is positively skewed, and this skewness is essential information that has to be considered and preserved. This paper addresses these issues by presenting a simple multiplicative masking method that preserves skewness of the original data while offering a sufficient level of disclosure risk control. Numerical examples are provided, leading to the suggestion that this method could be well-suited for the dissemination of a broad range of microdata, including those based on administrative and business records. Les méthodes de masquage utilisées pour la diffusion sécurisée des micros données consistent principalement en deux exercices simultanés : la perturbation des valeurs d’origines des données utilisées et la préservation d’un ensemble prédéfini de leurs propriétés statistiques. Pour les variables continues, les méthodes disponibles reposent essentiellement sur l'ajout de bruit aux valeurs d'origine, en utilisant des procédures aux degrés de complexité variant selon l'étendue de l’information que l'on cherche à préserver. Cependant, une caractéristique commune à l’ensemble de ces méthodes est l’utilisation centrale qui est faite de la loi normale, en supposant les données d'origines et/ou les perturbations distribuées selon ce schéma. Cela reste une hypothèse très restrictive dans le sens ou la validité empirique de cette dernière n’est que très rarement vérifiée: la plupart des distributions de revenus observées sont par exemple fortement positivement asymétrique. Cette caractéristique demeure d’ailleurs essentielle et cruciale pour l’analyse économique, et se doit donc d’être préservé. Partant de ce constat, cet article présente une méthodologie simple de masquage multiplicatif préservant l'asymétrie des données d'origine, ce tout en proposant un niveau suffisant de contrôle des risques de divulgation. Cette méthode est illustré au moyen d‘exemples numériques tendant à démontrer l’intérêt de la procédure utilisée à la diffusion d'un large éventail de micro données, y compris celles fondées sur la base de registre administratifs.

    If you experience problems downloading a file, check if you have the proper application to view it first. In case of further problems read the IDEAS help page. Note that these files are not on the IDEAS site. Please be patient as the files may be large.

    File URL:
    Download Restriction: no

    Paper provided by OECD Publishing in its series OECD Statistics Working Papers with number 2011/2.

    in new window

    Date of creation: 23 Feb 2011
    Handle: RePEc:oec:stdaaa:2011/2-en
    Contact details of provider: Postal:
    2 rue Andre Pascal, 75775 Paris Cedex 16

    Phone: 33-(0)-1-45 24 82 00
    Fax: 33-(0)-1-45 24 85 00
    Web page:

    More information through EDIRC

    No references listed on IDEAS
    You can help add them by filling out this form.

    This item is not listed on Wikipedia, on a reading list or among the top items on IDEAS.

    When requesting a correction, please mention this item's handle: RePEc:oec:stdaaa:2011/2-en. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ()

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If references are entirely missing, you can add them using this form.

    If the full references list an item that is present in RePEc, but the system did not link to it, you can help with this form.

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your profile, as there may be some citations waiting for confirmation.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    This information is provided to you by IDEAS at the Research Division of the Federal Reserve Bank of St. Louis using RePEc data.