Data Smearing: An Approach to Disclosure Limitation for Tabular Data

Author

Listed:
  • Daniell Toth

    (Bureau of Labor Statistics, Office of Survey Methods Research, Suite 1950, Washington, DC 20212, U.S.A.)

Abstract

Statistical agencies often collect sensitive data for release to the public at aggregated levels in the form of tables. To protect confidential data, some cells are suppressed in the publicly released tables. One problem with this method is that many cells of interest must be suppressed in order to protect a much smaller number of sensitive cells. Another problem is that the covariates used to aggregate and the level of aggregation must be fixed before the data are released. Both of these restrictions can severely limit the utility of the data. We propose a new disclosure limitation method that replaces the full set of microdata with synthetic data, which are then used to produce the released tables. This synthetic data set is obtained by replacing each unit’s values with a weighted average of sampled values from the surrounding area. The synthetic data are produced in a way that gives asymptotically unbiased estimates for aggregate cells as the number of units in the cell increases. The method is applied to the U.S. Bureau of Labor Statistics Quarterly Census of Employment and Wages data, which are released to the public quarterly in tabular form and aggregated across varying scales of time, area, and economic sector.
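
The core step described in the abstract, replacing each unit's value with a weighted average of values sampled from nearby units, can be illustrated with a minimal Python sketch. The abstract does not specify the neighbourhood, the sampling scheme, or the weights, so everything here is an assumption made for illustration: the function names `smear` and `cell_total`, the parameters `k` and `m`, the Euclidean nearest-neighbour definition of "surrounding area", and the random normalised weights. It is not the paper's algorithm and makes no attempt to reproduce the weighting that yields asymptotic unbiasedness.

```python
import math
import random


def smear(units, k=5, m=3, seed=0):
    """Return synthetic units: each value becomes a weighted average of
    m values sampled from the unit's k nearest neighbours.

    units: list of (x, y, value) tuples.
    k, m, the Euclidean neighbourhood and the random normalised weights
    are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    synthetic = []
    for (x, y, _) in units:
        # k nearest units by Euclidean distance (the unit itself is included).
        nearest = sorted(
            units, key=lambda u: math.hypot(u[0] - x, u[1] - y)
        )[:k]
        sampled = rng.sample(nearest, min(m, len(nearest)))
        # Random positive weights, normalised so they sum to 1.
        raw = [rng.random() + 1e-9 for _ in sampled]
        total = sum(raw)
        value = sum(w / total * u[2] for w, u in zip(raw, sampled))
        synthetic.append((x, y, value))
    return synthetic


def cell_total(data, in_cell):
    """Sum the values of all units that fall in a cell (a predicate on x, y)."""
    return sum(v for (x, y, v) in data if in_cell(x, y))


# Simulated microdata: 500 units at random locations with skewed values.
data_rng = random.Random(42)
units = [
    (data_rng.random(), data_rng.random(), data_rng.lognormvariate(3, 1))
    for _ in range(500)
]
synthetic = smear(units, k=8, m=4, seed=1)

# Because tables are built from the synthetic microdata, the cell
# definition can be chosen after smearing rather than fixed in advance.
west = lambda x, y: x < 0.5
print("original total: ", cell_total(units, west))
print("synthetic total:", cell_total(synthetic, west))
```

In this sketch every synthetic value is a convex combination of nearby original values, so no single unit's value is released directly, while totals for cells containing many units stay close to the originals. How close depends on the assumed weights and neighbourhood size.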

Suggested Citation

  • Toth, Daniell, 2014. "Data Smearing: An Approach to Disclosure Limitation for Tabular Data," Journal of Official Statistics, Sciendo, vol. 30(4), pages 1-19, December.
  • Handle: RePEc:vrs:offsta:v:30:y:2014:i:4:p:19:n:13
    DOI: 10.2478/jos-2014-0050

    Download full text from publisher

    File URL: https://doi.org/10.2478/jos-2014-0050
    Download Restriction: no


    References listed on IDEAS

    1. Holan, Scott H. & Toth, Daniell & Ferreira, Marco A. R. & Karr, Alan F., 2010. "Bayesian Multiscale Multiple Imputation With Implications for Data Confidentiality," Journal of the American Statistical Association, American Statistical Association, vol. 105(490), pages 564-577.
    2. Wasserman, Larry & Zhou, Shuheng, 2010. "A Statistical Framework for Differential Privacy," Journal of the American Statistical Association, American Statistical Association, vol. 105(489), pages 375-389.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Harrison Quick, 2021. "Generating Poisson‐distributed differentially private synthetic data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(3), pages 1093-1108, July.
    2. John M. Abowd & Ian M. Schmutte & William Sexton & Lars Vilhuber, 2019. "Suboptimal Provision of Privacy and Statistical Accuracy When They are Public Goods," Papers 1906.09353, arXiv.org.
    3. Claire McKay Bowen & Fang Liu & Bingyue Su, 2021. "Differentially private data release via statistical election to partition sequentially," METRON, Springer;Sapienza Università di Roma, vol. 79(1), pages 1-31, April.
    4. Ron S. Jarmin & John M. Abowd & Robert Ashmead & Ryan Cumings-Menon & Nathan Goldschlag & Michael B. Hawes & Sallie Ann Keller & Daniel Kifer & Philip Leclerc & Jerome P. Reiter & Rolando A. Rodrígue, 2023. "An in-depth examination of requirements for disclosure risk assessment," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 120(43), pages 2220558120-, October.
    5. Raj Chetty & John N. Friedman, 2019. "A Practical Method to Reduce Privacy Loss When Disclosing Statistics Based on Small Samples," AEA Papers and Proceedings, American Economic Association, vol. 109, pages 414-420, May.
    6. John M. Abowd & Robert Ashmead & Ryan Cumings-Menon & Simson Garfinkel & Micah Heineck & Christine Heiss & Robert Johns & Daniel Kifer & Philip Leclerc & Ashwin Machanavajjhala & Brett Moran & William, 2022. "The 2020 Census Disclosure Avoidance System TopDown Algorithm," Papers 2204.08986, arXiv.org.
    7. Katherine B. Coffman & Lucas C. Coffman & Keith M. Marzilli Ericson, 2017. "The Size of the LGBT Population and the Magnitude of Antigay Sentiment Are Substantially Underestimated," Management Science, INFORMS, vol. 63(10), pages 3168-3186, October.
    8. Ori Heffetz & Katrina Ligett, 2014. "Privacy and Data-Based Research," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 75-98, Spring.
    9. Jinshuo Dong & Aaron Roth & Weijie J. Su, 2022. "Gaussian differential privacy," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(1), pages 3-37, February.
    10. Soumya Mukherjee & Aratrika Mustafi & Aleksandra Slavković & Lars Vilhuber, 2023. "Assessing Utility of Differential Privacy for RCTs," Papers 2309.14581, arXiv.org.
    11. Erik Dietzenbacher & Manfred Lenzen & Bart Los & Dabo Guan & Michael L. Lahr & Ferran Sancho & Sangwon Suh & Cuihong Yang, 2013. "Input–Output Analysis: The Next 25 Years," Economic Systems Research, Taylor & Francis Journals, vol. 25(4), pages 369-389, December.
    12. Jing Lei & Anne‐Sophie Charest & Aleksandra Slavkovic & Adam Smith & Stephen Fienberg, 2018. "Differentially private model selection with penalized and constrained likelihood," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(3), pages 609-633, June.
    13. Ryan Cumings-Menon, 2022. "Differentially Private Estimation via Statistical Depth," Papers 2207.12602, arXiv.org.
    14. Sara Zapata‐Marin & Alexandra M. Schmidt & Scott Weichenthal & Eric Lavigne, 2023. "Modeling temporally misaligned data across space: The case of total pollen concentration in Toronto," Environmetrics, John Wiley & Sons, Ltd., vol. 34(8), December.
    15. Chongliang Luo & Md. Nazmul Islam & Natalie E. Sheils & John Buresh & Jenna Reps & Martijn J. Schuemie & Patrick B. Ryan & Mackenzie Edmondson & Rui Duan & Jiayi Tong & Arielle Marks-Anglin & Jiang Bi, 2022. "DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    16. Vishesh Karwa & Pavel N. Krivitsky & Aleksandra B. Slavković, 2017. "Sharing social network data: differentially private estimation of exponential family random-graph models," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 66(3), pages 481-500, April.
    17. Bi, Xuan & Shen, Xiaotong, 2023. "Distribution-invariant differential privacy," Journal of Econometrics, Elsevier, vol. 235(2), pages 444-453.
    18. Jinshuo Dong & Aaron Roth & Weijie J. Su, 2022. "Authors’ reply to the Discussion of ‘Gaussian Differential Privacy’ by Dong et al," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(1), pages 50-54, February.
    19. Lalanne, Clément & Gadat, Sébastien, 2024. "Privately Learning Smooth Distributions on the Hypercube by Projections," TSE Working Papers 24-1505, Toulouse School of Economics (TSE).
    20. John M. Abowd & Ian M. Schmutte & Lars Vilhuber, 2018. "Disclosure Limitation and Confidentiality Protection in Linked Data," Working Papers 18-07, Center for Economic Studies, U.S. Census Bureau.
