
Simultaneous edit-imputation and disclosure limitation for business establishment data

Author

Listed:
  • Hang J. Kim
  • Jerome P. Reiter
  • Alan F. Karr

Abstract

Business establishment microdata typically are required to satisfy agency-specified edit rules, such as balance equations and linear inequalities. Inevitably some establishments' reported data violate the edit rules. Statistical agencies correct faulty values using a process known as edit-imputation. Business establishment data also must be heavily redacted before being shared with the public; indeed, confidentiality concerns lead many agencies not to share establishment microdata as unrestricted access files. When microdata must be heavily redacted, one approach is to create synthetic data, as done in the U.S. Longitudinal Business Database and the German IAB Establishment Panel. This article presents the first implementation of a fully integrated approach to edit-imputation and data synthesis. We illustrate the approach on data from the U.S. Census of Manufactures and present a variety of evaluations of the utility of the synthetic data. The paper also presents assessments of disclosure risks for several intruder attacks. We find that the synthetic data preserve important distributional features from the post-editing confidential microdata, and have low risks for the various attacks.
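To make the notion of an edit rule concrete, the following is a minimal sketch of checking one record against a balance equation and two linear inequalities. It is an illustration only, not the authors' method: the field names (`production_wages`, `nonproduction_wages`, `total_wages`, `employees`), the bound `max_wage_per_employee`, and the function `violated_edits` are all hypothetical and do not come from the article or from the Census of Manufactures.

```python
def violated_edits(record, max_wage_per_employee=150.0, tol=1e-9):
    """Return the names of the (hypothetical) edit rules that a record fails."""
    failed = []

    # Balance equation: the wage components must sum to the reported total.
    parts = record["production_wages"] + record["nonproduction_wages"]
    if abs(parts - record["total_wages"]) > tol:
        failed.append("balance")

    # Linear inequality: every reported value must be nonnegative.
    if any(value < 0 for value in record.values()):
        failed.append("nonnegative")

    # Linear inequality (ratio edit): total wages may not exceed an
    # assumed upper bound per employee.
    if record["total_wages"] > max_wage_per_employee * record["employees"]:
        failed.append("wage_ratio")

    return failed


# A record whose components do not add up to the total fails the balance edit.
faulty = {
    "production_wages": 60.0,
    "nonproduction_wages": 40.0,
    "total_wages": 120.0,
    "employees": 5.0,
}
print(violated_edits(faulty))
```

In an edit-imputation setting, records flagged in this way are the ones whose faulty values would be replaced with values that satisfy all of the rules.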

Suggested Citation

  • Hang J. Kim & Jerome P. Reiter & Alan F. Karr, 2018. "Simultaneous edit-imputation and disclosure limitation for business establishment data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 45(1), pages 63-82, January.
  • Handle: RePEc:taf:japsta:v:45:y:2018:i:1:p:63-82
    DOI: 10.1080/02664763.2016.1267123


