
Applying Data Synthesis for Longitudinal Business Data across Three Countries

Authors

  • M. Jahangir Alam
  • Benoit Dostie
  • Jorg Drechsler
  • Lars Vilhuber

Abstract

Data on businesses collected by statistical agencies are challenging to protect. Many businesses have unique characteristics, and distributions of employment, sales, and profits are highly skewed. Attackers wishing to conduct identification attacks often have access to much more information about a business than about any individual. As a consequence, most disclosure avoidance mechanisms fail to strike an acceptable balance between usefulness and confidentiality protection. Detailed aggregate statistics by geography or detailed industry classes are rare, public-use microdata on businesses are virtually nonexistent, and access to confidential microdata can be burdensome. Synthetic microdata have been proposed as a secure mechanism for publishing microdata, as part of a wider discussion of how to give researchers broader access to such data sets. In this article, we document an experiment to create analytically valid synthetic data for two different countries, Canada (LEAP) and Germany (BHP), using exactly the same model and methods previously employed for the United States. We assess utility and protection, and evaluate whether such an approach can be extended cost-effectively to other data.

Suggested Citation

  • M. Jahangir Alam & Benoit Dostie & Jorg Drechsler & Lars Vilhuber, 2020. "Applying Data Synthesis for Longitudinal Business Data across Three Countries," Papers 2008.02246, arXiv.org.
  • Handle: RePEc:arx:papers:2008.02246

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2008.02246
    File Function: Latest version
    Download Restriction: no



    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Jahangir Alam M. & Dostie Benoit & Drechsler Jörg & Vilhuber Lars, 2020. "Applying data synthesis for longitudinal business data across three countries," Statistics in Transition New Series, Polish Statistical Association, vol. 21(4), pages 212-236, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. William F. Lincoln & Andrew H. McCallum & Michael Siemer, 2019. "The Great Recession and a Missing Generation of Exporters," IMF Economic Review, Palgrave Macmillan;International Monetary Fund, vol. 67(4), pages 703-745, December.
    2. Tran, Hien Thu, 2019. "Institutional quality and market selection in the transition to market economy," Journal of Business Venturing, Elsevier, vol. 34(5), pages 1-1.
    3. William F. Lincoln & Andrew H. McCallum & Michael Siemer, 2017. "The Great Recession and a Missing Generation of Exporters," Finance and Economics Discussion Series 2017-108, Board of Governors of the Federal Reserve System (U.S.).
    4. Emanuele Forlani, 2017. "Irish Firms’ Productivity and Imported Inputs," Manchester School, University of Manchester, vol. 85(6), pages 710-743, December.
    5. William Lincoln & Andrew McCallum & Michael Siemer, 2018. "The Great Recession and a Missing Generation of Exporters," 2018 Meeting Papers 558, Society for Economic Dynamics.
    6. Fritsch, Michael & Changoluisa, Javier, 2017. "New business formation and the productivity of manufacturing incumbents: Effects and mechanisms," Journal of Business Venturing, Elsevier, vol. 32(3), pages 237-259.
    7. Javier Miranda & Lars Vilhuber, 2016. "Using Partially Synthetic Microdata to Protect Sensitive Cells in Business Statistics," Working Papers 16-10, Center for Economic Studies, U.S. Census Bureau.
    8. William F. Lincoln & Andrew H. McCallum & Michael Siemer, 2018. "The Great Recession and a Missing Generation of Exporters," Working Papers 18-33, Center for Economic Studies, U.S. Census Bureau.
    9. Orlic, Edvard & Hashi, Iraj & Hisarciklilar, Mehtap, 2018. "Cross sectoral FDI spillovers and their impact on manufacturing productivity," International Business Review, Elsevier, vol. 27(4), pages 777-796.
    10. John M. Abowd & Ian M. Schmutte & Lars Vilhuber, 2018. "Disclosure Limitation and Confidentiality Protection in Linked Data," Working Papers 18-07, Center for Economic Studies, U.S. Census Bureau.
    11. Daniel Ştefan Armeanu & Georgeta Vintilă & Ştefan Cristian Gherghina, 2017. "Empirical Study towards the Drivers of Sustainable Economic Growth in EU-28 Countries," Sustainability, MDPI, vol. 10(1), pages 1-22, December.
    12. Youngho Kang & Byung-Yeon Kim, 2018. "Immigration and economic growth: do origin and destination matter?," Applied Economics, Taylor & Francis Journals, vol. 50(46), pages 4968-4984, October.
    13. Cho, Seo-young & Vadlamannati, Krishna Chaitanya, 2010. "Compliance for big brothers: An empirical analysis on the impact of the anti-trafficking protocol," University of Göttingen Working Papers in Economics 118, University of Goettingen, Department of Economics.
    14. Vieira, Flávio & MacDonald, Ronald & Damasceno, Aderbal, 2012. "The role of institutions in cross-section income and panel data growth models: A deeper investigation on the weakness and proliferation of instruments," Journal of Comparative Economics, Elsevier, vol. 40(1), pages 127-140.
    15. Hakkala, Katariina & Heyman, Fredrik & Sjöholm, Fredrik, 2007. "Cross-Border Acquisitions, Multinationals and Wage Elasticities," Working Paper Series 709, Research Institute of Industrial Economics.
    16. Tuba DERYA-BASKAN & Eda BALIKÇIOĞLU, 2018. "Firma Bileşenlerinin Halka Açık Perakende Firmalarında Kurumlar Vergisine Etkisi [The Effect of Firm Components on Corporate Tax in Publicly Traded Retail Firms]," Sosyoekonomi Journal, Sosyoekonomi Society, issue 26(37).
    17. Kitazawa, Yoshitsugu, 2001. "Exponential regression of dynamic panel data models," Economics Letters, Elsevier, vol. 73(1), pages 7-13, October.
    18. Nuno Carlos LEITÃO & Muhammad SHAHBAZ, 2012. "Migration and Tourism Demand," Theoretical and Applied Economics, Asociatia Generala a Economistilor din Romania - AGER, vol. 0(2(567)), pages 39-48, February.
    19. Alessandra Canepa & Fawaz Khaled, 2018. "Housing, Housing Finance and Credit Risk," IJFS, MDPI, vol. 6(2), pages 1-23, May.
    20. Tahir Andrabi & Jishnu Das & Asim Ijaz Khwaja & Tristan Zajonc, 2011. "Do Value-Added Estimates Add Value? Accounting for Learning Dynamics," American Economic Journal: Applied Economics, American Economic Association, vol. 3(3), pages 29-54, July.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.