
Bootstrapping with AI/ML-generated labels

Authors

  • Timothy Christensen
  • Silvia Goncalves
  • Benoit Perron

Abstract

AI/ML methods are increasingly used in economics to generate binary variables (or labels) via classification algorithms. When these generated variables are included as covariates in regressions, even small misclassification errors can induce large biases in OLS estimators and invalidate standard inference. We study whether the bootstrap can correct this bias and deliver valid inference. We first show that a seemingly natural fixed-label bootstrap, which generates data using estimated labels but relies on a corrupted version in estimation, is generally invalid unless a strong independence condition between the latent true labels and other covariates holds. We then propose a coupled-label bootstrap that jointly resamples the true and imputed labels, and show it is valid without this condition. Two finite-sample adjustments further improve coverage: a variance correction for uncertainty in estimated misclassification rates and a Hessian rotation for near-singular designs. We illustrate the methods in simulations and apply them to investigate the relationship between wages and remote work status.
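A short simulation can illustrate the bias mechanism described in the abstract. This is a minimal Python sketch written for illustration, not code from the paper. It assumes a single binary regressor, a linear outcome y = beta*d + u, and nondifferential misclassification with a false-positive rate p0 = P(d_hat = 1 | d = 0) and a false-negative rate p1 = P(d_hat = 0 | d = 1). The names p0, p1, simulate, ols_slope and theoretical_slope are hypothetical. The sketch compares OLS on the true label, OLS on the misclassified label, and the textbook probability limit beta * pi(1 - pi)(1 - p0 - p1) / (q(1 - q)), where pi = P(d = 1) and q = P(d_hat = 1).

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simulate(n=200_000, beta=1.0, pi=0.3, p0=0.05, p1=0.05, seed=0):
    """Simulate y = beta*d + u with u ~ N(0, 1), then produce a generated
    label d_hat that flips the true label d with P(d_hat=1 | d=0) = p0 and
    P(d_hat=0 | d=1) = p1. Errors are independent of u (nondifferential)."""
    rng = random.Random(seed)
    d_true, d_hat, y = [], [], []
    for _ in range(n):
        d = 1 if rng.random() < pi else 0
        flip_prob = p1 if d == 1 else p0
        dh = 1 - d if rng.random() < flip_prob else d
        d_true.append(d)
        d_hat.append(dh)
        y.append(beta * d + rng.gauss(0.0, 1.0))
    return d_true, d_hat, y


def theoretical_slope(beta, pi, p0, p1):
    """Probability limit of the OLS slope when d_hat replaces d."""
    q = pi * (1 - p1) + (1 - pi) * p0  # P(d_hat = 1)
    return beta * pi * (1 - pi) * (1 - p0 - p1) / (q * (1 - q))


d_true, d_hat, y = simulate()
b_true_label = ols_slope(d_true, y)   # close to beta = 1
b_ai_label = ols_slope(d_hat, y)      # attenuated toward zero
b_limit = theoretical_slope(1.0, 0.3, 0.05, 0.05)

print(f"OLS on true label:      {b_true_label:.3f}")
print(f"OLS on generated label: {b_ai_label:.3f}")
print(f"theoretical limit:      {b_limit:.3f}")
```

With these default values (5% misclassification in each direction and pi = 0.3, so 95% of labels are correct), the theoretical limit is about 0.87: the slope on the generated label is attenuated by roughly 13%. The sketch only shows the bias; it does not implement the fixed-label or coupled-label bootstrap studied in the paper.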

Suggested Citation

  • Timothy Christensen & Silvia Goncalves & Benoit Perron, 2026. "Bootstrapping with AI/ML-generated labels," Papers 2604.23770, arXiv.org.
  • Handle: RePEc:arx:papers:2604.23770

    Download full text from publisher

    File URL: https://arxiv.org/pdf/2604.23770
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Bound, John & Krueger, Alan B, 1991. "The Extent of Measurement Error in Longitudinal Earnings Data: Do Two Wrongs Make a Right?," Journal of Labor Economics, University of Chicago Press, vol. 9(1), pages 1-24, January.
    2. Jacob Carlson & Melissa Dell, 2025. "A Unifying Framework for Robust and Efficient Inference with Unstructured Data," Papers 2505.00282, arXiv.org, revised Feb 2026.
    3. Gonçalves, Sílvia & Kaffo, Maximilien, 2015. "Bootstrap inference for linear dynamic panel data models with individual fixed effects," Journal of Econometrics, Elsevier, vol. 186(2), pages 407-426.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Laisney, François & Pohlmeier, Winfried & Staat, Matthias, 1991. "Estimation of labour supply functions using panel data: a survey," ZEW Discussion Papers 91-05, ZEW - Leibniz Centre for European Economic Research.
    2. John Abowd & Martha Stinson, 2011. "Estimating Measurement Error in SIPP Annual Job Earnings: A Comparison of Census Bureau Survey and SSA Administrative Data," Working Papers 11-20, Center for Economic Studies, U.S. Census Bureau.
    3. Kemp, Gordon C.R. & Santos Silva, J.M.C., 2012. "Regression towards the mode," Journal of Econometrics, Elsevier, vol. 170(1), pages 92-101.
    4. Liran Einav & Ephraim Leibtag & Aviv Nevo, 2010. "Recording discrepancies in Nielsen Homescan data: Are they present and do they matter?," Quantitative Marketing and Economics (QME), Springer, vol. 8(2), pages 207-239, June.
    5. Steven J. Haider & David S. Loughran, 2008. "The Effect of the Social Security Earnings Test on Male Labor Supply: New Evidence from Survey and Administrative Data," Journal of Human Resources, University of Wisconsin Press, vol. 43(1).
    6. Mittag, Nikolas, 2016. "Correcting for Misreporting of Government Benefits," IZA Discussion Papers 10266, IZA Network @ LISER.
    7. Peter Gottschalk & Minh Huynh, 2010. "Are Earnings Inequality and Mobility Overstated? The Impact of Nonclassical Measurement Error," The Review of Economics and Statistics, MIT Press, vol. 92(2), pages 302-315, May.
    8. Richard Blundell & Luigi Pistaferri & Itay Saporta-Eksten, 2016. "Consumption Inequality and Family Labor Supply," American Economic Review, American Economic Association, vol. 106(2), pages 387-435, February.
    9. Stella Martin & Kevin Stabenow & Mark Trede, 2024. "Measurement Error in Earnings," CQE Working Papers 10824, Center for Quantitative Economics (CQE), University of Muenster.
    10. Michael A. Clemens & Claudio Montenegro & Lant Pritchett, 2016. "Bounding the Price Equivalent of Migration Barriers," Growth Lab Working Papers 67, Harvard's Growth Lab.
    11. Manuel Hernandez & Danilo Trupkin, 2021. "Asset maintenance as hidden investment among the poor and rich: Application to housing," Review of Economic Dynamics, Elsevier for the Society for Economic Dynamics, vol. 40, pages 128-145, April.
    12. Bruce Fallick & Michael Lettau & William L. Wascher, 2016. "Downward Nominal Wage Rigidity in the United States during and after the Great Recession," Working Papers (Old Series) 1602, Federal Reserve Bank of Cleveland.
    13. Abdurrahman Aydemir & George J. Borjas, 2011. "Attenuation Bias in Measuring the Wage Impact of Immigration," Journal of Labor Economics, University of Chicago Press, vol. 29(1), pages 69-113, January.
    14. Raj Chetty & Nathaniel Hendren & Patrick Kline & Emmanuel Saez, 2014. "Where is the land of Opportunity? The Geography of Intergenerational Mobility in the United States," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 129(4), pages 1553-1623.
    15. Colleen Carey, 2017. "Technological Change and Risk Adjustment: Benefit Design Incentives in Medicare Part D," American Economic Journal: Economic Policy, American Economic Association, vol. 9(1), pages 38-73, February.
    16. Smith, Jennifer C., 2002. "Pay Cuts And Morale : A Test Of Downward Nominal Rigidity," The Warwick Economics Research Paper Series (TWERPS) 649, University of Warwick, Department of Economics.
    17. Antonio F. Galvao & Thomas Parker & Zhijie Xiao, 2024. "Bootstrap Inference for Panel Data Quantile Regression," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(2), pages 628-639, April.
    18. Hyslop, Dean & Stillman, Steven, 2007. "Youth minimum wage reform and the labour market in New Zealand," Labour Economics, Elsevier, vol. 14(2), pages 201-230, April.
    19. Christopher R. Bollinger, 2001. "Response Error and the Union Wage Differential," Southern Economic Journal, John Wiley & Sons, vol. 68(1), pages 60-76, July.
    20. Frank A Cowell & Christian Schluter, 1998. "Measuring Income Mobility with Dirty Data (published in Ethnic and Racial Studies, 22(3), May 1999)," CASE Papers 016, Centre for Analysis of Social Exclusion, LSE.
