IDEAS (RePEc) record: https://ideas.repec.org/p/arx/papers/2603.11368.html

Spatially Robust Inference with Predicted and Missing at Random Labels

Authors

  • Stephen Salerno
  • Zhenke Wu
  • Tyler McCormick

Abstract

When outcome data are expensive or onerous to collect, scientists increasingly substitute predictions from machine learning and AI models for unlabeled cases, a process which has consequences for downstream statistical inference. While recent methods provide valid uncertainty quantification under independent sampling, real-world applications involve missing at random (MAR) labeling and spatial dependence. For inference in this setting, we propose a doubly robust estimator with cross-fit nuisances. We show that cross-fitting induces fold-level correlation that distorts spatial variance estimators, producing unstable or overly conservative confidence intervals. To address this, we propose a jackknife spatial heteroscedasticity and autocorrelation consistent (HAC) variance correction that separates spatial dependence from fold-induced noise. Under standard identification and dependence conditions, the resulting intervals are asymptotically valid. Simulations and benchmark datasets show substantial improvement in finite-sample calibration, particularly under MAR labeling and clustered sampling.
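The abstract describes the estimator only at a high level. As a rough illustration, the sketch below shows the two ingredients it names. It is not the authors' estimator and does not implement their jackknife correction. First, it computes a cross-fit augmented inverse-probability-weighted (AIPW, i.e. doubly robust) estimate of a mean outcome when labels are missing at random given covariates `X` and a model prediction `f`. Second, it compares an i.i.d. standard error with a Conley (1999)-style spatial HAC standard error. Several choices are assumptions made for illustration only: the target E[Y], a linear outcome model, a logistic labeling model, random (not spatially blocked) folds, a product-Bartlett kernel with cutoff 2.0, the helper names `fit_logistic`, `crossfit_aipw` and `conley_variance`, and the simulated grid-cell data.

```python
import numpy as np


def fit_logistic(Z, r, n_iter=50, ridge=1e-6):
    """Logistic regression of r on Z by Newton-Raphson, with a tiny ridge for stability."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (r - p) - ridge * beta
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z + ridge * np.eye(Z.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def crossfit_aipw(y, r, X, f, n_folds=5, seed=0, clip=0.01):
    """Cross-fit AIPW estimate of E[Y]; y is used only where r == 1 (MAR given X and f)."""
    n = len(r)
    Z = np.column_stack([np.ones(n), X, f])  # intercept, covariate(s), prediction
    folds = np.random.default_rng(seed).permutation(n) % n_folds  # random, not spatial, folds
    psi = np.empty(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        lab = train & (r == 1)
        gamma, *_ = np.linalg.lstsq(Z[lab], y[lab], rcond=None)  # outcome model m(X, f)
        beta = fit_logistic(Z[train], r[train])  # labeling model pi(X, f)
        m = Z[test] @ gamma
        pi = np.clip(1.0 / (1.0 + np.exp(-Z[test] @ beta)), clip, 1.0)
        y_obs = np.where(r[test] == 1, y[test], 0.0)  # unlabeled y never enters
        psi[test] = m + r[test] / pi * (y_obs - m)  # AIPW score per unit
    return psi.mean(), psi


def conley_variance(psi, coords, cutoff):
    """Spatial HAC variance of mean(psi) with product-Bartlett weights in each coordinate."""
    phi = psi - psi.mean()
    gaps = np.abs(coords[:, None, :] - coords[None, :, :])  # (n, n, 2) coordinate gaps
    # weight is 1 at zero gap and 0 once either coordinate gap reaches the cutoff
    w = np.prod(np.clip(1.0 - gaps / cutoff, 0.0, None), axis=2)
    return phi @ w @ phi / len(phi) ** 2


# Hypothetical simulation: a 5 x 5 grid of cells whose units share a spatial shock u.
rng = np.random.default_rng(1)
n = 1000
coords = rng.uniform(0.0, 10.0, size=(n, 2))
cell = (coords[:, 0] // 2).astype(int) * 5 + (coords[:, 1] // 2).astype(int)
u = rng.normal(size=25)[cell]
X = 0.5 * u + rng.normal(size=n)
y_full = 1.0 + 2.0 * X + u + rng.normal(size=n)  # true outcomes
f = 1.0 + 2.0 * X + 0.5 * rng.normal(size=n)  # stand-in for an ML prediction
r = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(0.5 - X))).astype(float)  # labels MAR given X
y = np.where(r == 1, y_full, np.nan)  # unlabeled outcomes are missing

theta_hat, psi = crossfit_aipw(y, r, X, f)
se_iid = np.sqrt(np.var(psi) / n)
se_hac = np.sqrt(conley_variance(psi, coords, cutoff=2.0))
print(f"theta_hat={theta_hat:.3f}  se_iid={se_iid:.3f}  se_hac={se_hac:.3f}")
```

With clustered shocks, the spatial HAC standard error should exceed the i.i.d. one. This sketch plugs the cross-fit scores directly into the HAC formula. According to the abstract, that plug-in step is where fold-level correlation can distort spatial variance estimates, and the paper's jackknife spatial HAC correction is designed to address it.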

Suggested Citation

  • Stephen Salerno & Zhenke Wu & Tyler McCormick, 2026. "Spatially Robust Inference with Predicted and Missing at Random Labels," Papers 2603.11368, arXiv.org.
  • Handle: RePEc:arx:papers:2603.11368

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2603.11368
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Nazgul Jenish & Ingmar R. Prucha, 2009. "Central limit theorems and uniform laws of large numbers for arrays of random fields," Journal of Econometrics, Elsevier, vol. 150(1), pages 86-98, May.
    2. Richard K. Crump & V. Joseph Hotz & Guido W. Imbens & Oscar A. Mitnik, 2009. "Dealing with limited overlap in estimation of average treatment effects," Biometrika, Biometrika Trust, vol. 96(1), pages 187-199.
    3. A. Colin Cameron & Jonah B. Gelbach & Douglas L. Miller, 2011. "Robust Inference With Multiway Clustering," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 29(2), pages 238-249, April.
    4. T. G. Conley, 1999. "GMM estimation with cross sectional dependence," Journal of Econometrics, Elsevier, vol. 92(1), pages 1-45, September.
    5. Donglin Zeng & Qingxia Chen, 2010. "Adjustment for Missingness Using Auxiliary Information in Semiparametric Regression," Biometrics, The International Biometric Society, vol. 66(1), pages 115-122, March.
    6. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    7. Brian Gilbert & Elizabeth L. Ogburn & Abhirup Datta, 2025. "Consistency of common spatial estimators under spatial confounding," Biometrika, Biometrika Trust, vol. 112(2), pages 945-961.
    8. Ying Jin & Dominik Rothenhäusler, 2024. "Tailored inference for finite populations: conditional validity and transfer across distributions," Biometrika, Biometrika Trust, vol. 111(1), pages 215-233.
    9. C. Alan Bester & Timothy G. Conley & Christian B. Hansen, 2011. "Inference with dependent data using cluster covariance estimators," Journal of Econometrics, Elsevier, vol. 165(2), pages 137-151.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cavit Pakel, 2019. "Bias reduction in nonlinear and dynamic panels in the presence of cross-section dependence," Journal of Econometrics, Elsevier, vol. 213(2), pages 459-492.
    2. Abhimanyu Gupta, 2018. "Autoregressive spatial spectral estimates," Journal of Econometrics, Elsevier, vol. 203(1), pages 80-95.
    3. Bruno Ferman, 2023. "Inference in difference‐in‐differences: How much should we trust in independent clusters?," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 38(3), pages 358-369, April.
    4. Jorge A. Arroyo, 2025. "Big Wins, Small Net Gains: Direct and Spillover Effects of First Industry Entries in Puerto Rico," Papers 2511.19469, arXiv.org, revised Nov 2025.
    5. J. Hidalgo & M. Schafgans, 2020. "Inference without smoothing for large panels with cross-sectional and temporal dependence," Papers 2006.14409, arXiv.org.
    6. F. Moscone & Elisa Tosetti, 2015. "Robust estimation under error cross section dependence," Economics Letters, Elsevier, vol. 133(C), pages 100-104.
    7. Javier Hidalgo & Marcia Schafgans, 2021. "Inference without smoothing for large panels with cross-sectional and temporal dependence," Journal of Econometrics, Elsevier, vol. 223(1), pages 125-160.
    8. Javier Hidalgo & Marcia Schafgans, 2017. "Inference and testing breaks in large dynamic panels with strong cross sectional dependence," Journal of Econometrics, Elsevier, vol. 196(2), pages 259-274.
    9. Yu Sun & Karen X. Yan, 2019. "Inference on Difference-in-Differences average treatment effects: A fixed-b approach," Journal of Econometrics, Elsevier, vol. 211(2), pages 560-588.
    10. Jungbin Hwang, 2021. "Simple and trustworthy cluster-robust GMM inference," Journal of Econometrics, Elsevier, vol. 222(2), pages 993-1023.
    11. Javier Hidalgo & Marcia Schafgans, 2017. "Inference and testing breaks in large dynamic panels with strong cross sectional dependence," LSE Research Online Documents on Economics 68839, London School of Economics and Political Science, LSE Library.
    12. Javier Hidalgo & Marcia M. Schafgans, 2015. "Inference and Testing Breaks in Large Dynamic Panels with Strong Cross Sectional Dependence," STICERD - Econometrics Paper Series /2015/583, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
    13. Timothy Conley & Silvia Gonçalves & Christian Hansen, 2018. "Inference with Dependent Data in Accounting and Finance Applications," Journal of Accounting Research, John Wiley & Sons, Ltd., vol. 56(4), pages 1139-1203, September.
    14. Min Seong Kim & Yixiao Sun, 2013. "Heteroskedasticity and spatiotemporal dependence robust inference for linear panel models with fixed effects," Journal of Econometrics, Elsevier, vol. 177(1), pages 85-108.
    15. Javier Hidalgo & Marcia Schafgans, 2021. "Inference without smoothing for large panels with cross-sectional and temporal dependence," LSE Research Online Documents on Economics 107426, London School of Economics and Political Science, LSE Library.
    16. James G. MacKinnon & Matthew D. Webb, 2020. "When and How to Deal with Clustered Errors in Regression Models," Working Paper 1421, Economics Department, Queen's University.
    17. Michael Pollmann, 2020. "Causal Inference for Spatial Treatments," Papers 2011.00373, arXiv.org, revised Apr 2026.
    18. Felipe Gonzalez & Mounu Prem & Cristine von Dessauer, 2023. "Empowerment or Indoctrination? Women Centers Under Dictatorship," SocArXiv 64mf9, Center for Open Science.
    19. Alfred Garloff & Carsten Pohl & Norbert Schanne, 2013. "Do small labor market entry cohorts reduce unemployment?," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 29(15), pages 379-406.
    20. Arne Henningsen & Guy Low & David Wuepper & Tobias Dalhaus & Hugo Storm & Dagim Belay & Stefan Hirsch, 2024. "Estimating Causal Effects with Observational Data: Guidelines for Agricultural and Applied Economists," IFRO Working Paper 2024/03, University of Copenhagen, Department of Food and Resource Economics.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

