Printed from https://ideas.repec.org/p/arx/papers/2502.10605.html

Batch-Adaptive Causal Annotations

Authors

  • Ezinne Nwankwo
  • Lauri Goldkind
  • Angela Zhou

Abstract

Estimating the causal effects of interventions is crucial to policy and decision-making, yet outcome data are often missing or subject to non-standard measurement error. While ground-truth outcomes can sometimes be obtained through costly data annotation or follow-up, budget constraints typically allow only a fraction of the dataset to be labeled. We address this challenge by optimizing which data points should be sampled for outcome information in order to improve efficiency in average treatment effect estimation with missing outcomes. We derive a closed-form solution for the optimal batch sampling probability by minimizing the asymptotic variance of a doubly robust estimator for causal inference with missing outcomes. Motivated by our street outreach partners, we extend the framework to costly annotations of unstructured data, such as text or images in healthcare and social services. Across simulated and real-world datasets, including one of outreach interventions in homelessness services, our approach achieves substantially lower mean-squared error and recovers the AIPW estimate with fewer labels than existing baselines. In practice, we show that our method can match confidence intervals obtained with 361 random samples using only 90 optimized samples, saving roughly 75% of the labeling budget.
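
As a rough illustration of the idea in the abstract, the sketch below pairs a labeling-probability rule with an AIPW-style estimator that uses only the labeled outcomes. The abstract does not state the paper's closed form, so the rule here is a stand-in assumption: Neyman-allocation-style probabilities proportional to outcome noise divided by the probability of the observed treatment arm, rescaled to a labeling budget. The function names (labeling_probs, aipw_missing_outcomes), the simulated data, and the use of oracle outcome models in the demo are all hypothetical choices made for this example, not the authors' implementation.

```python
import numpy as np


def labeling_probs(sigma, e_z, budget, floor=1e-3):
    """Variance-weighted labeling probabilities (illustrative assumption).

    sigma  : per-unit outcome noise scale in the unit's observed arm
    e_z    : per-unit probability of the observed treatment arm
    budget : target average fraction of units to label
    """
    raw = np.asarray(sigma, dtype=float) / np.asarray(e_z, dtype=float)
    pi = budget * raw / raw.mean()      # rescale so the mean equals the budget
    return np.clip(pi, floor, 1.0)      # clipping can shift the mean slightly


def aipw_missing_outcomes(y, z, r, e, pi, mu0, mu1):
    """AIPW-style ATE estimate when y is observed only where r == 1.

    Returns (estimate, standard error).
    """
    y, z, r = (np.asarray(a, dtype=float) for a in (y, z, r))
    y_obs = np.where(r == 1, y, 0.0)    # unlabeled outcomes never enter the sum
    w1 = z / e * r / pi                 # treated arm: propensity and label weights
    w0 = (1 - z) / (1 - e) * r / pi     # control arm: propensity and label weights
    psi = (mu1 - mu0) + w1 * (y_obs - mu1) - w0 * (y_obs - mu0)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(psi))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    x = rng.uniform(size=n)
    e = np.full(n, 0.5)                 # assumed known propensity (randomized)
    z = rng.binomial(1, e)
    sigma = 0.5 + 2.0 * x               # assumed: noisier outcomes for larger x
    y = 1.0 * z + x + sigma * rng.normal(size=n)
    e_z = np.where(z == 1, e, 1 - e)
    pi = labeling_probs(sigma, e_z, budget=0.25)
    r = rng.binomial(1, pi)             # which units get labeled
    # Oracle outcome models are used here only to keep the demo short.
    est, se = aipw_missing_outcomes(y, z, r, e, pi, mu0=x, mu1=1.0 + x)
    print(f"ATE estimate {est:.3f} (SE {se:.3f}), labels used: {int(r.sum())}")
```

In this sketch, units with noisier outcomes are labeled more often, and the inverse-probability weight r / pi corrects for the uneven labeling. In practice the noise scale and outcome models would have to be estimated, for example from an initial batch of labels.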

Suggested Citation

  • Ezinne Nwankwo & Lauri Goldkind & Angela Zhou, 2025. "Batch-Adaptive Causal Annotations," Papers 2502.10605, arXiv.org, revised Apr 2026.
  • Handle: RePEc:arx:papers:2502.10605

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2502.10605
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    2. Zequn Jin & Gaoqian Xu & Xi Zheng & Yahong Zhou, 2025. "Policy Learning under Unobserved Confounding: A Robust and Efficient Approach," Papers 2507.20550, arXiv.org.
    3. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    4. Harsh Parikh & Trang Quynh Nguyen & Elizabeth A. Stuart & Kara E. Rudolph & Caleb H. Miles, 2025. "A Cautionary Tale on Integrating Studies with Disparate Outcome Measures for Causal Inference," Papers 2505.11014, arXiv.org.
    5. Black, Dan A. & Grogger, Jeffrey & Kirchmaier, Tom & Sanders, Koen, 2023. "Criminal charges, risk assessment and violent recidivism in cases of domestic abuse," LSE Research Online Documents on Economics 121374, London School of Economics and Political Science, LSE Library.
    6. Nathan Kallus, 2023. "Treatment Effect Risk: Bounds and Inference," Management Science, INFORMS, vol. 69(8), pages 4579-4590, August.
    7. Achim Ahrens & Alessandra Stampi‐Bombelli & Selina Kurer & Dominik Hangartner, 2024. "Optimal multi‐action treatment allocation: A two‐phase field experiment to boost immigrant naturalization," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(7), pages 1379-1395, November.
    8. Masahiro Kato, 2021. "Adaptive Doubly Robust Estimator from Non-stationary Logging Policy under a Convergence of Average Probability," Papers 2102.08975, arXiv.org, revised Mar 2021.
    9. Chunrong Ai & Oliver Linton & Kaiji Motegi & Zheng Zhang, 2021. "A unified framework for efficient estimation of general treatment models," Quantitative Economics, Econometric Society, vol. 12(3), pages 779-816, July.
    10. Dongcheng Zhang & Kunpeng Zhang, 2020. "Weighting-Based Treatment Effect Estimation via Distribution Learning," Papers 2012.13805, arXiv.org, revised May 2023.
    11. Hugo Bodory & Martin Huber & Michael Lechner, 2024. "The Finite Sample Performance of Instrumental Variable-Based Estimators of the Local Average Treatment Effect When Controlling for Covariates," Computational Economics, Springer;Society for Computational Economics, vol. 64(4), pages 2053-2078, October.
    12. Martin Huber, 2019. "An introduction to flexible methods for policy evaluation," Papers 1910.00641, arXiv.org.
    13. Masahiro Kato & Yusuke Kaneko, 2020. "Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy," Papers 2010.13554, arXiv.org.
    14. Michael Lechner & Jana Mareckova, 2024. "Comprehensive Causal Machine Learning," Papers 2405.10198, arXiv.org, revised Feb 2025.
    15. Isaac Meza, 2025. "Residual Balancing for Non-Linear Outcome Models in High Dimensions," Papers 2511.00324, arXiv.org.
    16. Guido Imbens & Yiqing Xu, 2024. "Comparing Experimental and Nonexperimental Methods: What Lessons Have We Learned Four Decades After LaLonde (1986)?," Papers 2406.00827, arXiv.org, revised May 2025.
    17. Shuxiao Chen & Bo Zhang, 2021. "Estimating and Improving Dynamic Treatment Regimes With a Time-Varying Instrumental Variable," Papers 2104.07822, arXiv.org.
    18. David Simchi-Levi & Chonghuan Wang, 2026. "Pricing Experimental Design: Causal Effect, Expected Revenue and Tail Risk," Management Science, INFORMS, vol. 72(2), pages 1157-1174, February.
    19. Michael C. Knaus, 2024. "Treatment Effect Estimators as Weighted Outcomes," Papers 2411.11559, arXiv.org, revised Dec 2024.
    20. Asanov, Anastasiya-Mariya & Asanov, Igor & Buenstorf, Guido, 2024. "A low-cost digital first aid tool to reduce psychological distress in refugees: A multi-country randomized controlled trial of self-help online in the first months after the invasion of Ukraine," Social Science & Medicine, Elsevier, vol. 362(C).
