
Can language models boost the power of randomized experiments without statistical bias?

Authors

  • Xinrui Ruan
  • Xinwei Ma
  • Yingfei Wang
  • Waverly Wei
  • Jingshen Wang

Abstract

Randomized experiments, or randomized controlled trials (RCTs), are the gold standard for causal inference, yet cost and sample-size constraints limit their power. Meanwhile, modern RCTs routinely collect rich, unstructured data that are highly prognostic of outcomes but rarely used in causal analyses. We introduce CALM (Causal Analysis leveraging Language Models), a statistical framework that integrates large language model (LLM) predictions with established causal estimators to increase precision while preserving statistical validity. CALM treats LLM outputs as auxiliary prognostic information and corrects their potential bias through a heterogeneous calibration step that residualizes and optimally reweights the predictions. We prove that CALM remains consistent even when LLM predictions are biased and achieves efficiency gains over augmented inverse probability weighting (AIPW) estimators for various causal effects. CALM also includes a few-shot variant that aggregates predictions across randomly sampled demonstration sets; the resulting U-statistic-like predictor restores the i.i.d. structure and mitigates prompt-selection variability. Empirically, in simulations calibrated to a mobile-app depression RCT, CALM achieves lower variance than the benchmark methods, is effective in both zero- and few-shot settings, and remains stable across prompt designs. By making principled use of LLMs to harness unstructured data and the external knowledge learned during pretraining, CALM provides a practical path to more precise causal analyses in RCTs.
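
The abstract's core idea, which treats an LLM prediction as a prognostic covariate whose bias is absorbed by a calibration step inside an AIPW estimator, can be illustrated with a short sketch. This is not the authors' CALM procedure. It assumes a Bernoulli RCT with a known assignment probability `pi`, replaces the paper's heterogeneous calibration with a simple arm-specific linear regression of the outcome on the LLM score `f`, and uses two-fold cross-fitting. The helpers `fit_line`, `calibrated_aipw`, `diff_in_means`, and `simulate`, as well as the toy data-generating process, are hypothetical.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares fit of ys on [1, xs]; returns (intercept, slope)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # a constant score carries no prognostic signal
        return my, 0.0
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def calibrated_aipw(y, a, f, pi, n_folds=2, seed=0):
    """Cross-fitted AIPW estimate of the ATE in a Bernoulli(pi) RCT.

    The outcome models mu_1 and mu_0 are arm-specific linear recalibrations
    of the LLM score f (a hypothetical simplification, not the paper's
    calibration step). Assumes every training fold contains both arms.
    """
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    psi = [0.0] * n
    for k, test in enumerate(folds):
        # Fit the calibration on the other folds only (cross-fitting).
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        coef = {}
        for arm in (0, 1):
            rows = [i for i in train if a[i] == arm]
            coef[arm] = fit_line([f[i] for i in rows], [y[i] for i in rows])
        for i in test:
            mu1 = coef[1][0] + coef[1][1] * f[i]
            mu0 = coef[0][0] + coef[0][1] * f[i]
            # With a known pi, the weighted residual terms keep the estimate
            # centered on the ATE however poor mu1 and mu0 are.
            psi[i] = (mu1 - mu0
                      + a[i] * (y[i] - mu1) / pi
                      - (1 - a[i]) * (y[i] - mu0) / (1 - pi))
    return fmean(psi)


def diff_in_means(y, a):
    """Unadjusted benchmark: treated mean minus control mean."""
    treated = [yi for yi, ai in zip(y, a) if ai == 1]
    control = [yi for yi, ai in zip(y, a) if ai == 0]
    return fmean(treated) - fmean(control)


def simulate(n, pi=0.5, tau=1.0, seed=1):
    """Hypothetical toy RCT in which f stands in for a biased LLM prediction."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]  # latent prognostic signal
    a = [1 if rng.random() < pi else 0 for _ in range(n)]
    y = [2 * xi + tau * ai + rng.gauss(0, 1) for xi, ai in zip(x, a)]
    f = [0.5 * xi + 3 for xi in x]  # right signal, wrong scale and offset
    return y, a, f


y, a, f = simulate(4000)
print("difference in means:", round(diff_in_means(y, a), 3))
print("calibrated AIPW:   ", round(calibrated_aipw(y, a, f, pi=0.5), 3))
```

Because the assignment probability is known by design, a mis-scaled or shifted `f` can cost precision but should not move the estimate away from the true effect. In this toy setup the linear recalibration recovers the whole signal because `f` is an exact linear function of `x`, which is a best case that real LLM predictions are unlikely to match.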

Suggested Citation

  • Xinrui Ruan & Xinwei Ma & Yingfei Wang & Waverly Wei & Jingshen Wang, 2025. "Can language models boost the power of randomized experiments without statistical bias?," Papers 2510.05545, arXiv.org.
  • Handle: RePEc:arx:papers:2510.05545

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2510.05545
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Victor Chernozhukov & Sokbae Lee & Adam M. Rosen, 2013. "Intersection Bounds: Estimation and Inference," Econometrica, Econometric Society, vol. 81(2), pages 667-737, March.
    2. Victor Chernozhukov & Denis Chetverikov & Kengo Kato, 2012. "Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors," Papers 1212.6906, arXiv.org, revised Jan 2018.
    3. A. W. van der Vaart, 2000. "Asymptotic Statistics," Cambridge Books, Cambridge University Press, number 9780521784504, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Victor Chernozhukov & Denis Chetverikov & Kengo Kato, 2013. "Testing Many Moment Inequalities," CeMMAP working papers 65/13, Institute for Fiscal Studies.
    2. Matias D. Cattaneo & Richard K. Crump & Weining Wang, 2022. "Beta-Sorted Portfolios," Papers 2208.10974, arXiv.org, revised Nov 2024.
    3. Arun G. Chandrasekhar & Victor Chernozhukov & Francesca Molinari & Paul Schrimpf, 2019. "Best Linear Approximations to Set Identified Functions: With an Application to the Gender Wage Gap," NBER Working Papers 25593, National Bureau of Economic Research, Inc.
    4. Matthew A. Masten & Alexandre Poirier, 2020. "Inference on breakdown frontiers," Quantitative Economics, Econometric Society, vol. 11(1), pages 41-111, January.
    5. Demian Pouzo, 2014. "Bootstrap Consistency for Quadratic Forms of Sample Averages with Increasing Dimension," Papers 1411.2701, arXiv.org, revised Aug 2015.
    6. Sokbae Lee & Ryo Okui & Yoon-Jae Whang, 2017. "Doubly robust uniform confidence band for the conditional average treatment effect function," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 32(7), pages 1207-1225, November.
    7. Chetverikov, Denis & Wilhelm, Daniel & Kim, Dongwoo, 2021. "An Adaptive Test Of Stochastic Monotonicity," Econometric Theory, Cambridge University Press, vol. 37(3), pages 495-536, June.
    8. Peter Horvath & Jia Li & Zhipeng Liao & Andrew J. Patton, 2022. "A consistent specification test for dynamic quantile models," Quantitative Economics, Econometric Society, vol. 13(1), pages 125-151, January.
    9. Armstrong, Timothy B. & Chan, Hock Peng, 2016. "Multiscale adaptive inference on conditional moment inequalities," Journal of Econometrics, Elsevier, vol. 194(1), pages 24-43.
    10. Christoph Rothe, 2012. "Partial Distributional Policy Effects," Econometrica, Econometric Society, vol. 80(5), pages 2269-2301, September.
    11. Wenlong Ji & Lihua Lei & Asher Spector, 2023. "Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects," Papers 2310.08115, arXiv.org, revised Nov 2024.
    12. Li, Jia & Liao, Zhipeng, 2020. "Uniform nonparametric inference for time series," Journal of Econometrics, Elsevier, vol. 219(1), pages 38-51.
    13. Rothe, Christoph, 2009. "Unconditional Partial Effects of Binary Covariates," TSE Working Papers 09-79, Toulouse School of Economics (TSE).
    14. Clément de Chaisemartin & Diego Ciccia & Xavier D’Haultfoeuille & Felix Knau, 2024. "Two-way Fixed Effects and Differences-in-Differences in Heterogeneous Adoption Designs without Stayers," Working Papers 2025-01, Center for Research in Economics and Statistics.
    15. X D’Haultfœuille & C Gaillac & A Maurel, 2025. "Partially Linear Models under Data Combination," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 92(1), pages 238-267.
    16. Xu, Kai & Cheng, Qing, 2024. "Test of conditional independence in factor models via Hilbert–Schmidt independence criterion," Journal of Multivariate Analysis, Elsevier, vol. 199(C).
    17. Arun Chandrasekhar & Victor Chernozhukov & Francesca Molinari & Paul Schrimpf, 2012. "Inference for best linear approximations to set identified functions," CeMMAP working papers 43/12, Institute for Fiscal Studies.
    18. Sung Jae Jun & Yoonseok Lee & Youngki Shin, 2016. "Treatment Effects With Unobserved Heterogeneity: A Set Identification Approach," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(2), pages 302-311, April.
    19. Wei, Waverly & Zhou, Yuqing & Zheng, Zeyu & Wang, Jingshen, 2024. "Inference on the best policies with many covariates," Journal of Econometrics, Elsevier, vol. 239(2).
    20. Victor Chernozhukov & Denis Chetverikov & Kengo Kato & Yuta Koike, 2022. "High-dimensional Data Bootstrap," Papers 2205.09691, arXiv.org.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics
