Can language models boost the power of randomized experiments without statistical bias?

Authors

Xinrui Ruan
Xinwei Ma
Yingfei Wang
Waverly Wei
Jingshen Wang

Abstract

Randomized experiments, or randomized controlled trials (RCTs), are the gold standard for causal inference, yet cost and sample-size constraints limit their power. We introduce CALM (Causal Analysis leveraging Language Models), a statistical framework that integrates insights about RCTs generated by large language models (LLMs) with established causal estimators to increase precision while preserving statistical validity. Specifically, CALM treats LLM-generated outputs as auxiliary prognostic information and corrects their potential bias through a heterogeneous calibration step that residualizes and optimally reweights the predictions. We prove that CALM remains consistent even when LLM predictions are biased and achieves efficiency gains over augmented inverse probability weighting (AIPW) estimators for various causal effects. CALM also includes a few-shot variant that aggregates predictions across randomly sampled demonstration sets. The resulting U-statistic-like predictor restores the i.i.d. structure and mitigates prompt-selection variability. Empirically, in simulations calibrated to a mobile-app depression RCT, CALM achieves lower variance than competing benchmark methods, is effective in both zero- and few-shot settings, and remains stable across prompt designs. By using LLMs in a principled way to harness unstructured data and the external knowledge learned during pretraining, CALM offers a practical path to more precise causal analyses in RCTs.
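The abstract's central claim, that biased LLM predictions can add precision without adding bias, rests on a property of AIPW estimators in randomized experiments. The Python sketch below illustrates that property under stated assumptions; it is not the CALM estimator. It uses a generic cross-fitted AIPW estimator with a known randomization probability, and arm-specific linear regressions on an auxiliary prediction stand in for the paper's heterogeneous calibration step. `hypothetical_llm_predict` is a made-up placeholder for an LLM call that is deliberately biased, and the few-shot averaging draws demonstrations from a hypothetical external pool rather than from the trial sample, which sidesteps the dependence that the paper's U-statistic-like construction is designed to handle. All data are simulated.

```python
import numpy as np


def fit_linear(features, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda f: np.column_stack([np.ones(len(f)), f]) @ coef


def aipw_ate(y, a, features, propensity, n_folds=2, seed=0):
    """Cross-fitted AIPW estimate of the ATE and its standard error.

    Outcome models are arm-specific linear regressions of y on `features`
    (a simple stand-in for a calibration step, not CALM's). With a known
    randomization probability, a poor outcome model costs precision only.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.integers(0, n_folds, size=n)
    mu1, mu0 = np.empty(n), np.empty(n)
    for k in range(n_folds):
        # Fit on the other folds, predict on fold k (cross-fitting).
        train, test = folds != k, folds == k
        fit1 = fit_linear(features[train & (a == 1)], y[train & (a == 1)])
        fit0 = fit_linear(features[train & (a == 0)], y[train & (a == 0)])
        mu1[test], mu0[test] = fit1(features[test]), fit0(features[test])
    psi = (mu1 - mu0
           + a * (y - mu1) / propensity
           - (1 - a) * (y - mu0) / (1 - propensity))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)


def hypothetical_llm_predict(x_row, demo_y, rng):
    """Hypothetical stand-in for one LLM call: deliberately biased and noisy,
    and shifted by the demonstration outcomes placed in the prompt."""
    return 2.0 * x_row[0] + 0.3 * demo_y.mean() + 1.5 + rng.normal(scale=0.5)


def aggregated_prediction(x, pool_y, n_sets=20, set_size=4, seed=1):
    """Average each unit's prediction over randomly sampled demonstration sets."""
    rng = np.random.default_rng(seed)
    preds = np.empty(len(x))
    for i, row in enumerate(x):
        draws = [
            hypothetical_llm_predict(
                row, pool_y[rng.choice(len(pool_y), size=set_size, replace=False)], rng)
            for _ in range(n_sets)
        ]
        preds[i] = np.mean(draws)
    return preds


rng = np.random.default_rng(0)
n, p = 1000, 0.5                        # sample size, known P(treatment)
x = rng.normal(size=(n, 2))
a = rng.binomial(1, p, size=n)
y = 1.0 + x[:, 0] + 0.5 * x[:, 1] + 1.0 * a + rng.normal(size=n)  # true ATE = 1
pool_y = 1.0 + rng.normal(size=50)      # hypothetical external demonstration pool
llm = aggregated_prediction(x, pool_y)

est_base, se_base = aipw_ate(y, a, x[:, 1:2], p)  # weak covariate only
est_llm, se_llm = aipw_ate(y, a, np.column_stack([x[:, 1], llm]), p)
print(f"weak covariate only:   {est_base:.3f} (SE {se_base:.3f})")
print(f"plus LLM-style score:  {est_llm:.3f} (SE {se_llm:.3f})")
```

Because the randomization probability is known and each unit's outcome-model predictions are fit without that unit's own data, the AIPW score is centered on the treatment effect whatever the auxiliary prediction says, so a biased but prognostic prediction can only change the variance. In this toy setup the prediction tracks the strongest covariate, so its standard error should be noticeably smaller than the baseline's; how much any real LLM output helps depends on how prognostic it is.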

Suggested Citation

Xinrui Ruan & Xinwei Ma & Yingfei Wang & Waverly Wei & Jingshen Wang, 2025. "Can language models boost the power of randomized experiments without statistical bias?," Papers 2510.05545, arXiv.org, revised Dec 2025.

Handle: RePEc:arx:papers:2510.05545

References listed on IDEAS

Victor Chernozhukov & Denis Chetverikov & Kengo Kato, 2012. "Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors," Papers 1212.6906, arXiv.org, revised Jan 2018.
- Victor Chernozhukov & Denis Chetverikov & Kengo Kato, 2013. "Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors," CeMMAP working papers 76/13, Institute for Fiscal Studies.
- Victor Chernozhukov & Denis Chetverikov & Kengo Kato, 2013. "Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors," CeMMAP working papers CWP76/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Vaart, A. W. van der, 2000. "Asymptotic Statistics," Cambridge Books, Cambridge University Press, number 9780521784504, January.
Victor Chernozhukov & Sokbae Lee & Adam M. Rosen, 2013. "Intersection Bounds: Estimation and Inference," Econometrica, Econometric Society, vol. 81(2), pages 667-737, March.
- Victor Chernozhukov & Sokbae (Simon) Lee & Adam Rosen, 2009. "Intersection Bounds: estimation and inference," CeMMAP working papers 19/09, Institute for Fiscal Studies.
- Victor Chernozhukov & Sokbae (Simon) Lee & Adam Rosen, 2012. "Intersection bounds: estimation and inference," CeMMAP working papers CWP33/12, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Victor Chernozhukov & Sokbae (Simon) Lee & Adam Rosen, 2009. "Intersection Bounds: estimation and inference," CeMMAP working papers CWP19/09, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Victor Chernozhukov & Sokbae (Simon) Lee & Adam Rosen, 2012. "Intersection bounds: estimation and inference," CeMMAP working papers 33/12, Institute for Fiscal Studies.
- Victor Chernozhukov & Sokbae (Simon) Lee & Adam Rosen, 2011. "Intersection bounds: estimation and inference," CeMMAP working papers 34/11, Institute for Fiscal Studies.
- Victor Chernozhukov & Sokbae (Simon) Lee & Adam Rosen, 2011. "Intersection bounds: estimation and inference," CeMMAP working papers CWP34/11, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Victor Chernozhukov & Denis Chetverikov & Kengo Kato, 2013. "Testing Many Moment Inequalities," CeMMAP working papers 65/13, Institute for Fiscal Studies.
- Victor Chernozhukov & Denis Chetverikov & Kengo Kato, 2016. "Testing many moment inequalities," CeMMAP working papers CWP42/16, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Victor Chernozhukov & Denis Chetverikov & Kengo Kato, 2016. "Testing many moment inequalities," CeMMAP working papers 42/16, Institute for Fiscal Studies.
- Victor Chernozhukov & Denis Chetverikov & Kengo Kato, 2014. "Testing many moment inequalities," CeMMAP working papers 52/14, Institute for Fiscal Studies.
- Victor Chernozhukov & Denis Chetverikov & Kengo Kato, 2014. "Testing many moment inequalities," CeMMAP working papers CWP52/14, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Victor Chernozhukov & Denis Chetverikov & Kengo Kato, 2013. "Testing Many Moment Inequalities," CeMMAP working papers CWP65/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Li, Jia & Liao, Zhipeng & Zhou, Wenyu, 2025. "A general test for functional inequalities," Journal of Econometrics, Elsevier, vol. 251(C).
Arun G. Chandrasekhar & Victor Chernozhukov & Francesca Molinari & Paul Schrimpf, 2019. "Best Linear Approximations to Set Identified Functions: With an Application to the Gender Wage Gap," NBER Working Papers 25593, National Bureau of Economic Research, Inc.
- Arun Chandrasekhar & Victor Chernozhukov & Francesca Molinari & Paul Schrimpf, 2019. "Best linear approximations to set identified functions: with an application to the gender wage gap," CeMMAP working papers CWP09/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Arun Chandrasekhar & Victor Chernozhukov & Francesca Molinari & Paul Schrimpf, 2019. "Best linear approximations to set identified functions: with an application to the gender wage gap," CeMMAP working papers 09/19, Institute for Fiscal Studies.
Demian Pouzo, 2014. "Bootstrap Consistency for Quadratic Forms of Sample Averages with Increasing Dimension," Papers 1411.2701, arXiv.org, revised Aug 2015.
Sokbae Lee & Ryo Okui & Yoon-Jae Whang, 2017. "Doubly robust uniform confidence band for the conditional average treatment effect function," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 32(7), pages 1207-1225, November.
- Sokbae (Simon) Lee & Ryo Okui & Yoon-Jae Whang, 2016. "Doubly robust uniform confidence band for the conditional average treatment effect function," CeMMAP working papers CWP03/16, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Lee, Sokbae & Okui, Ryo & Whang, Yoon-Jae, 2017. "Doubly robust uniform confidence band for the conditional average treatment effect function," LSE Research Online Documents on Economics 86852, London School of Economics and Political Science, LSE Library.
- Sokbae Lee & Ryo Okui & Yoon-Jae Whang, 2016. "Doubly Robust Uniform Confidence Band For The Conditional Average Treatment Effect Function," KIER Working Papers 931, Kyoto University, Institute of Economic Research.
- Sokbae Lee & Ryo Okui & Yoon-Jae Whang, 2016. "Doubly Robust Uniform Confidence Band for the Conditional Average Treatment Effect Function," Papers 1601.02801, arXiv.org, revised Oct 2016.
- Sokbae (Simon) Lee & Ryo Okui & Yoon-Jae Whang, 2016. "Doubly robust uniform confidence band for the conditional average treatment effect function," CeMMAP working papers 03/16, Institute for Fiscal Studies.
Chetverikov, Denis & Wilhelm, Daniel & Kim, Dongwoo, 2021. "An Adaptive Test Of Stochastic Monotonicity," Econometric Theory, Cambridge University Press, vol. 37(3), pages 495-536, June.
- Denis Chetverikov & Daniel Wilhelm & Dongwoo Kim, 2018. "An adaptive test of stochastic monotonicity," CeMMAP working papers CWP24/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Denis Chetverikov & Daniel Wilhelm & Dongwoo Kim, 2020. "An Adaptive Test of Stochastic Monotonicity," CeMMAP working papers CWP17/20, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Denis Chetverikov & Daniel Wilhelm & Dongwoo Kim, 2019. "An adaptive test of stochastic monotonicity," CeMMAP working papers CWP49/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Peter Horvath & Jia Li & Zhipeng Liao & Andrew J. Patton, 2022. "A consistent specification test for dynamic quantile models," Quantitative Economics, Econometric Society, vol. 13(1), pages 125-151, January.
Armstrong, Timothy B. & Chan, Hock Peng, 2016. "Multiscale adaptive inference on conditional moment inequalities," Journal of Econometrics, Elsevier, vol. 194(1), pages 24-43.
- Timothy B. Armstrong & Hock Peng Chan, 2013. "Multiscale Adaptive Inference on Conditional Moment Inequalities," Cowles Foundation Discussion Papers 1885R, Cowles Foundation for Research in Economics, Yale University, revised Oct 2014.
- Timothy B. Armstrong & Hock Peng Chan, 2013. "Multiscale Adaptive Inference on Conditional Moment Inequalities," Cowles Foundation Discussion Papers 1885R, Cowles Foundation for Research in Economics, Yale University, revised Dec 2015.
- Timothy B. Armstrong & Hock Peng Chan, 2013. "Multiscale Adaptive Inference on Conditional Moment Inequalities," Cowles Foundation Discussion Papers 1885, Cowles Foundation for Research in Economics, Yale University.
Christoph Rothe, 2012. "Partial Distributional Policy Effects," Econometrica, Econometric Society, vol. 80(5), pages 2269-2301, September.
- Rothe, Christoph, 2011. "Partial Distributional Policy Effects," IZA Discussion Papers 6076, IZA Network @ LISER.
Wenlong Ji & Lihua Lei & Asher Spector, 2023. "Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects," Papers 2310.08115, arXiv.org, revised Nov 2024.
Li, Jia & Liao, Zhipeng, 2020. "Uniform nonparametric inference for time series," Journal of Econometrics, Elsevier, vol. 219(1), pages 38-51.
Rothe, Christoph, 2009. "Unconditional Partial Effects of Binary Covariates," TSE Working Papers 09-79, Toulouse School of Economics (TSE).
Matias D. Cattaneo & Richard K. Crump & Weining Wang, 2022. "Beta-Sorted Portfolios," Papers 2208.10974, arXiv.org, revised Nov 2024.
- Matias D. Cattaneo & Richard K. Crump & Weining Wang, 2023. "Beta-Sorted Portfolios," Staff Reports 1068, Federal Reserve Bank of New York.
- Matias Cattaneo & Richard K. Crump & Weining Wang, 2024. "Beta-sorted portfolios," CeMMAP working papers 20/24, Institute for Fiscal Studies.
Matthew A. Masten & Alexandre Poirier, 2020. "Inference on breakdown frontiers," Quantitative Economics, Econometric Society, vol. 11(1), pages 41-111, January.
- Matthew Masten & Alexandre Poirier, 2017. "Inference on breakdown frontiers," CeMMAP working papers 20/17, Institute for Fiscal Studies.
- Matthew A. Masten & Alexandre Poirier, 2017. "Inference on Breakdown Frontiers," Papers 1705.04765, arXiv.org, revised Feb 2019.
- Matthew Masten & Alexandre Poirier, 2017. "Inference on breakdown frontiers," CeMMAP working papers CWP20/17, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Clément de Chaisemartin & Diego Ciccia & Xavier D’Haultfoeuille & Felix Knau, 2024. "Two-way Fixed Effects and Differences-in-Differences in Heterogeneous Adoption Designs without Stayers," Working Papers 2025-01, Center for Research in Economics and Statistics.
- Clément de Chaisemartin & Diego Ciccia & Xavier d'Haultfoeuille & Felix Knau, 2025. "Two-way Fixed Effects and Differences-in-Differences in Heterogeneous Adoption Designs without Stayers," Working Papers hal-03873937, HAL.
- Clément de Chaisemartin & Diego Ciccia & Xavier d'Haultfoeuille & Felix Knau, 2025. "Two-way Fixed Effects and Differences-in-Differences in Heterogeneous Adoption Designs without Stayers," Sciences Po Economics Publications (main) hal-03873937, HAL.
Xu, Kai & Cheng, Qing, 2024. "Test of conditional independence in factor models via Hilbert–Schmidt independence criterion," Journal of Multivariate Analysis, Elsevier, vol. 199(C).
Arun Chandrasekhar & Victor Chernozhukov & Francesca Molinari & Paul Schrimpf, 2012. "Inference for best linear approximations to set identified functions," CeMMAP working papers 43/12, Institute for Fiscal Studies.
- Arun Chandrasekhar & Victor Chernozhukov & Francesca Molinari & Paul Schrimpf, 2012. "Inference for best linear approximations to set identified functions," CeMMAP working papers CWP43/12, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Wei, Waverly & Zhou, Yuqing & Zheng, Zeyu & Wang, Jingshen, 2024. "Inference on the best policies with many covariates," Journal of Econometrics, Elsevier, vol. 239(2).
Sung Jae Jun & Yoonseok Lee & Youngki Shin, 2016. "Treatment Effects With Unobserved Heterogeneity: A Set Identification Approach," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(2), pages 302-311, April.
- Yoonseok Lee & Sung Jae Jun & Youngki Shin, 2014. "Treatment Effects with Unobserved Heterogeneity: A Set Identification Approach," Center for Policy Research Working Papers 169, Center for Policy Research, Maxwell School, Syracuse University.
Victor Chernozhukov & Denis Chetverikov & Kengo Kato & Yuta Koike, 2022. "High-dimensional Data Bootstrap," Papers 2205.09691, arXiv.org.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-AIN-2025-10-13 (Artificial Intelligence)
NEP-CMP-2025-10-13 (Computational Economics)
NEP-EXP-2025-10-13 (Experimental Economics)
