Leveraging LLMs to Improve Experimental Design: A Generative Stratification Approach

Author

Listed:

George Gui
Seungwoo Kim

Abstract

Pre-experiment stratification, or blocking, is a well-established technique for designing more efficient experiments and increasing the precision of the experimental estimates. However, when researchers have access to many covariates at the experiment design stage, they often face challenges in effectively selecting or weighting covariates when creating their strata. This paper proposes a Generative Stratification procedure that leverages Large Language Models (LLMs) to synthesize high-dimensional covariate data to improve experimental design. We demonstrate the value of this approach by applying it to a set of experiments and find that our method would have reduced the variance of the treatment effect estimate by 10%-50% compared to simple randomization in our empirical applications. When combined with other standard stratification methods, it can further improve efficiency. Our results demonstrate that LLM-based simulation is a practical and easy-to-implement way to improve experimental design in covariate-rich settings.
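
The variance comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes a hypothetical per-unit prediction of the outcome (`pred`, standing in for whatever score an LLM-based simulation might produce), builds equal-sized strata by sorting on that prediction, and compares the variance of the difference-in-means estimate under simple versus stratified randomization. All names, sample sizes, and noise levels are assumptions chosen for illustration.

```python
import random
import statistics


def simulate(n=200, n_strata=10, reps=500, seed=0):
    """Compare estimator variance under simple vs. stratified randomization.

    Assumes n is divisible by n_strata and each stratum has an even size,
    so every stratum is split exactly in half between treatment and control.
    """
    rng = random.Random(seed)

    # Hypothetical data: untreated outcome y0 and a noisy prediction of it.
    y0 = [rng.gauss(0, 1) for _ in range(n)]
    pred = [y + rng.gauss(0, 0.5) for y in y0]  # stand-in for a simulated outcome
    tau = 0.2  # assumed constant treatment effect

    def diff_in_means(treated):
        treated_set = set(treated)
        t = [y0[i] + tau for i in treated_set]
        c = [y0[i] for i in range(n) if i not in treated_set]
        return statistics.fmean(t) - statistics.fmean(c)

    # Build strata: sort units by the prediction, then cut into equal blocks.
    order = sorted(range(n), key=lambda i: pred[i])
    size = n // n_strata
    strata = [order[k * size:(k + 1) * size] for k in range(n_strata)]

    simple, blocked = [], []
    for _ in range(reps):
        # Simple randomization: treat a random half of all units.
        simple.append(diff_in_means(rng.sample(range(n), n // 2)))
        # Stratified randomization: treat a random half within each stratum.
        treated = []
        for block in strata:
            treated.extend(rng.sample(block, len(block) // 2))
        blocked.append(diff_in_means(treated))

    return statistics.variance(simple), statistics.variance(blocked)


if __name__ == "__main__":
    var_simple, var_blocked = simulate()
    print("variance, simple randomization:    ", var_simple)
    print("variance, stratified randomization:", var_blocked)
    print("relative reduction:", 1 - var_blocked / var_simple)
```

Because the strata are equal in size and each is split evenly, the plain difference in means coincides with the stratified estimator here. How much the variance falls depends entirely on how well the assumed prediction tracks the true outcome, so the reduction printed by this toy example should not be read as a reproduction of the 10%-50% range reported in the abstract.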

Suggested Citation

George Gui & Seungwoo Kim, 2025. "Leveraging LLMs to Improve Experimental Design: A Generative Stratification Approach," Papers 2509.25709, arXiv.org.

Handle: RePEc:arx:papers:2509.25709

References listed on IDEAS

Felipe Barrera-Osorio & Leigh L. Linden & Juan E. Saavedra, 2019. "Medium- and Long-Term Educational Consequences of Alternative Conditional Cash Transfer Designs: Experimental Evidence from Colombia," American Economic Journal: Applied Economics, American Economic Association, vol. 11(3), pages 54-91, July.
- Felipe Barrera-Osorio & Leigh L. Linden & Juan Saavedra, 2017. "Medium- and Long-Term Educational Consequences of Alternative Conditional Cash Transfer Designs: Experimental Evidence from Colombia," NBER Working Papers 23275, National Bureau of Economic Research, Inc.
Suresh de Mel & David McKenzie & Christopher Woodruff, 2019. "Labor Drops: Experimental Evidence on the Return to Additional Labor in Microenterprises," American Economic Journal: Applied Economics, American Economic Association, vol. 11(1), pages 202-235, January.
- Suresh de Mel & David J. McKenzie & Christopher M. Woodruff, 2016. "Labor Drops: Experimental Evidence on the Return to Additional Labor in Microenterprises," Policy Research Working Paper Series 7924, The World Bank.
- Suresh De Mel & David McKenzie & Christopher Woodruff, 2016. "Labor Drops: Experimental Evidence on the Return to Additional Labor in Microenterprises," NBER Working Papers 23005, National Bureau of Economic Research, Inc.
Esther Duflo & Rachel Glennerster & Michael Kremer, 2008. "Using Randomization in Development Economics Research: A Toolkit," Handbook of Development Economics, in: T. Paul Schultz & John A. Strauss (ed.), Handbook of Development Economics, edition 1, volume 4, chapter 61, pages 3895-3962, Elsevier.
- Esther Duflo & Rachel Glennerster & Michael Kremer, 2006. "Using Randomization in Development Economics Research: A Toolkit," NBER Technical Working Papers 0333, National Bureau of Economic Research, Inc.
- Michael Kremer & Rachel Glennerster & Esther Duflo, 2007. "Using Randomization in Development Economics Research: A Toolkit," CEPR Discussion Papers 6059, C.E.P.R. Discussion Papers.
- Esther Duflo & Rachel Glennerster & Michael Kremer, 2006. "Using Randomization in Development Economics Research: A Toolkit," CID Working Papers 138, Center for International Development at Harvard University.
Yuehao Bai, 2022. "Optimality of Matched-Pair Designs in Randomized Controlled Trials," Papers 2206.07845, arXiv.org.
Ali Goli & Amandeep Singh, 2024. "Frontiers: Can Large Language Models Capture Human Preferences?," Marketing Science, INFORMS, vol. 43(4), pages 709-722, July.
Alan Gerber & Mitchell Hoffman & John Morgan & Collin Raymond, 2020. "One in a Million: Field Experiments on Perceived Closeness of the Election and Voter Turnout," American Economic Journal: Applied Economics, American Economic Association, vol. 12(3), pages 287-325, July.
- Alan Gerber & Mitchell Hoffman & John Morgan & Collin Raymond, 2017. "One in a Million: Field Experiments on Perceived Closeness of the Election and Voter Turnout," NBER Working Papers 23071, National Bureau of Economic Research, Inc.
Lisa P. Argyle & Ethan C. Busby & Nancy Fulda & Joshua R. Gubler & Christopher Rytting & David Wingate, 2023. "Out of One, Many: Using Language Models to Simulate Human Samples," Political Analysis, Cambridge University Press, vol. 31(3), pages 337-351, July.
Olivier Toubia & George Z. Gui & Tianyi Peng & Daniel J. Merlau & Ang Li & Haozhe Chen, 2025. "Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions," Papers 2505.17479, arXiv.org.
Yuehao Bai, 2022. "Optimality of Matched-Pair Designs in Randomized Controlled Trials," American Economic Review, American Economic Association, vol. 112(12), pages 3911-3940, December.
John J. Horton, 2023. "Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?," NBER Working Papers 31122, National Bureau of Economic Research, Inc.
John J. Horton, 2023. "Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus?," Papers 2301.07543, arXiv.org.
Martin Abel & Rulof Burger & Patrizio Piraino, 2020. "The Value of Reference Letters: Experimental Evidence from South Africa," American Economic Journal: Applied Economics, American Economic Association, vol. 12(3), pages 40-71, July.
Fabio Motoki & Valdemar Pinho Neto & Victor Rodrigues, 2024. "More human than human: measuring ChatGPT political bias," Public Choice, Springer, vol. 198(1), pages 3-23, January.
Zikun Ye & Hema Yoganarasimhan & Yufeng Zheng, 2025. "LOLA: LLM-Assisted Online Learning Algorithm for Content Experiments," Marketing Science, INFORMS, vol. 44(5), pages 995-1016, September.
Marcel Binz & Elif Akata & Matthias Bethge & Franziska Brändle & Fred Callaway & Julian Coda-Forno & Peter Dayan & Can Demircan & Maria K. Eckstein & Noémi Éltető & Thomas L. Griffiths & Susanne Harid, 2025. "A foundation model to predict and capture human cognition," Nature, Nature, vol. 644(8078), pages 1002-1009, August.
Randall A. Lewis & Justin M. Rao, 2015. "The Unfavorable Economics of Measuring the Returns to Advertising," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 130(4), pages 1941-1973.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Matthew O. Jackson & Qiaozhu Me & Stephanie W. Wang & Yutong Xie & Walter Yuan & Seth Benzell & Erik Brynjolfsson & Colin F. Camerer & James Evans & Brian Jabarian & Jon Kleinberg & Juanjuan Meng & Se, 2025. "AI Behavioral Science," Papers 2509.13323, arXiv.org.
Yuehao Bai, 2022. "Optimality of Matched-Pair Designs in Randomized Controlled Trials," Papers 2206.07845, arXiv.org.
Yingnan Yan & Tianming Liu & Yafeng Yin, 2025. "Valuing Time in Silicon: Can Large Language Model Replicate Human Value of Travel Time," Papers 2507.22244, arXiv.org.
Shu Wang & Zijun Yao & Shuhuai Zhang & Jianuo Gai & Tracy Xiao Liu & Songfa Zhong, 2025. "When Experimental Economics Meets Large Language Models: Evidence-based Tactics," Papers 2505.21371, arXiv.org, revised Jul 2025.
Hui Chen & Antoine Didisheim & Luciano Somoza & Hanqing Tian, 2025. "A Financial Brain Scan of the LLM," Papers 2508.21285, arXiv.org.
Yuan Gao & Dokyun Lee & Gordon Burtch & Sina Fazelpour, 2024. "Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina," Papers 2410.19599, arXiv.org, revised Jan 2025.
Aliya Amirova & Theodora Fteropoulli & Nafiso Ahmed & Martin R Cowie & Joel Z Leibo, 2024. "Framework-based qualitative analysis of free responses of Large Language Models: Algorithmic fidelity," PLOS ONE, Public Library of Science, vol. 19(3), pages 1-33, March.
Federico A. Bugni & Mengsi Gao, 2023. "Inference under covariate-adaptive randomization with imperfect compliance," Journal of Econometrics, Elsevier, vol. 237(1).
Sugat Chaturvedi & Rochana Chaturvedi, 2025. "Who Gets the Callback? Generative AI and Gender Bias," Papers 2504.21400, arXiv.org.
Anne Lundgaard Hansen & Seung Jung Lee, 2025. "Financial Stability Implications of Generative AI: Taming the Animal Spirits," Papers 2510.01451, arXiv.org.
Hua Li & Qifang Wang & Ye Wu, 2025. "From Mobile Media to Generative AI: The Evolutionary Logic of Computational Social Science Across Data, Methods, and Theory," Mathematics, MDPI, vol. 13(19), pages 1-17, September.
Ben Weidmann & Yixian Xu & David J. Deming, 2025. "Measuring Human Leadership Skills with Artificially Intelligent Agents," Papers 2508.02966, arXiv.org.
Jinglong Zhao, 2023. "Adaptive Neyman Allocation," Papers 2309.08808, arXiv.org, revised Sep 2025.
Yuehao Bai & Jizhou Liu & Azeem M. Shaikh & Max Tabord-Meehan, 2023. "On the Efficiency of Finely Stratified Experiments," Papers 2307.15181, arXiv.org, revised Nov 2025.
Navid Ghaffarzadegan & Aritra Majumdar & Ross Williams & Niyousha Hosseinichimeh, 2024. "Generative agent‐based modeling: an introduction and tutorial," System Dynamics Review, System Dynamics Society, vol. 40(1), January.
Seung Jung Lee & Anne Lundgaard Hansen, 2025. "Financial Stability Implications of Generative AI: Taming the Animal Spirits," Finance and Economics Discussion Series 2025-090, Board of Governors of the Federal Reserve System (U.S.).
Yuehao Bai & Meng Hsuan Hsieh & Jizhou Liu & Max Tabord-Meehan, 2022. "Revisiting the Analysis of Matched-Pair and Stratified Experiments in the Presence of Attrition," Papers 2209.11840, arXiv.org, revised Oct 2023.
Yuehao Bai, 2022. "Optimality of Matched-Pair Designs in Randomized Controlled Trials," American Economic Review, American Economic Association, vol. 112(12), pages 3911-3940, December.
Paola Cillo & Gaia Rubera, 2025. "Generative AI in innovation and marketing processes: A roadmap of research opportunities," Journal of the Academy of Marketing Science, Springer, vol. 53(3), pages 684-701, May.
Yuehao Bai & Meng Hsuan Hsieh & Jizhou Liu & Max Tabord‐Meehan, 2024. "Revisiting the analysis of matched‐pair and stratified experiments in the presence of attrition," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(2), pages 256-268, March.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-AIN-2025-10-20 (Artificial Intelligence)
NEP-BIG-2025-10-20 (Big Data)
NEP-DCM-2025-10-20 (Discrete Choice Models)
NEP-EXP-2025-10-20 (Experimental Economics)
