Causal Effect Estimation with Latent Textual Treatments

Author

Listed:
  • Omri Feldman
  • Amar Venugopal
  • Jann Spiess
  • Amir Feder

Abstract

Understanding the causal effects of text on downstream outcomes is a central task in many applications. Estimating such effects requires researchers to run controlled experiments that systematically vary textual features. While large language models (LLMs) hold promise for generating text, producing and evaluating controlled variation requires more careful attention. In this paper, we present an end-to-end pipeline for the generation and causal estimation of latent textual interventions. Our work first performs hypothesis generation and steering via sparse autoencoders (SAEs), followed by robust causal estimation. Our pipeline addresses both computational and statistical challenges in text-as-treatment experiments. We demonstrate that naive estimation of causal effects suffers from significant bias as text inherently conflates treatment and covariate information. We describe the estimation bias induced in this setting and propose a solution based on covariate residualization. Our empirical results show that our pipeline effectively induces variation in target features and mitigates estimation error, providing a robust foundation for causal effect estimation in text-as-treatment settings.
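The bias and the residualization fix described in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' estimator: it assumes a linear outcome model, a scalar treatment intensity that is correlated with one text-derived covariate, and plain least-squares partialling out (the Frisch-Waugh-Lovell idea). The function names, the coefficients, and the simulated data are all hypothetical.

```python
import numpy as np


def simulate(n=5000, seed=0, tau=1.0):
    """Simulated data (an assumption, not from the paper): the treatment t leaks covariate x[:, 0]."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 3))                 # covariate information carried by the text
    t = 0.8 * x[:, 0] + rng.normal(size=n)      # treatment intensity, correlated with a covariate
    y = tau * t + 2.0 * x[:, 0] - 1.0 * x[:, 1] + rng.normal(size=n)
    return x, t, y


def ols_slope(t, y):
    """Slope from a simple regression of y on t (with intercept)."""
    t_c = t - t.mean()
    return float(t_c @ (y - y.mean()) / (t_c @ t_c))


def residualize(v, x):
    """Remove the part of v that is linearly explained by the covariates x."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def residualized_slope(x, t, y):
    """Regress residualized outcome on residualized treatment."""
    return ols_slope(residualize(t, x), residualize(y, x))


x, t, y = simulate()
naive = ols_slope(t, y)                  # ignores the covariates, so it absorbs their effect
adjusted = residualized_slope(x, t, y)   # partials the covariates out of both t and y
print(f"true effect: 1.0, naive: {naive:.3f}, residualized: {adjusted:.3f}")
```

In this toy setup the naive slope overstates the effect because the treatment variable also carries covariate signal, while the residualized slope lands close to the simulated value. With real text, the covariate representation and the nuisance models would need to be chosen far more carefully than the linear fit assumed here.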

Suggested Citation

  • Omri Feldman & Amar Venugopal & Jann Spiess & Amir Feder, 2026. "Causal Effect Estimation with Latent Textual Treatments," Papers 2602.15730, arXiv.org.
  • Handle: RePEc:arx:papers:2602.15730

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2602.15730
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sallin, Aurélien, 2021. "Estimating returns to special education: combining machine learning and text analysis to address confounding," Economics Working Paper Series 2109, University of St. Gallen, School of Economics and Political Science.
    2. Bokelmann, Björn & Lessmann, Stefan, 2024. "Improving uplift model evaluation on randomized controlled trial data," European Journal of Operational Research, Elsevier, vol. 313(2), pages 691-707.
    3. Undral Byambadalai & Tatsushi Oka & Shota Yasui, 2024. "Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction," Papers 2407.16037, arXiv.org.
    4. Daniel Goller, 2023. "Analysing a built-in advantage in asymmetric darts contests using causal machine learning," Annals of Operations Research, Springer, vol. 325(1), pages 649-679, June.
    5. Wu, Guojun & Song, Ge & Lv, Xiaoxiang & Luo, Shikai & Shi, Chengchun & Zhu, Hongtu, 2023. "DNet: distributional network for distributional individualized treatment effects," LSE Research Online Documents on Economics 122895, London School of Economics and Political Science, LSE Library.
    6. Ta-Wei Huang & Eva Ascarza, 2024. "Doing More with Less: Overcoming Ineffective Long-Term Targeting Using Short-Term Signals," Marketing Science, INFORMS, vol. 43(4), pages 863-884, July.
    7. Zhu, Chenliang & Qi, Jiajun & Feng, Lingbing & Wang, Xinyi, 2025. "Source control or end-of-pipe treatment: How green finance policy impacts enterprise carbon intensity," International Review of Financial Analysis, Elsevier, vol. 104(PA).
    8. Yiyi Huo & Yingying Fan & Fang Han, 2023. "On the adaptation of causal forests to manifold data," Papers 2311.16486, arXiv.org, revised Dec 2023.
    9. Miquel Oliu-Barton & Bary S. R. Pradelski & Nicolas Woloszko & Lionel Guetta-Jeanrenaud & Philippe Aghion & Patrick Artus & Arnaud Fontanet & Philippe Martin & Guntram B. Wolff, 2022. "The effect of COVID certificates on vaccine uptake, health outcomes, and the economy," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    10. Adel Daoud, 2025. "Beyond Interaction Effects: Two Logics for Studying Population Inequalities," Papers 2601.04223, arXiv.org.
    11. Bolbocean, Corneliu & Anderson, Peter J. & Bartmann, Peter & Cheong, Jeanie L.Y. & Doyle, Lex W. & Johnson, Samantha & Marlow, Neil & Wolke, Dieter & Petrou, Stavros & O'Neill, Stephen, 2025. "A heterogeneity analysis of health-related quality of life in early adults born very preterm or very low birthweight across the sociodemographic spectrum," Social Science & Medicine, Elsevier, vol. 380(C).
    12. Rahul Singh & Hannah Zhou, 2022. "Kernel methods for long term dose response curves," Papers 2201.05139, arXiv.org, revised Dec 2024.
    13. David M. Ritzwoller & Vasilis Syrgkanis, 2024. "Simultaneous Inference for Local Structural Parameters with Random Forests," Papers 2405.07860, arXiv.org, revised Sep 2024.
    14. Tomoshige Nakamura & Mihoko Minami, 2021. "Causal Subclassification Tree Algorithm and Robust Causal Effect Estimation via Subclassification," International Journal of Statistics and Probability, Canadian Center of Science and Education, vol. 10(1), pages 1-40, January.
    15. Phillip Heiler & Michael C. Knaus, 2025. "Heterogeneity Analysis with Heterogeneous Treatments," Papers 2507.01517, arXiv.org, revised Feb 2026.
    16. Jun Mao & Jiahao Xie & Zunguo Hu & Lijie Deng & Haitao Wu & Yu Hao, 2023. "Sustainable development through green innovation and resource allocation in cities: Evidence from machine learning," Sustainable Development, John Wiley & Sons, Ltd., vol. 31(4), pages 2386-2401, August.
    17. Hua Chen & Jianing Xing & Xiaoxu Yang & Kai Zhan, 2021. "Heterogeneous Effects of Health Insurance on Rural Children’s Health in China: A Causal Machine Learning Approach," IJERPH, MDPI, vol. 18(18), pages 1-14, September.
    18. Phillip Heiler & Michael C. Knaus, 2021. "Effect or Treatment Heterogeneity? Policy Evaluation with Aggregated and Disaggregated Treatments," Papers 2110.01427, arXiv.org, revised Aug 2023.
    19. Artem Timoshenko & Caio Waisman, 2025. "Profit-Aligned CATE Estimation: Reconciling Policy Learning and Inference," Papers 2512.13400, arXiv.org, revised Apr 2026.
    20. Nathan Kallus, 2023. "Treatment Effect Risk: Bounds and Inference," Management Science, INFORMS, vol. 69(8), pages 4579-4590, August.
