Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits

Authors

Listed:

Ruohan Zhan
Vitor Hadad
David A. Hirshberg
Susan Athey

Registered:

Susan Carleton Athey

Abstract

It has become increasingly common for data to be collected adaptively, for example using contextual bandits. Historical data of this type can be used to evaluate other treatment assignment policies to guide future innovation or experiments. However, policy evaluation is challenging if the target policy differs from the one used to collect data, and popular estimators, including doubly robust (DR) estimators, can be plagued by bias, excessive variance, or both. In particular, when the pattern of treatment assignment in the collected data looks little like the pattern generated by the policy to be evaluated, the importance weights used in DR estimators explode, leading to excessive variance. In this paper, we improve the DR estimator by adaptively weighting observations to control its variance. We show that a t-statistic based on our improved estimator is asymptotically normal under certain conditions, allowing us to form confidence intervals and test hypotheses. Using synthetic data and public benchmarks, we provide empirical evidence for our estimator's improved accuracy and inferential properties relative to existing alternatives.
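
A minimal sketch of the mechanics the abstract describes, assuming NumPy and a toy two-context, two-arm problem invented for illustration. It is not the paper's data or code, and the helper names `dr_scores` and `weighted_estimate` are hypothetical. The DR score at step t is Γ_t = Σ_k π(X_t,k) μ̂(X_t,k) + π(X_t,W_t)/e_t(X_t,W_t) · (Y_t − μ̂(X_t,W_t)). A weighted estimate averages these scores with weights h_t and studentizes with the standard error sqrt(Σ h_t²(Γ_t − Q̂)²)/Σ h_t. The weights below, h_t = 1/sqrt(E_x[Σ_k π(x,k)²/e_t(x,k)]), are computed from the logging schedule before X_t is observed. They are one simple variance-stabilizing choice assumed for illustration and may differ from the weighting schemes proposed in the paper. A fixed decaying propensity schedule stands in for a real bandit, and the outcome model is a deliberately crude constant.

```python
import numpy as np


def dr_scores(pi, mu_hat, e, w, y):
    """Doubly robust (AIPW) scores Gamma_t for a target policy (hypothetical helper).

    pi, mu_hat, e: arrays of shape (T, K) holding, for each step t and arm k,
    the target-policy probability pi(X_t, k), an outcome-model prediction
    mu_hat(X_t, k), and the logging propensity e_t(X_t, k).
    w: length-T integer array of logged arms; y: length-T array of rewards.
    """
    rows = np.arange(len(y))
    # Direct-method part: model prediction averaged over the target policy.
    direct = np.sum(pi * mu_hat, axis=1)
    # Importance-weighted residual for the arm actually pulled; this ratio
    # is what explodes when e_t(X_t, W_t) is tiny.
    ratio = pi[rows, w] / e[rows, w]
    return direct + ratio * (y - mu_hat[rows, w])


def weighted_estimate(gamma, h, z=1.96):
    """Weighted average of scores, its standard error, and a normal-approximation CI."""
    est = np.sum(h * gamma) / np.sum(h)
    se = np.sqrt(np.sum(h**2 * (gamma - est) ** 2)) / np.sum(h)
    return est, se, (est - z * se, est + z * se)


# --- Toy simulation (every number here is an illustrative assumption) ---
rng = np.random.default_rng(0)
T, K = 5000, 2
true_mu = np.array([[0.2, 0.5],   # context x=0: mean reward of arms 0, 1
                    [0.4, 0.9]])  # context x=1: mean reward of arms 0, 1
x = rng.integers(0, 2, size=T)    # binary context, uniform over {0, 1}

# Stand-in for a bandit: the logger's probability of arm 1 decays over time
# in context 0 (down to a floor of 0.05) and stays at 0.5 in context 1.
t = np.arange(T)
p1_x0 = 0.05 + 0.45 * (1 - t / T)
p1 = np.where(x == 0, p1_x0, 0.5)
e = np.column_stack([1 - p1, p1])
w = (rng.random(T) < p1).astype(int)
y = (rng.random(T) < true_mu[x, w]).astype(float)

pi = np.tile([0.0, 1.0], (T, 1))  # target policy: always play arm 1
mu_hat = np.full((T, K), 0.5)     # deliberately crude constant outcome model
gamma = dr_scores(pi, mu_hat, e, w, y)

# Assumed variance-stabilizing weights: average over both contexts of
# sum_k pi(x, k)^2 / e_t(x, k); only arm 1 contributes since pi is deterministic.
var_proxy = 0.5 * (1 / p1_x0) + 0.5 * (1 / 0.5)
h_adaptive = 1 / np.sqrt(var_proxy)

true_value = 0.5 * (true_mu[0, 1] + true_mu[1, 1])  # = 0.5 * (0.5 + 0.9)
est_uniform, se_uniform, _ = weighted_estimate(gamma, np.ones(T))
est_adaptive, se_adaptive, ci_adaptive = weighted_estimate(gamma, h_adaptive)
print(f"uniform:  {est_uniform:.3f} (se {se_uniform:.3f})")
print(f"adaptive: {est_adaptive:.3f} (se {se_adaptive:.3f}), true {true_value:.2f}")
```

The target value is 0.7 by construction. Late in the sample the logger rarely plays arm 1 in context 0, so those DR scores carry large importance ratios. The sketched weights shrink the influence of those steps, which is the kind of variance control the abstract describes.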

Suggested Citation

Ruohan Zhan & Vitor Hadad & David A. Hirshberg & Susan Athey, 2021. "Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits," Papers 2106.02029, arXiv.org, revised Jun 2021.

Handle: RePEc:arx:papers:2106.02029

Other versions of this item:

Zhan, Ruohan & Hadad, Vitor & Hirshberg, David A. & Athey, Susan, 2021. "Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits," Research Papers 3970, Stanford University, Graduate School of Business.

References listed on IDEAS

Guido W. Imbens, 2004. "Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review," The Review of Economics and Statistics, MIT Press, vol. 86(1), pages 4-29, February.
- Guido W. Imbens, 2003. "Nonparametric Estimation of Average Treatment Effects under Exogeneity: A Review," NBER Technical Working Papers 0294, National Bureau of Economic Research, Inc.
Imbens, Guido W. & Rubin, Donald B., 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881.
S. A. Murphy, 2003. "Optimal dynamic treatment regimes," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(2), pages 331-355, May.

Citations

Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.

Cited by:

Kemper, Jan & Rostam-Afschar, Davud, 2026. "Earning While Learning: How to Run Batched Bandit Experiments," IZA Discussion Papers 18429, IZA Network @ LISER.
- Kemper, Jan & Rostam-Afschar, Davud, 2026. "Earning While Learning: How to Run Batched Bandit Experiments," GLO Discussion Paper Series 1717, Global Labor Organization (GLO).
Brian Cho & Ana-Roxana Pop & Ariel Evnine & Nathan Kallus, 2025. "SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement," Papers 2503.12760, arXiv.org, revised Mar 2025.
Wang, Weiwei & Shapovalova, Yuliya & Li, Yuqiang & Wu, Xianyi, 2025. "Divide-and-conquer offline policy evaluation for contextual bandits," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 676(C).
Shuze Chen & David Simchi-Levi & Chonghuan Wang, 2024. "Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality," Papers 2407.19618, arXiv.org, revised Sep 2025.
Gabriel Saco, 2026. "Fixed-Horizon Self-Normalized Inference for Adaptive Experiments via Martingale AIPW/DML with Logged Propensities," Papers 2602.15559, arXiv.org.
Jonas Metzger, 2022. "Adversarial Estimators," Papers 2204.10495, arXiv.org, revised Jun 2022.
Aurélien Bibaut & Nathan Kallus, 2024. "Demystifying Inference after Adaptive Experiments," Papers 2405.01281, arXiv.org.
Jinglong Zhao, 2024. "Experimental Design For Causal Inference Through An Optimization Lens," Papers 2408.09607, arXiv.org, revised Aug 2024.
Vasilis Syrgkanis & Ruohan Zhan, 2023. "Post Reinforcement Learning Inference," Papers 2302.08854, arXiv.org, revised Oct 2025.
Ruohan Zhan & Zhimei Ren & Susan Athey & Zhengyuan Zhou, 2024. "Policy Learning with Adaptively Collected Data," Management Science, INFORMS, vol. 70(8), pages 5270-5297, August.
- Zhan, Ruohan & Ren, Zhimei & Athey, Susan & Zhou, Zhengyuan, 2021. "Policy Learning with Adaptively Collected Data," Research Papers 3963, Stanford University, Graduate School of Business.
- Ruohan Zhan & Zhimei Ren & Susan Athey & Zhengyuan Zhou, 2021. "Policy Learning with Adaptively Collected Data," Papers 2105.02344, arXiv.org, revised Nov 2022.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Rahul Singh & Liyuan Xu & Arthur Gretton, 2020. "Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves," Papers 2010.04855, arXiv.org, revised Oct 2022.
Davide Viviano & Jelena Bradic, 2020. "Fair Policy Targeting," Papers 2005.12395, arXiv.org, revised Jun 2022.
Caloffi, Annalisa & Freo, Marzia & Ghinoi, Stefano & Mariani, Marco & Rossi, Federica, 2022. "Assessing the effects of a deliberate policy mix: The case of technology and innovation advisory services and innovation vouchers," Research Policy, Elsevier, vol. 51(6).
Shanike J. Smart & Solomon W. Polachek, 2024. "COVID-19 vaccine and risk-taking," Journal of Risk and Uncertainty, Springer, vol. 68(1), pages 25-49, February.
- Smart, Shanike J. & Polachek, Solomon, 2024. "COVID-19 Vaccine and Risk-Taking," IZA Discussion Papers 16707, IZA Network @ LISER.
Vincent Starck, 2025. "Improving control over unobservables with network data," Papers 2511.00612, arXiv.org.
Florian Gunsilius & Yuliang Xu, 2021. "Matching for causal effects via multimarginal unbalanced optimal transport," Papers 2112.04398, arXiv.org, revised Jul 2022.
Zhengyuan Zhou & Susan Athey & Stefan Wager, 2023. "Offline Multi-Action Policy Learning: Generalization and Optimization," Operations Research, INFORMS, vol. 71(1), pages 148-183, January.
- Zhou, Zhengyuan & Athey, Susan & Wager, Stefan, 2018. "Offline Multi-Action Policy Learning: Generalization and Optimization," Research Papers 3734, Stanford University, Graduate School of Business.
- Zhengyuan Zhou & Susan Athey & Stefan Wager, 2018. "Offline Multi-Action Policy Learning: Generalization and Optimization," Papers 1810.04778, arXiv.org, revised Nov 2018.
Ruohan Zhan & Zhimei Ren & Susan Athey & Zhengyuan Zhou, 2024. "Policy Learning with Adaptively Collected Data," Management Science, INFORMS, vol. 70(8), pages 5270-5297, August.
- Ruohan Zhan & Zhimei Ren & Susan Athey & Zhengyuan Zhou, 2021. "Policy Learning with Adaptively Collected Data," Papers 2105.02344, arXiv.org, revised Nov 2022.
- Zhan, Ruohan & Ren, Zhimei & Athey, Susan & Zhou, Zhengyuan, 2021. "Policy Learning with Adaptively Collected Data," Research Papers 3963, Stanford University, Graduate School of Business.
Plamen Nikolov & Hongjian Wang & Kevin Acker, 2020. "Wage premium of Communist Party membership: Evidence from China," Pacific Economic Review, Wiley Blackwell, vol. 25(3), pages 309-338, August.
- Wang, Hongjian & Nikolov, Plamen & Acker, Kevin, 2019. "The Wage Premium of Communist Party Membership: Evidence from China," IZA Discussion Papers 12874, IZA Network @ LISER.
- Plamen Nikolov & Hongjian Wang & Kevin Acker, 2020. "The Wage Premium of Communist Party Membership: Evidence from China," Papers 2007.13549, arXiv.org.
Daniel Burkhard & Christian P. R. Schmid & Kaspar Wüthrich, 2019. "Financial incentives and physician prescription behavior: Evidence from dispensing regulations," Health Economics, John Wiley & Sons, Ltd., vol. 28(9), pages 1114-1129, September.
- Daniel Burkhard & Christian Schmid & Kaspar Wüthrich, 2015. "Financial incentives and physician prescription behavior: Evidence from dispensing regulations," Diskussionsschriften dp1511, Universitaet Bern, Departement Volkswirtschaft.
- Burkhard, D. & Schmid, C.P.R. & Wüthrich, K., 2018. "Financial incentives and physician prescription behavior: Evidence from dispensing regulations," Health, Econometrics and Data Group (HEDG) Working Papers 18/17, HEDG, c/o Department of Economics, University of York.
Yusuke Narita, 2018. "Toward an Ethical Experiment," Cowles Foundation Discussion Papers 2127, Cowles Foundation for Research in Economics, Yale University.
Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer; Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
Özler, Berk & Çelik, Çiğdem & Cunningham, Scott & Cuevas, P. Facundo & Parisotto, Luca, 2021. "Children on the move: Progressive redistribution of humanitarian cash transfers among refugees," Journal of Development Economics, Elsevier, vol. 153(C).
- Ozler, Berk & Celik, Cigdem & Cunningham, Scott & Cuevas, Pablo Facundo & Parisotto, Luca, 2020. "Children on the Move: Progressive Redistribution of Humanitarian Cash Transfers among Refugees," Policy Research Working Paper Series 9471, The World Bank.
Guido W. Imbens, 2022. "Causality in Econometrics: Choice vs Chance," Econometrica, Econometric Society, vol. 90(6), pages 2541-2566, November.
Yusuke Narita, 2018. "Experiment-as-Market: Incorporating Welfare into Randomized Controlled Trials," Cowles Foundation Discussion Papers 2127r, Cowles Foundation for Research in Economics, Yale University, revised May 2019.
- Yusuke Narita, 2019. "Experiment-as-Market: Incorporating Welfare into Randomized Controlled Trials," Working Papers 2019-025, Human Capital and Economic Opportunity Working Group.
Graham, Bryan S. & Imbens, Guido W. & Ridder, Geert, 2025. "Measuring the effects of segregation in the presence of social spillovers: A nonparametric approach," Journal of Econometrics, Elsevier, vol. 252(PB).
Andrea Mercatanti & Fan Li, 2017. "Do debit cards decrease cash demand?: causal inference and sensitivity analysis using principal stratification," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 66(4), pages 759-776, August.
Gelter, Martin & Siems, Mathias, 2024. "Elective corporate governance: Does board choice matter?," International Review of Law and Economics, Elsevier, vol. 78(C).
Shuze Chen & David Simchi-Levi & Chonghuan Wang, 2024. "Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality," Papers 2407.19618, arXiv.org, revised Sep 2025.
Sun, Shanxia & Delgado, Michael & Khanna, Neha, "undated". "Hybrid Vehicles and Household Driving Behavior: Implications for Miles Traveled and Gasoline Consumption," 2017 Annual Meeting, July 30-August 1, Chicago, Illinois 258502, Agricultural and Applied Economics Association.

Corrections

All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2106.02029. See general information about how to correct material in RePEc.

If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the arXiv administrators. General contact details of provider: https://arxiv.org/.

Please note that corrections may take a couple of weeks to filter through the various RePEc services.

IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.
