Estimation Considerations in Contextual Bandits

Author

Listed:

Maria Dimakopoulou
Zhengyuan Zhou
Susan Athey
Guido Imbens

Abstract

Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We study a consideration for the exploration vs. exploitation framework that does not arise in multi-armed bandits but is crucial in contextual bandits: the way exploration and exploitation are conducted in the present affects the bias and variance of the potential outcome model estimation in subsequent stages of learning. We develop parametric and non-parametric contextual bandits that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of estimation bias. We provide the first regret bound analyses for contextual bandits with balancing in the domain of linear contextual bandits, and these bounds match the state-of-the-art regret bounds. We demonstrate the strong practical advantage of balanced contextual bandits on a large number of supervised learning datasets and on a synthetic example that simulates model mis-specification and prejudice in the initial training data. Additionally, we develop contextual bandits with simpler assignment policies by leveraging sparse model estimation methods from the econometrics literature and demonstrate empirically that in the early stages they can improve the rate of learning and decrease regret.
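
As a rough illustration of what "balancing" can mean in the linear setting, the sketch below fits a ridge regression for a single arm in which each observation is weighted by the inverse of the probability with which that arm was assigned. It is not taken from the paper and does not reproduce its algorithms: the function name balanced_ridge, the regularization parameter lam, and the assumption that assignment probabilities are known are all hypothetical choices made for this example.

```python
import numpy as np


def balanced_ridge(X, y, propensities, lam=1.0):
    """Inverse-propensity-weighted ridge regression for one arm.

    Hypothetical helper for illustration only.
    X: (n, d) contexts observed when this arm was pulled.
    y: (n,) observed rewards.
    propensities: (n,) probability with which the arm was assigned
        in each of those rounds (assumed known and strictly positive).
    lam: ridge penalty (assumed value, not from the paper).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(propensities, dtype=float)  # balancing weights
    d = X.shape[1]
    # Weighted normal equations: (X' W X + lam I) theta = X' W y
    A = X.T @ (w[:, None] * X) + lam * np.eye(d)
    b = X.T @ (w * y)
    return np.linalg.solve(A, b)


# Small synthetic check with a noiseless linear reward.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y_demo = X_demo @ theta_true
p_demo = rng.uniform(0.1, 0.9, size=200)
theta_hat = balanced_ridge(X_demo, y_demo, p_demo, lam=1e-8)
```

Observations that were unlikely to be assigned to the arm receive larger weights, so the weighted sample looks more like the full context distribution than the raw, adaptively collected sample does. With all propensities equal, the estimator reduces to ordinary ridge regression.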

Suggested Citation

Maria Dimakopoulou & Zhengyuan Zhou & Susan Athey & Guido Imbens, 2017. "Estimation Considerations in Contextual Bandits," Papers 1711.07077, arXiv.org, revised Dec 2018.

Handle: RePEc:arx:papers:1711.07077

Other versions of this item:

Dimakopoulou, Maria & Athey, Susan & Imbens, Guido W., 2018. "Estimation Considerations in Contextual Bandits," Research Papers 3644, Stanford University, Graduate School of Business.

References listed on IDEAS

Susan Athey & Julie Tibshirani & Stefan Wager, 2016. "Generalized Random Forests," Papers 1610.01271, arXiv.org, revised Apr 2018.
- Athey, Susan & Tibshirani, Julie & Wager, Stefan, 2017. "Generalized Random Forests," Research Papers 3575, Stanford University, Graduate School of Business.
Athey, Susan & Wager, Stefan, 2017. "Efficient Policy Learning," Research Papers 3506, Stanford University, Graduate School of Business.
Imbens, Guido W. & Rubin, Donald B., 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881, May.

Citations

Cited by:

Caio Waisman & Harikesh S. Nair & Carlos Carrion, 2025. "Online Causal Inference for Advertising in Real-Time Bidding Auctions," Marketing Science, INFORMS, vol. 44(1), pages 176-195, January.
- Caio Waisman & Harikesh S. Nair & Carlos Carrion, 2019. "Online Causal Inference for Advertising in Real-Time Bidding Auctions," Papers 1908.08600, arXiv.org, revised Feb 2024.
Cavanagh, Jack & Fliegner, Jasmin Claire & Kopper, Sarah & Sautmann, Anja, 2023. "A Metadata Schema for Data from Experiments in the Social Sciences," Policy Research Working Paper Series 10296, The World Bank.
Yusuke Narita & Shota Yasui & Kohei Yata, 2020. "Debiased Off-Policy Evaluation for Recommendation Systems," Papers 2002.08536, arXiv.org, revised Aug 2021.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Rina Friedberg & Julie Tibshirani & Susan Athey & Stefan Wager, 2018. "Local Linear Forests," Papers 1807.11408, arXiv.org, revised Sep 2020.
Shinde, Nilesh N. & Do Valle, Stella Z. Schons & Maia, Alexandre Gori & Amacher, Gregory S., 2022. "Can an environmental policy contribute to the reduction of land conflict? Evidence from the Rural Environmental Registry (CAR) in the Brazilian Amazon," 2022 Annual Meeting, July 31-August 2, Anaheim, California 322584, Agricultural and Applied Economics Association.
Zhengyuan Zhou & Susan Athey & Stefan Wager, 2023. "Offline Multi-Action Policy Learning: Generalization and Optimization," Operations Research, INFORMS, vol. 71(1), pages 148-183, January.
- Zhou, Zhengyuan & Athey, Susan & Wager, Stefan, 2018. "Offline Multi-Action Policy Learning: Generalization and Optimization," Research Papers 3734, Stanford University, Graduate School of Business.
- Zhengyuan Zhou & Susan Athey & Stefan Wager, 2018. "Offline Multi-Action Policy Learning: Generalization and Optimization," Papers 1810.04778, arXiv.org, revised Nov 2018.
Susan Athey & Raj Chetty & Guido Imbens, 2020. "Using Experiments to Correct for Selection in Observational Studies," Papers 2006.09676, arXiv.org, revised May 2025.
Valente, Marica, 2023. "Policy evaluation of waste pricing programs using heterogeneous causal effect estimation," Journal of Environmental Economics and Management, Elsevier, vol. 117(C).
- Marica Valente, 2020. "Policy evaluation of waste pricing programs using heterogeneous causal effect estimation," Papers 2010.01105, arXiv.org, revised Nov 2022.
- Marica Valente, 2021. "Policy Evaluation of Waste Pricing Programs Using Heterogeneous Causal Effect Estimation," Discussion Papers of DIW Berlin 1980, DIW Berlin, German Institute for Economic Research.
Miruna Oprescu & Vasilis Syrgkanis & Zhiwei Steven Wu, 2018. "Orthogonal Random Forest for Causal Inference," Papers 1806.03467, arXiv.org, revised Sep 2019.
Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness [Econometric methods for program evaluation]," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
- Knaus, Michael C., 2020. "Double Machine Learning Based Program Evaluation under Unconfoundedness," IZA Discussion Papers 13051, Institute of Labor Economics (IZA).
- Michael C. Knaus, 2020. "Double Machine Learning based Program Evaluation under Unconfoundedness," Papers 2003.03191, arXiv.org, revised Jun 2022.
- Knaus, Michael C., 2020. "Double Machine Learning based Program Evaluation under Unconfoundedness," Economics Working Paper Series 2004, University of St. Gallen, School of Economics and Political Science.
Newham, Melissa & Valente, Marica, 2024. "The cost of influence: How gifts to physicians shape prescriptions and drug costs," Journal of Health Economics, Elsevier, vol. 95(C).
- Melissa Newham & Marica Valente, 2022. "The Cost of Influence: How Gifts to Physicians Shape Prescriptions and Drug Costs," Papers 2203.01778, arXiv.org, revised Apr 2023.
- Melissa Newham & Marica Valente, 2023. "The Cost of Influence: How Gifts to Physicians Shape Prescriptions and Drug Costs," Working Papers 2023-03, Faculty of Economics and Statistics, Universität Innsbruck.
Davide Viviano, 2019. "Policy Targeting under Network Interference," Papers 1906.10258, arXiv.org, revised Apr 2024.
Masahiro Kato & Masaaki Imaizumi & Takuya Ishihara & Toru Kitagawa, 2022. "Best Arm Identification with Contextual Information under a Small Gap," Papers 2209.07330, arXiv.org, revised Jan 2023.
Maria Dimakopoulou & Zhimei Ren & Zhengyuan Zhou, 2021. "Online Multi-Armed Bandits with Adaptive Inference," Papers 2102.13202, arXiv.org, revised Jun 2021.
Qingyuan Zhao & Dylan S. Small & Ashkan Ertefaie, 2022. "Selective inference for effect modification via the lasso," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(2), pages 382-413, April.
Vishal Gupta & Brian Rongqing Han & Song-Hee Kim & Hyung Paek, 2020. "Maximizing Intervention Effectiveness," Management Science, INFORMS, vol. 66(12), pages 5576-5598, December.
Nathan Kallus, 2022. "Treatment Effect Risk: Bounds and Inference," Papers 2201.05893, arXiv.org, revised Jul 2022.
Mert Demirer & Vasilis Syrgkanis & Greg Lewis & Victor Chernozhukov, 2019. "Semi-Parametric Efficient Policy Learning with Continuous Actions," CeMMAP working papers CWP34/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
- Mert Demirer & Vasilis Syrgkanis & Greg Lewis & Victor Chernozhukov, 2019. "Semi-Parametric Efficient Policy Learning with Continuous Actions," Papers 1905.10116, arXiv.org, revised Jul 2019.
Hema Yoganarasimhan & Ebrahim Barzegary & Abhishek Pani, 2020. "Design and Evaluation of Personalized Free Trials," Papers 2006.13420, arXiv.org.
Davide Viviano & Jelena Bradic, 2020. "Fair Policy Targeting," Papers 2005.12395, arXiv.org, revised Jun 2022.
Gabriel Okasa, 2022. "Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance," Papers 2201.12692, arXiv.org.
Bo, Hao & Galiani, Sebastian, 2021. "Assessing external validity," Research in Economics, Elsevier, vol. 75(3), pages 274-285.
- Hao Bo & Sebastian Galiani, 2019. "Assessing External Validity," NBER Working Papers 26422, National Bureau of Economic Research, Inc.
Sven Resnjanskij & Jens Ruhose & Simon Wiederhold & Ludger Wößmann, 2021. "Mentoring verbessert die Arbeitsmarktchancen von stark benachteiligten Jugendlichen," ifo Schnelldienst, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, vol. 74(02), pages 31-38, February.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-ECM-2018-01-15 (Econometrics)

