
Efficient Counterfactual Learning from Bandit Feedback

Authors

  • Yusuke Narita
  • Shota Yasui
  • Kohei Yata

Abstract

What is the most statistically efficient way to do off-policy optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. Our estimators are shown to have the lowest variance in a wide class of estimators, achieving variance reduction relative to standard estimators. We then apply our estimators to improve advertisement design by a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with more statistical confidence than a state-of-the-art benchmark.
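
As a rough illustration of what an offline estimator of a counterfactual policy's expected reward looks like, the sketch below implements the standard inverse propensity weighting (IPW) baseline on simulated bandit logs. It is not the estimator proposed in the paper. The function names, the two-context, two-action setup, the uniform logging policy, and the reward probabilities are all assumptions made for the example.

```python
import random


def ipw_value(logs, target_policy):
    """Standard IPW estimate of a target policy's expected reward.

    logs: list of (context, action, reward, logging_prob) tuples, where
    logging_prob is the probability the logging policy gave to the logged action.
    target_policy(context, action): probability the target policy picks `action`.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy.
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


def simulate_logs(n, seed=0):
    """Hypothetical log data: binary context, uniform logging over two actions."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = rng.randint(0, 1)
        action = rng.randint(0, 1)  # logging policy picks each action with prob 0.5
        # Assumed reward model: matching the context pays off more often.
        mean_reward = 0.8 if action == context else 0.2
        reward = 1.0 if rng.random() < mean_reward else 0.0
        logs.append((context, action, reward, 0.5))
    return logs


def match_context_policy(context, action):
    """Hypothetical counterfactual policy: always choose action == context."""
    return 1.0 if action == context else 0.0


logs = simulate_logs(20000)
estimate = ipw_value(logs, match_context_policy)
print(f"IPW estimate of the target policy value: {estimate:.3f}")
```

In this toy setup the target policy's true expected reward is 0.8 by construction, so the printed estimate should land close to that value. The variance of such an estimate across repeated samples is the quantity the abstract refers to when it compares estimators.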

Suggested Citation

  • Yusuke Narita & Shota Yasui & Kohei Yata, 2018. "Efficient Counterfactual Learning from Bandit Feedback," Cowles Foundation Discussion Papers 2155, Cowles Foundation for Research in Economics, Yale University.
  • Handle: RePEc:cwl:cwldpp:2155

    Download full text from publisher

    File URL: https://cowles.yale.edu/sites/default/files/files/pub/d21/d2155.pdf
    Download Restriction: no


    Citations


    Cited by:

    1. Yusuke Narita & Kohei Yata, 2021. "Algorithm is Experiment: Machine Learning, Market Design, and Policy Eligibility Rules," Working Papers 2021-022, Human Capital and Economic Opportunity Working Group.
    2. Masahiro Kato & Shota Yasui & Kenichiro McAlinn, 2020. "The Adaptive Doubly Robust Estimator for Policy Evaluation in Adaptive Experiments and a Paradox Concerning Logging Policy," Papers 2010.03792, arXiv.org, revised Jun 2021.
    3. Masahiro Kato & Masatoshi Uehara & Shota Yasui, 2020. "Off-Policy Evaluation and Learning for External Validity under a Covariate Shift," Papers 2002.11642, arXiv.org, revised Oct 2020.
    4. Masahiro Kato, 2020. "Confidence Interval for Off-Policy Evaluation from Dependent Samples via Bandit Algorithm: Approach from Standardized Martingales," Papers 2006.06982, arXiv.org.
    5. Patrick Kline & Christopher Walters, 2019. "Audits as Evidence: Experiments, Ensembles, and Enforcement," Papers 1907.06622, arXiv.org, revised Jul 2019.
    6. Masahiro Kato & Yusuke Kaneko, 2020. "Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy," Papers 2010.13554, arXiv.org.
    7. Masahiro Kato, 2021. "Adaptive Doubly Robust Estimator from Non-stationary Logging Policy under a Convergence of Average Probability," Papers 2102.08975, arXiv.org, revised Mar 2021.
    8. Narita, Yusuke & Yata, Kohei, 2022. "Algorithm is Experiment: Machine Learning, Market Design, and Policy Eligibility Rules," CEI Working Paper Series 2021-05, Center for Economic Institutions, Institute of Economic Research, Hitotsubashi University.
    9. Narita, Yusuke & Yata, Kohei, 2022. "Algorithm is Experiment: Machine Learning, Market Design, and Policy Eligibility Rules," Discussion Paper Series 730, Institute of Economic Research, Hitotsubashi University.
    10. Yusuke Narita & Shota Yasui & Kohei Yata, 2020. "Debiased Off-Policy Evaluation for Recommendation Systems," Papers 2002.08536, arXiv.org, revised Aug 2021.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhaonan Qu & Ruoxuan Xiong & Jizhou Liu & Guido Imbens, 2021. "Efficient Treatment Effect Estimation in Observational Studies under Heterogeneous Partial Interference," Papers 2107.12420, arXiv.org, revised Jun 2022.
    2. Chunrong Ai & Oliver Linton & Kaiji Motegi & Zheng Zhang, 2021. "A unified framework for efficient estimation of general treatment models," Quantitative Economics, Econometric Society, vol. 12(3), pages 779-816, July.
    3. Firpo, Sergio Pinheiro & Pinto, Rafael de Carvalho Cayres, 2012. "Combining Strategies for the Estimation of Treatment Effects," Brazilian Review of Econometrics, Sociedade Brasileira de Econometria - SBE, vol. 32(1), March.
    4. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    5. Su, Liangjun & Ura, Takuya & Zhang, Yichong, 2019. "Non-separable models with high-dimensional data," Journal of Econometrics, Elsevier, vol. 212(2), pages 646-677.
    6. Wei Huang & Oliver Linton & Zheng Zhang, 2021. "A Unified Framework for Specification Tests of Continuous Treatment Effect Models," Papers 2102.08063, arXiv.org, revised Sep 2021.
    7. Difang Huang & Jiti Gao & Tatsushi Oka, 2022. "Semiparametric Single-Index Estimation for Average Treatment Effects," Papers 2206.08503, arXiv.org, revised Oct 2022.
    8. Ai, Chunrong & Linton, Oliver & Zhang, Zheng, 2022. "Estimation and inference for the counterfactual distribution and quantile functions in continuous treatment models," Journal of Econometrics, Elsevier, vol. 228(1), pages 39-61.
    9. Cattaneo, Matias D., 2010. "Efficient semiparametric estimation of multi-valued treatment effects under ignorability," Journal of Econometrics, Elsevier, vol. 155(2), pages 138-154, April.
    10. Victor Chernozhukov & Juan Carlos Escanciano & Hidehiko Ichimura & Whitney K. Newey & James M. Robins, 2022. "Locally Robust Semiparametric Estimation," Econometrica, Econometric Society, vol. 90(4), pages 1501-1535, July.
    11. Farrell, Max H., 2015. "Robust inference on average treatment effects with possibly more covariates than observations," Journal of Econometrics, Elsevier, vol. 189(1), pages 1-23.
    12. Rothe, Christoph, 2016. "The Value of Knowing the Propensity Score for Estimating Average Treatment Effects," IZA Discussion Papers 9989, Institute of Labor Economics (IZA).
    13. Haitian Xie, 2020. "Efficient and Robust Estimation of the Generalized LATE Model," Papers 2001.06746, arXiv.org, revised Feb 2022.
    14. Sant’Anna, Pedro H.C. & Zhao, Jun, 2020. "Doubly robust difference-in-differences estimators," Journal of Econometrics, Elsevier, vol. 219(1), pages 101-122.
    15. Michael C. Knaus, 2021. "A double machine learning approach to estimate the effects of musical practice on student’s skills," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 282-300, January.
    16. Victor Chernozhukov & Ivan Fernandez-Val & Christian Hansen, 2013. "Program evaluation with high-dimensional data," CeMMAP working papers CWP57/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    17. Graham, Bryan S. & Pinto, Cristine Campos de Xavier, 2022. "Semiparametrically efficient estimation of the average linear regression function," Journal of Econometrics, Elsevier, vol. 226(1), pages 115-138.
    18. Rahul Singh & Liyuan Xu & Arthur Gretton, 2020. "Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves," Papers 2010.04855, arXiv.org, revised Oct 2022.
    19. Max H. Farrell, 2013. "Robust Inference on Average Treatment Effects with Possibly More Covariates than Observations," Papers 1309.4686, arXiv.org, revised Feb 2018.
    20. Cui, Li-E & Zhao, Puying & Tang, Niansheng, 2022. "Generalized empirical likelihood for nonsmooth estimating equations with missing data," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
