Printed from https://ideas.repec.org/p/arx/papers/2010.13554.html

Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy

Authors
  • Masahiro Kato
  • Yusuke Kaneko

Abstract

The goal of off-policy evaluation (OPE) is to evaluate a new policy using historical data obtained via a behavior policy. However, because the contextual bandit algorithm updates the policy based on past observations, the samples are not independent and identically distributed (i.i.d.). This paper tackles this problem by constructing an estimator from a martingale difference sequence (MDS) for the dependent samples. In the data-generating process, we do not assume the convergence of the policy, but the policy uses the same conditional probability of choosing an action during a certain period. Then, we derive an asymptotically normal estimator of the value of an evaluation policy. As another advantage of our method, the batch-based approach simultaneously solves the deficient support problem. Using benchmark and real-world datasets, we experimentally confirm the effectiveness of the proposed method.
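The batch-update setting in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator: it is a plain inverse-probability-weighted (IPW) value estimate under a hypothetical, context-free epsilon-greedy behavior policy that is frozen within each batch and updated only between batches. All names (`simulate_batched_bandit`, `ipw_value`), the reward means, and the evaluation policy are assumptions made for illustration. Because each logged propensity depends only on earlier batches, every weighted reward has conditional mean equal to the evaluation policy's value, so the centered terms form a martingale difference sequence even though the samples are not i.i.d.

```python
import random


def simulate_batched_bandit(true_means, n_batches, batch_size, epsilon, rng):
    """Log (action, reward, propensity) triples from an epsilon-greedy policy
    that is updated only between batches (hypothetical setup)."""
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    log = []
    for _ in range(n_batches):
        # Freeze the behavior policy for the whole batch, using past data only.
        means = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(k)]
        greedy = max(range(k), key=lambda a: means[a])
        probs = [epsilon / k + (1.0 - epsilon) * (a == greedy) for a in range(k)]
        batch = []
        for _ in range(batch_size):
            action = rng.choices(range(k), weights=probs)[0]
            reward = 1.0 if rng.random() < true_means[action] else 0.0
            batch.append((action, reward, probs[action]))
        # Update the running statistics only after the batch ends.
        for action, reward, _ in batch:
            counts[action] += 1
            sums[action] += reward
        log.extend(batch)
    return log


def ipw_value(log, eval_probs):
    """IPW estimate of the evaluation policy's value from logged data."""
    return sum(eval_probs[a] / p * r for a, r, p in log) / len(log)


rng = random.Random(0)
true_means = [0.2, 0.5, 0.7]   # assumed Bernoulli reward means per action
eval_probs = [0.1, 0.3, 0.6]   # assumed evaluation policy
log = simulate_batched_bandit(true_means, n_batches=20, batch_size=1000,
                              epsilon=0.3, rng=rng)
estimate = ipw_value(log, eval_probs)
true_value = sum(p * m for p, m in zip(eval_probs, true_means))
print(f"IPW estimate: {estimate:.3f}, true value: {true_value:.3f}")
```

With epsilon fixed at 0.3, every action keeps a propensity of at least 0.1 in every batch, so the importance weights stay bounded in this toy example; the paper's treatment of deficient support is not reproduced here.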

Suggested Citation

  • Masahiro Kato & Yusuke Kaneko, 2020. "Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy," Papers 2010.13554, arXiv.org.
  • Handle: RePEc:arx:papers:2010.13554

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2010.13554
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Jinyong Hahn & Keisuke Hirano & Dean Karlan, 2011. "Adaptive Experimental Design Using the Propensity Score," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 29(1), pages 96-108, January.
    2. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    3. Keisuke Hirano & Guido W. Imbens & Geert Ridder, 2003. "Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score," Econometrica, Econometric Society, vol. 71(4), pages 1161-1189, July.
    4. Yusuke Narita & Shota Yasui & Kohei Yata, 2018. "Efficient Counterfactual Learning from Bandit Feedback," Cowles Foundation Discussion Papers 2155, Cowles Foundation for Research in Economics, Yale University.
    5. Guido W. Imbens, 2004. "Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review," The Review of Economics and Statistics, MIT Press, vol. 86(1), pages 4-29, February.

    Citations


    Cited by:

    1. Masahiro Kato, 2021. "Adaptive Doubly Robust Estimator from Non-stationary Logging Policy under a Convergence of Average Probability," Papers 2102.08975, arXiv.org, revised Mar 2021.
    2. Davide Viviano & Jess Rudder, 2020. "Policy design in experiments with unknown interference," Papers 2011.08174, arXiv.org, revised Dec 2023.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Masahiro Kato, 2021. "Adaptive Doubly Robust Estimator from Non-stationary Logging Policy under a Convergence of Average Probability," Papers 2102.08975, arXiv.org, revised Mar 2021.
    2. Masahiro Kato, 2020. "Confidence Interval for Off-Policy Evaluation from Dependent Samples via Bandit Algorithm: Approach from Standardized Martingales," Papers 2006.06982, arXiv.org.
    3. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    4. Chunrong Ai & Oliver Linton & Kaiji Motegi & Zheng Zhang, 2021. "A unified framework for efficient estimation of general treatment models," Quantitative Economics, Econometric Society, vol. 12(3), pages 779-816, July.
    5. Martin Huber & Kaspar Wüthrich, 2019. "Local Average and Quantile Treatment Effects Under Endogeneity: A Review," Journal of Econometric Methods, De Gruyter, vol. 8(1), pages 1-27, January.
    6. Masahiro Kato & Kenshi Abe & Kaito Ariu & Shota Yasui, 2020. "A Practical Guide of Off-Policy Evaluation for Bandit Problems," Papers 2010.12470, arXiv.org.
    7. Yihui He & Fang Han, 2023. "On propensity score matching with a diverging number of matches," Papers 2310.14142, arXiv.org, revised Nov 2023.
    8. Masahiro Kato & Shota Yasui & Kenichiro McAlinn, 2020. "The Adaptive Doubly Robust Estimator for Policy Evaluation in Adaptive Experiments and a Paradox Concerning Logging Policy," Papers 2010.03792, arXiv.org, revised Jun 2021.
    9. Rahul Singh & Liyuan Xu & Arthur Gretton, 2020. "Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves," Papers 2010.04855, arXiv.org, revised Oct 2022.
    10. Martin Huber & Jannis Kueck, 2022. "Testing the identification of causal effects in observational data," Papers 2203.15890, arXiv.org, revised Jun 2023.
    11. Martin Huber, 2019. "An introduction to flexible methods for policy evaluation," FSES Working Papers 504, Faculty of Economics and Social Sciences, University of Freiburg/Fribourg Switzerland.
    12. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    13. Taisuke Otsu & Mengshan Xu, 2022. "Isotonic propensity score matching," STICERD - Econometrics Paper Series 623, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
    14. Dongcheng Zhang & Kunpeng Zhang, 2020. "Weighting-Based Treatment Effect Estimation via Distribution Learning," Papers 2012.13805, arXiv.org, revised May 2023.
    15. Masahiro Kato & Masatoshi Uehara & Shota Yasui, 2020. "Off-Policy Evaluation and Learning for External Validity under a Covariate Shift," Papers 2002.11642, arXiv.org, revised Oct 2020.
    16. Yuehao Bai & Jizhou Liu & Azeem M. Shaikh & Max Tabord-Meehan, 2023. "On the Efficiency of Finely Stratified Experiments," Papers 2307.15181, arXiv.org, revised Feb 2024.
    17. Jikai Jin & Vasilis Syrgkanis, 2024. "Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation," Papers 2402.14264, arXiv.org, revised Mar 2024.
    18. Susan Athey & Guido W. Imbens & Jonas Metzger & Evan M. Munro, 2019. "Using Wasserstein Generative Adversarial Networks for the Design of Monte Carlo Simulations," NBER Working Papers 26566, National Bureau of Economic Research, Inc.
    19. Mengshan Xu & Taisuke Otsu, 2022. "Isotonic propensity score matching," Papers 2207.08868, arXiv.org.
    20. Zhexiao Lin & Fang Han, 2022. "On regression-adjusted imputation estimators of the average treatment effect," Papers 2212.05424, arXiv.org, revised Jan 2023.
