IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2302.08854.html
   My bibliography  Save this paper

Post-Episodic Reinforcement Learning Inference

Author

Listed:
  • Vasilis Syrgkanis
  • Ruohan Zhan

Abstract

We consider estimation and inference with data collected from episodic reinforcement learning (RL) algorithms; i.e. adaptive experimentation algorithms that at each period (aka episode) interact multiple times in a sequential manner with a single treated unit. Our goal is to be able to evaluate counterfactual adaptive policies after data collection and to estimate structural parameters such as dynamic treatment effects, which can be used for credit assignment (e.g. what was the effect of the first period action on the final outcome). Such parameters of interest can be framed as solutions to moment equations, but not minimizers of a population loss function, leading to $Z$-estimation approaches in the case of static data. However, such estimators fail to be asymptotically normal in the case of adaptive data collection. We propose a re-weighted $Z$-estimation approach with carefully designed adaptive weights to stabilize the episode-varying estimation variance, which results from the nonstationary policy that typical episodic RL algorithms invoke. We identify proper weighting schemes to restore the consistency and asymptotic normality of the re-weighted Z-estimators for target parameters, which allows for hypothesis testing and constructing uniform confidence regions for target parameters of interest. Primary applications include dynamic treatment effect estimation and dynamic off-policy evaluation.

Suggested Citation

  • Vasilis Syrgkanis & Ruohan Zhan, 2023. "Post-Episodic Reinforcement Learning Inference," Papers 2302.08854, arXiv.org, revised Jul 2023.
  • Handle: RePEc:arx:papers:2302.08854
    as

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2302.08854
    File Function: Latest version
    Download Restriction: no
    ---><---

    References listed on IDEAS

    as
    1. Victor Chernozhukov & Juan Carlos Escanciano & Hidehiko Ichimura & Whitney K. Newey & James M. Robins, 2022. "Locally Robust Semiparametric Estimation," Econometrica, Econometric Society, vol. 90(4), pages 1501-1535, July.
    2. Zhan, Ruohan & Hadad, Vitor & Hirshberg, David A. & Athey, Susan, 2021. "Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits," Research Papers 3970, Stanford University, Graduate School of Business.
    3. Judith J. Lok & Victor DeGruttola, 2012. "Impact of Time to Start Treatment Following Infection with Application to Initiating HAART in HIV-Positive Patients," Biometrics, The International Biometric Society, vol. 68(3), pages 745-754, September.
    4. S. A. Murphy, 2003. "Optimal dynamic treatment regimes," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(2), pages 331-355, May.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rahul Singh & Liyuan Xu & Arthur Gretton, 2020. "Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves," Papers 2010.04855, arXiv.org, revised Oct 2022.
    2. Rahul Singh & Liyuan Xu & Arthur Gretton, 2021. "Sequential Kernel Embedding for Mediated and Time-Varying Dose Response Curves," Papers 2111.03950, arXiv.org, revised Jul 2023.
    3. Xin Chen & Rui Song & Jiajia Zhang & Swann Arp Adams & Liuquan Sun & Wenbin Lu, 2022. "On estimating optimal regime for treatment initiation time based on restricted mean residual lifetime," Biometrics, The International Biometric Society, vol. 78(4), pages 1377-1389, December.
    4. Ruohan Zhan & Zhimei Ren & Susan Athey & Zhengyuan Zhou, 2021. "Policy Learning with Adaptively Collected Data," Papers 2105.02344, arXiv.org, revised Nov 2022.
    5. Keith Battocchi & Eleanor Dillon & Maggie Hei & Greg Lewis & Miruna Oprescu & Vasilis Syrgkanis, 2021. "Estimating the Long-Term Effects of Novel Treatments," Papers 2103.08390, arXiv.org, revised Feb 2022.
    6. Q. Clairon & R. Henderson & N. J. Young & E. D. Wilson & C. J. Taylor, 2021. "Adaptive treatment and robust control," Biometrics, The International Biometric Society, vol. 77(1), pages 223-236, March.
    7. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    8. Jin Wang & Donglin Zeng & D. Y. Lin, 2022. "Semiparametric single-index models for optimal treatment regimens with censored outcomes," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 28(4), pages 744-763, October.
    9. Shonosuke Sugasawa & Hisashi Noma, 2021. "Efficient screening of predictive biomarkers for individual treatment selection," Biometrics, The International Biometric Society, vol. 77(1), pages 249-257, March.
    10. Jingxiang Chen & Yufeng Liu & Donglin Zeng & Rui Song & Yingqi Zhao & Michael R. Kosorok, 2016. "Comment," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 942-947, July.
    11. Jelena Bradic & Weijie Ji & Yuqian Zhang, 2021. "High-dimensional Inference for Dynamic Treatment Effects," Papers 2110.04924, arXiv.org, revised May 2023.
    12. Han, Sukjin, 2021. "Identification in nonparametric models for dynamic treatment effects," Journal of Econometrics, Elsevier, vol. 225(2), pages 132-147.
    13. Durlauf, Steven N. & Navarro, Salvador & Rivers, David A., 2016. "Model uncertainty and the effect of shall-issue right-to-carry laws on crime," European Economic Review, Elsevier, vol. 81(C), pages 32-67.
    14. Michael C Knaus & Michael Lechner & Anthony Strittmatter, 2021. "Machine learning estimation of heterogeneous causal effects: Empirical Monte Carlo evidence," The Econometrics Journal, Royal Economic Society, vol. 24(1), pages 134-161.
    15. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP54/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    16. Yufan Zhao & Donglin Zeng & Mark A. Socinski & Michael R. Kosorok, 2011. "Reinforcement Learning Strategies for Clinical Trials in Nonsmall Cell Lung Cancer," Biometrics, The International Biometric Society, vol. 67(4), pages 1422-1433, December.
    17. Luo, Yu & Graham, Daniel J. & McCoy, Emma J., 2023. "Semiparametric Bayesian doubly robust causal estimation," LSE Research Online Documents on Economics 117944, London School of Economics and Political Science, LSE Library.
    18. Victor Chernozhukov & Whitney K. Newey & Victor Quintas-Martinez & Vasilis Syrgkanis, 2021. "Automatic Debiased Machine Learning via Riesz Regression," Papers 2104.14737, arXiv.org, revised Mar 2024.
    19. Zhengyuan Zhou & Susan Athey & Stefan Wager, 2023. "Offline Multi-Action Policy Learning: Generalization and Optimization," Operations Research, INFORMS, vol. 71(1), pages 148-183, January.
    20. Anders Bredahl Kock & Martin Thyrsgaard, 2017. "Optimal sequential treatment allocation," Papers 1705.09952, arXiv.org, revised Aug 2018.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2302.08854. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.