
Post Reinforcement Learning Inference

Authors

  • Vasilis Syrgkanis
  • Ruohan Zhan

Abstract

We consider estimation and inference using data collected from reinforcement learning algorithms. These algorithms, characterized by adaptive experimentation, interact with individual units over multiple stages and dynamically adjust their strategies based on previous interactions. Our goal is to evaluate a counterfactual policy after data collection and to estimate structural parameters, such as dynamic treatment effects, which can be used for credit assignment and for determining the effect of earlier actions on final outcomes. Such parameters of interest can be framed as solutions to moment equations, but not as minimizers of a population loss function, which leads to Z-estimation approaches for static data. However, in the adaptive data collection environment of reinforcement learning, where algorithms deploy nonstationary behavior policies, standard estimators do not achieve asymptotic normality because of the fluctuating variance. We propose a weighted Z-estimation approach with carefully designed adaptive weights to stabilize the time-varying estimation variance. We identify proper weighting schemes to restore the consistency and asymptotic normality of the weighted Z-estimators for target parameters, which allows for hypothesis testing and constructing uniform confidence regions. Primary applications include dynamic treatment effect estimation and dynamic off-policy evaluation.
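The core idea in the abstract, solving a weighted moment equation whose weights offset the time-varying variance of adaptively collected scores, can be illustrated in the simplest single-stage (bandit) case. This is a minimal sketch under stated assumptions, not the authors' multi-stage estimator: it assumes known logging propensities, a hypothetical two-arm data-generating process with a decaying exploration rate, and square-root-propensity weights, similar in spirit to the variance-stabilizing weights studied for contextual bandits by Zhan, Hadad, Hirshberg and Athey (2021). The helper names `weighted_z_estimate` and `simulate_adaptive_experiment` are hypothetical.

```python
import math
import random


def weighted_z_estimate(scores, weights):
    """Solve the weighted moment equation sum_t w_t * (score_t - theta) = 0.

    Returns the point estimate theta and a plug-in standard error
    sqrt(sum_t w_t^2 (score_t - theta)^2) / sum_t w_t.
    Weights must depend only on past data (here: the logging propensity).
    """
    total_w = sum(weights)
    theta = sum(w * s for w, s in zip(weights, scores)) / total_w
    var = sum((w * (s - theta)) ** 2 for w, s in zip(weights, scores))
    return theta, math.sqrt(var) / total_w


def simulate_adaptive_experiment(T=5000, seed=0):
    """Hypothetical two-arm experiment with a nonstationary behavior policy.

    Arm a yields a Normal(mu[a], 1) reward. The probability of playing
    arm 1 decays from 0.5 toward a floor of 0.02, mimicking a learner
    that gradually stops exploring arm 1.
    """
    rng = random.Random(seed)
    mu = [1.0, 0.5]  # true mean rewards; the target "always arm 1" has value 0.5
    scores, propensities = [], []
    for t in range(1, T + 1):
        e1 = max(0.02, 0.5 / math.sqrt(t))  # time-varying propensity of arm 1
        a = 1 if rng.random() < e1 else 0
        y = rng.gauss(mu[a], 1.0)
        # Inverse-propensity score for the target policy "always play arm 1"
        scores.append(y / e1 if a == 1 else 0.0)
        propensities.append(e1)
    return scores, propensities


scores, props = simulate_adaptive_experiment()
uniform = weighted_z_estimate(scores, [1.0] * len(scores))
stabilized = weighted_z_estimate(scores, [math.sqrt(e) for e in props])
print("uniform weights:  theta=%.3f se=%.3f" % uniform)
print("sqrt-propensity:  theta=%.3f se=%.3f" % stabilized)
```

Uniform weights recover the ordinary inverse-propensity mean, whose per-round variance grows as the propensity shrinks. The square-root-propensity weights down-weight rounds logged with a small propensity, so late, high-variance scores do not dominate the sum. Whether a given weighting restores asymptotic normality depends on conditions analyzed in the paper; treat this as an illustration only.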

Suggested Citation

  • Vasilis Syrgkanis & Ruohan Zhan, 2023. "Post Reinforcement Learning Inference," Papers 2302.08854, arXiv.org, revised May 2024.
  • Handle: RePEc:arx:papers:2302.08854

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2302.08854
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Victor Chernozhukov & Juan Carlos Escanciano & Hidehiko Ichimura & Whitney K. Newey & James M. Robins, 2022. "Locally Robust Semiparametric Estimation," Econometrica, Econometric Society, vol. 90(4), pages 1501-1535, July.
    2. Ruohan Zhan & Vitor Hadad & David A. Hirshberg & Susan Athey, 2021. "Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits," Research Papers 3970, Stanford University, Graduate School of Business.
    3. Judith J. Lok & Victor DeGruttola, 2012. "Impact of Time to Start Treatment Following Infection with Application to Initiating HAART in HIV-Positive Patients," Biometrics, The International Biometric Society, vol. 68(3), pages 745-754, September.
    4. S. A. Murphy, 2003. "Optimal dynamic treatment regimes," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(2), pages 331-355, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rahul Singh & Liyuan Xu & Arthur Gretton, 2020. "Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves," Papers 2010.04855, arXiv.org, revised Oct 2022.
    2. Rahul Singh & Liyuan Xu & Arthur Gretton, 2021. "Sequential Kernel Embedding for Mediated and Time-Varying Dose Response Curves," Papers 2111.03950, arXiv.org, revised Jul 2023.
    3. Xin Chen & Rui Song & Jiajia Zhang & Swann Arp Adams & Liuquan Sun & Wenbin Lu, 2022. "On estimating optimal regime for treatment initiation time based on restricted mean residual lifetime," Biometrics, The International Biometric Society, vol. 78(4), pages 1377-1389, December.
    4. Ruohan Zhan & Zhimei Ren & Susan Athey & Zhengyuan Zhou, 2021. "Policy Learning with Adaptively Collected Data," Papers 2105.02344, arXiv.org, revised Nov 2022.
    5. Keith Battocchi & Eleanor Dillon & Maggie Hei & Greg Lewis & Miruna Oprescu & Vasilis Syrgkanis, 2021. "Estimating the Long-Term Effects of Novel Treatments," Papers 2103.08390, arXiv.org, revised Feb 2022.
    6. Q. Clairon & R. Henderson & N. J. Young & E. D. Wilson & C. J. Taylor, 2021. "Adaptive treatment and robust control," Biometrics, The International Biometric Society, vol. 77(1), pages 223-236, March.
    7. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    8. Jin Wang & Donglin Zeng & D. Y. Lin, 2022. "Semiparametric single-index models for optimal treatment regimens with censored outcomes," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 28(4), pages 744-763, October.
    9. Shonosuke Sugasawa & Hisashi Noma, 2021. "Efficient screening of predictive biomarkers for individual treatment selection," Biometrics, The International Biometric Society, vol. 77(1), pages 249-257, March.
    10. Jingxiang Chen & Yufeng Liu & Donglin Zeng & Rui Song & Yingqi Zhao & Michael R. Kosorok, 2016. "Comment," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 942-947, July.
    11. Jelena Bradic & Weijie Ji & Yuqian Zhang, 2021. "High-dimensional Inference for Dynamic Treatment Effects," Papers 2110.04924, arXiv.org, revised May 2023.
    12. Sukjin Han, 2021. "Identification in nonparametric models for dynamic treatment effects," Journal of Econometrics, Elsevier, vol. 225(2), pages 132-147.
    13. Steven N. Durlauf & Salvador Navarro & David A. Rivers, 2016. "Model uncertainty and the effect of shall-issue right-to-carry laws on crime," European Economic Review, Elsevier, vol. 81(C), pages 32-67.
    14. Michael C Knaus & Michael Lechner & Anthony Strittmatter, 2021. "Machine learning estimation of heterogeneous causal effects: Empirical Monte Carlo evidence," The Econometrics Journal, Royal Economic Society, vol. 24(1), pages 134-161.
    15. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP54/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    16. Yufan Zhao & Donglin Zeng & Mark A. Socinski & Michael R. Kosorok, 2011. "Reinforcement Learning Strategies for Clinical Trials in Nonsmall Cell Lung Cancer," Biometrics, The International Biometric Society, vol. 67(4), pages 1422-1433, December.
    17. Yu Luo & Daniel J. Graham & Emma J. McCoy, 2023. "Semiparametric Bayesian doubly robust causal estimation," LSE Research Online Documents on Economics 117944, London School of Economics and Political Science, LSE Library.
    18. Victor Chernozhukov & Whitney K. Newey & Victor Quintas-Martinez & Vasilis Syrgkanis, 2021. "Automatic Debiased Machine Learning via Riesz Regression," Papers 2104.14737, arXiv.org, revised Mar 2024.
    19. Zhengyuan Zhou & Susan Athey & Stefan Wager, 2023. "Offline Multi-Action Policy Learning: Generalization and Optimization," Operations Research, INFORMS, vol. 71(1), pages 148-183, January.
    20. Anders Bredahl Kock & Martin Thyrsgaard, 2017. "Optimal sequential treatment allocation," Papers 1705.09952, arXiv.org, revised Aug 2018.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2302.08854. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the arXiv administrators. General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.