Printed from https://ideas.repec.org/p/ehl/lserod/127507.html

Testing stationarity and change point detection in reinforcement learning

Author

Listed:
  • Li, Mengbing
  • Shi, Chengchun
  • Wu, Zhenke
  • Fryzlewicz, Piotr

Abstract

We consider reinforcement learning (RL) in possibly nonstationary environments. Many existing RL algorithms rely on a stationarity assumption, which requires the state transition and reward functions to be constant over time. However, this assumption is restrictive in practice and is likely to be violated in a number of applications, including traffic signal control, robotics and mobile health. In this paper, we develop a model-free test to assess the stationarity of the optimal Q-function based on pre-collected historical data, without additional online data collection. Based on the proposed test, we further develop a change point detection method that can be naturally coupled with existing state-of-the-art RL methods designed for stationary environments, enabling online policy optimization in nonstationary environments. The usefulness of our method is illustrated by theoretical results, simulation studies, and a real data example from the 2018 Intern Health Study. A Python implementation of the proposed procedure is publicly available at https://github.com/limengbinggz/CUSUM-RL.
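
To give a rough feel for what change point detection on historical data looks like, the sketch below computes a textbook CUSUM statistic for a single mean shift in a univariate series. It is an illustration only, not the test proposed in the paper, which targets the optimal Q-function and is not specified in the abstract. The function names, the simulated "reward" series and the shift size are all hypothetical choices made for this example.

```python
import math
import random


def cusum_statistics(x):
    """Scaled CUSUM statistic for every candidate split u = 1, ..., n-1.

    For a split after the first u observations, the statistic is
    sqrt(u * (n - u) / n) * |mean(x[:u]) - mean(x[u:])|.
    """
    n = len(x)
    total = sum(x)
    stats = []
    left = 0.0
    for u in range(1, n):
        left += x[u - 1]
        mean_left = left / u
        mean_right = (total - left) / (n - u)
        scale = math.sqrt(u * (n - u) / n)
        stats.append(scale * abs(mean_left - mean_right))
    return stats


def detect_change_point(x):
    """Return (location, statistic) for the split with the largest CUSUM value.

    `location` is the number of observations before the estimated change.
    """
    stats = cusum_statistics(x)
    best = max(range(len(stats)), key=stats.__getitem__)
    return best + 1, stats[best]


def simulate_rewards(n=200, change_at=120, seed=0):
    """Hypothetical data: Gaussian noise whose mean jumps from 0 to 2."""
    rng = random.Random(seed)
    return [rng.gauss(0.0 if t < change_at else 2.0, 1.0) for t in range(n)]


rewards = simulate_rewards()
location, statistic = detect_change_point(rewards)
print(f"estimated change after observation {location}, CUSUM = {statistic:.2f}")
```

In practice the maximum statistic would be compared against a critical value (for example from a bootstrap) before declaring a change; that calibration step is omitted here.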

Suggested Citation

  • Li, Mengbing & Shi, Chengchun & Wu, Zhenke & Fryzlewicz, Piotr, 2025. "Testing stationarity and change point detection in reinforcement learning," LSE Research Online Documents on Economics 127507, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:127507

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/127507/
    File Function: Open access version.
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gao, Yuhe & Shi, Chengchun & Song, Rui, 2023. "Deep spectral Q-learning with application to mobile health," LSE Research Online Documents on Economics 119445, London School of Economics and Political Science, LSE Library.
    2. Shi, Chengchun & Luo, Shikai & Le, Yuan & Zhu, Hongtu & Song, Rui, 2022. "Statistically efficient advantage learning for offline reinforcement learning in infinite horizons," LSE Research Online Documents on Economics 115598, London School of Economics and Political Science, LSE Library.
    3. Zhang, Yingying & Shi, Chengchun & Luo, Shikai, 2023. "Conformal off-policy prediction," LSE Research Online Documents on Economics 118250, London School of Economics and Political Science, LSE Library.
    4. Seonghun Cho & Minsup Shin & Young Hyun Cho & Johan Lim, 2025. "Change point detection in high dimensional covariance matrix using Pillai’s statistics," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 109(1), pages 53-84, March.
    5. Cui, Junfeng & Wang, Guanghui & Zou, Changliang & Wang, Zhaojun, 2023. "Change-point testing for parallel data sets with FDR control," Computational Statistics & Data Analysis, Elsevier, vol. 182(C).
    6. Liu, Bin & Zhang, Xinsheng & Liu, Yufeng, 2022. "High dimensional change point inference: Recent developments and extensions," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    7. Yanqin Fan & Yuan Qi & Gaoqian Xu, 2025. "Policy Learning with $\alpha$-Expected Welfare," Papers 2505.00256, arXiv.org.
    8. Qing Yang & Yu-Ning Li & Yi Zhang, 2020. "Change point detection for nonparametric regression under strongly mixing process," Statistical Papers, Springer, vol. 61(4), pages 1465-1506, August.
    9. Li, Degui, 2024. "Estimation of Large Dynamic Covariance Matrices: A Selective Review," Econometrics and Statistics, Elsevier, vol. 29(C), pages 16-30.
    10. Chen, Likai & Wang, Weining & Wu, Wei Biao, 2019. "Inference of Break-Points in High-Dimensional Time Series," IRTG 1792 Discussion Papers 2019-013, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    11. Haoyu Wei, 2025. "Characterization of Efficient Influence Function for Off-Policy Evaluation Under Optimal Policies," Papers 2505.13809, arXiv.org, revised Jun 2025.
    12. Breunig, Christoph, 2021. "Varying random coefficient models," Journal of Econometrics, Elsevier, vol. 221(2), pages 381-408.
    13. Cho, Haeran & Kirch, Claudia, 2024. "Data segmentation algorithms: Univariate mean change and beyond," Econometrics and Statistics, Elsevier, vol. 30(C), pages 76-95.
    14. Mengjia Yu & Xiaohui Chen, 2021. "Finite sample change point inference and identification for high‐dimensional mean vectors," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(2), pages 247-270, April.
    15. Hajra Siddiqa & Sajid Ali & Ismail Shah, 2021. "Most recent changepoint detection in censored panel data," Computational Statistics, Springer, vol. 36(1), pages 515-540, March.
    16. Barigozzi, Matteo & Cho, Haeran & Fryzlewicz, Piotr, 2018. "Simultaneous multiple change-point and factor analysis for high-dimensional time series," Journal of Econometrics, Elsevier, vol. 206(1), pages 187-225.
    17. Breunig, Christoph & Haan, Peter, 2021. "Nonparametric regression with selectively missing covariates," Journal of Econometrics, Elsevier, vol. 223(1), pages 28-52.
    18. Zhen Li & Jie Chen & Eric Laber & Fang Liu & Richard Baumgartner, 2023. "Optimal Treatment Regimes: A Review and Empirical Comparison," International Statistical Review, International Statistical Institute, vol. 91(3), pages 427-463, December.
    19. Samuele Centorrino & Aman Ullah & Jing Xue, 2019. "Semiparametric Estimation of Correlated Random Coefficient Models without Instrumental Variables," Papers 1911.06857, arXiv.org.
    20. Zhu, Jin & Wan, Runzhe & Qi, Zhengling & Luo, Shikai & Shi, Chengchun, 2024. "Robust offline reinforcement learning with heavy-tailed rewards," LSE Research Online Documents on Economics 122740, London School of Economics and Political Science, LSE Library.

    More about this item

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General
