Printed from https://ideas.repec.org/p/ehl/lserod/127507.html

Testing stationarity and change point detection in reinforcement learning

Authors

  • Li, Mengbing
  • Shi, Chengchun
  • Wu, Zhenke
  • Fryzlewicz, Piotr

Abstract

We consider reinforcement learning (RL) in possibly nonstationary environments. Many existing RL algorithms in the literature rely on the stationarity assumption that requires the state transition and reward functions to be constant over time. However, this assumption is restrictive in practice and is likely to be violated in a number of applications, including traffic signal control, robotics and mobile health. In this paper, we develop a model-free test to assess the stationarity of the optimal Q-function based on pre-collected historical data, without additional online data collection. Based on the proposed test, we further develop a change point detection method that can be naturally coupled with existing state-of-the-art RL methods designed in stationary environments for online policy optimization in nonstationary environments. The usefulness of our method is illustrated by theoretical results, simulation studies, and a real data example from the 2018 Intern Health Study. A Python implementation of the proposed procedure is publicly available at https://github.com/limengbinggz/CUSUM-RL.
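
As a rough illustration of the change point idea only, and not of the authors' procedure (which tests the stationarity of the optimal Q-function), a generic CUSUM-type scan for a single mean shift in a univariate sequence might look like the sketch below. The function name `cusum_change_point` and the toy data are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def cusum_change_point(x):
    """Generic CUSUM-type scan for one mean shift (illustrative only).

    Returns the largest weighted mean-difference statistic and the
    split point k at which it occurs (first k observations vs the rest).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two observations")
    k = np.arange(1, n)                      # candidate split points 1..n-1
    csum = np.cumsum(x)
    left_mean = csum[:-1] / k                # mean of x[:k]
    right_mean = (csum[-1] - csum[:-1]) / (n - k)  # mean of x[k:]
    # Weighted absolute difference in means; large values suggest a shift at k.
    stat = np.sqrt(k * (n - k) / n) * np.abs(left_mean - right_mean)
    best = int(np.argmax(stat))
    return float(stat[best]), int(k[best])


# Toy data (assumed): the mean shifts from 0 to 3 after the 60th observation.
series = np.concatenate([np.zeros(60), np.full(40, 3.0)])
stat, loc = cusum_change_point(series)
```

In practice the statistic would be compared against a critical value (for example, one obtained by a bootstrap or permutation scheme) before declaring a change; that calibration step is omitted here.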

Suggested Citation

  • Li, Mengbing & Shi, Chengchun & Wu, Zhenke & Fryzlewicz, Piotr, 2025. "Testing stationarity and change point detection in reinforcement learning," LSE Research Online Documents on Economics 127507, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:127507

    Download full text from publisher

    File URL: https://researchonline.lse.ac.uk/id/eprint/127507/
    File Function: Open access version.
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shi, Chengchun & Luo, Shikai & Le, Yuan & Zhu, Hongtu & Song, Rui, 2022. "Statistically efficient advantage learning for offline reinforcement learning in infinite horizons," LSE Research Online Documents on Economics 115598, London School of Economics and Political Science, LSE Library.
    2. Gao, Yuhe & Shi, Chengchun & Song, Rui, 2023. "Deep spectral Q-learning with application to mobile health," LSE Research Online Documents on Economics 119445, London School of Economics and Political Science, LSE Library.
    3. Seonghun Cho & Minsup Shin & Young Hyun Cho & Johan Lim, 2025. "Change point detection in high dimensional covariance matrix using Pillai’s statistics," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 109(1), pages 53-84, March.
    4. Cui, Junfeng & Wang, Guanghui & Zou, Changliang & Wang, Zhaojun, 2023. "Change-point testing for parallel data sets with FDR control," Computational Statistics & Data Analysis, Elsevier, vol. 182(C).
    5. Liu, Bin & Zhang, Xinsheng & Liu, Yufeng, 2022. "High dimensional change point inference: Recent developments and extensions," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    6. Zhang, Yingying & Shi, Chengchun & Luo, Shikai, 2023. "Conformal off-policy prediction," LSE Research Online Documents on Economics 118250, London School of Economics and Political Science, LSE Library.
    7. Yanqin Fan & Yuan Qi & Gaoqian Xu, 2025. "Policy Learning with $\alpha$-Expected Welfare," Papers 2505.00256, arXiv.org.
    8. Oleksandr Gromenko & Piotr Kokoszka & Matthew Reimherr, 2017. "Detection of change in the spatiotemporal mean function," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(1), pages 29-50, January.
    9. Qing Yang & Yu-Ning Li & Yi Zhang, 2020. "Change point detection for nonparametric regression under strongly mixing process," Statistical Papers, Springer, vol. 61(4), pages 1465-1506, August.
    10. Li, Degui, 2024. "Estimation of Large Dynamic Covariance Matrices: A Selective Review," Econometrics and Statistics, Elsevier, vol. 29(C), pages 16-30.
    11. Christoph Breunig & Peter Haan, 2018. "Nonparametric Regression with Selectively Missing Covariates," Papers 1810.00411, arXiv.org, revised Oct 2020.
    12. Luo, Lan & Shi, Chengchun & Wang, Jitao & Wu, Zhenke & Li, Lexin, 2025. "Multivariate dynamic mediation analysis under a reinforcement learning framework," LSE Research Online Documents on Economics 127112, London School of Economics and Political Science, LSE Library.
    13. Fryzlewicz, Piotr, 2020. "Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection," LSE Research Online Documents on Economics 103430, London School of Economics and Political Science, LSE Library.
    14. Haoyu Wei, 2025. "Semiparametric Off-Policy Inference for Optimal Policy Values under Possible Non-Uniqueness," Papers 2505.13809, arXiv.org, revised Jan 2026.
    15. V. Brault & C. Lévy-Leduc & A. Mathieu & A. Jullien, 2018. "Change-Point Estimation in the Multivariate Model Taking into Account the Dependence: Application to the Vegetative Development of Oilseed Rape," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 23(3), pages 374-389, September.
    16. Li, Jia & Liao, Zhipeng, 2020. "Uniform nonparametric inference for time series," Journal of Econometrics, Elsevier, vol. 219(1), pages 38-51.
    17. Hoshino, Tadao & Yanagi, Takahide, 2023. "Treatment effect models with strategic interaction in treatment decisions," Journal of Econometrics, Elsevier, vol. 236(2).
    18. Shi, Chengchun & Luo, Shikai & Zhu, Hongtu & Song, Rui, 2021. "An online sequential test for qualitative treatment effects," LSE Research Online Documents on Economics 112521, London School of Economics and Political Science, LSE Library.
    19. Breunig, Christoph, 2021. "Varying random coefficient models," Journal of Econometrics, Elsevier, vol. 221(2), pages 381-408.
    20. Chen, Likai & Wang, Weining & Wu, Wei Biao, 2019. "Inference of Break-Points in High-Dimensional Time Series," IRTG 1792 Discussion Papers 2019-013, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".

    More about this item

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General
