Printed from https://ideas.repec.org/p/arx/papers/2505.13809.html

Semiparametric Off-Policy Inference for Optimal Policy Values under Possible Non-Uniqueness

Author

  • Haoyu Wei

Abstract

Off-policy evaluation (OPE) constructs confidence intervals for the value of a target policy using data generated under a different behavior policy. Most existing inference methods focus on fixed target policies and may fail when the target policy is estimated as optimal, particularly when the optimal policy is non-unique or nearly deterministic. We study inference for the value of optimal policies in Markov decision processes. We characterize the existence of the efficient influence function and show that non-regularity arises under policy non-uniqueness. Motivated by this analysis, we propose a novel Nonparametric SequentiAl Value Evaluation (NSAVE) method, which achieves semiparametric efficiency and retains the double robustness property when the optimal policy is unique, and remains stable in degenerate regimes beyond the scope of existing asymptotic theory. We further develop a smoothing-based approach for valid inference under non-unique optimal policies, and a post-selection procedure with uniform coverage for data-selected optimal policies. Simulation studies support the theoretical results. An application to the OhioT1DM mobile health dataset provides patient-specific confidence intervals for optimal policy values and their improvement over observed treatment policies.
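As a rough illustration of why non-unique optimal actions are awkward for inference, and of what smoothing can buy, the sketch below replaces a hard maximum over action values with a log-sum-exp surrogate. This is a generic textbook device, not the paper's NSAVE estimator or its smoothing procedure; the function names, the example values, and the temperature parameter `beta` are all assumptions made for illustration.

```python
import math

def smooth_max(values, beta):
    """Log-sum-exp surrogate for max(values).

    It satisfies max(values) <= smooth_max <= max(values) + log(K) / beta,
    where K = len(values), so it approaches the hard max as beta grows.
    """
    m = max(values)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta

def softmax_weights(values, beta):
    """Gradient of smooth_max with respect to each entry of values."""
    m = max(values)
    w = [math.exp(beta * (v - m)) for v in values]
    total = sum(w)
    return [x / total for x in w]

# Hypothetical action values with a tie: the hard max has no unique
# maximizer here, but the smoothed version has a well-defined gradient.
tied = [1.0, 1.0]
print(smooth_max(tied, beta=10.0))       # 1 + log(2)/10
print(softmax_weights(tied, beta=10.0))  # equal weight on both actions
```

With tied values the hard maximum is not differentiable, while the surrogate splits its gradient evenly across the tied actions. The price is a bias of at most log(K)/beta, which is why the choice of smoothing level matters in any approach of this kind.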

Suggested Citation

  • Haoyu Wei, 2025. "Semiparametric Off-Policy Inference for Optimal Policy Values under Possible Non-Uniqueness," Papers 2505.13809, arXiv.org, revised Jan 2026.
  • Handle: RePEc:arx:papers:2505.13809

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2505.13809
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Chengchun Shi & Jin Zhu & Shen Ye & Shikai Luo & Hongtu Zhu & Rui Song, 2024. "Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 119(545), pages 273-284, January.
    2. Bian, Zeyu & Shi, Chengchun & Qi, Zhengling & Wang, Lan, 2025. "Off-policy evaluation in doubly inhomogeneous environments," LSE Research Online Documents on Economics 124630, London School of Economics and Political Science, LSE Library.
    3. Chengchun Shi & Zhengling Qi & Jianing Wang & Fan Zhou, 2024. "Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 119(547), pages 2011-2025, July.
    4. Luo, Shikai & Yang, Ying & Shi, Chengchun & Yao, Fang & Ye, Jieping & Zhu, Hongtu, 2024. "Policy evaluation for temporal and/or spatial dependent experiments," LSE Research Online Documents on Economics 122741, London School of Economics and Political Science, LSE Library.
    5. Shi, Chengchun & Zhang, Shengxing & Lu, Wenbin & Song, Rui, 2022. "Statistical inference of the value function for reinforcement learning in infinite-horizon settings," LSE Research Online Documents on Economics 110882, London School of Economics and Political Science, LSE Library.
    6. Susan Athey & Stefan Wager, 2021. "Policy Learning With Observational Data," Econometrica, Econometric Society, vol. 89(1), pages 133-161, January.
    7. Chengchun Shi & Sheng Zhang & Wenbin Lu & Rui Song, 2022. "Statistical inference of the value function for reinforcement learning in infinite‐horizon settings," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(3), pages 765-793, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, Mengbing & Shi, Chengchun & Wu, Zhenke & Fryzlewicz, Piotr, 2025. "Testing stationarity and change point detection in reinforcement learning," LSE Research Online Documents on Economics 127507, London School of Economics and Political Science, LSE Library.
    2. Luo, Lan & Shi, Chengchun & Wang, Jitao & Wu, Zhenke & Li, Lexin, 2025. "Multivariate dynamic mediation analysis under a reinforcement learning framework," LSE Research Online Documents on Economics 127112, London School of Economics and Political Science, LSE Library.
    3. Zhang, Yingying & Shi, Chengchun & Luo, Shikai, 2023. "Conformal off-policy prediction," LSE Research Online Documents on Economics 118250, London School of Economics and Political Science, LSE Library.
    4. Zhu, Jin & Wan, Runzhe & Qi, Zhengling & Luo, Shikai & Shi, Chengchun, 2024. "Robust offline reinforcement learning with heavy-tailed rewards," LSE Research Online Documents on Economics 122740, London School of Economics and Political Science, LSE Library.
    5. Gao, Yuhe & Shi, Chengchun & Song, Rui, 2023. "Deep spectral Q-learning with application to mobile health," LSE Research Online Documents on Economics 119445, London School of Economics and Political Science, LSE Library.
    6. Asanov, Anastasiya-Mariya & Asanov, Igor & Buenstorf, Guido, 2024. "A low-cost digital first aid tool to reduce psychological distress in refugees: A multi-country randomized controlled trial of self-help online in the first months after the invasion of Ukraine," Social Science & Medicine, Elsevier, vol. 362(C).
    7. Justin Whitehouse & Morgane Austern & Vasilis Syrgkanis, 2025. "Inference on Optimal Policy Values and Other Irregular Functionals via Smoothing," Papers 2507.11780, arXiv.org.
    8. Yi Zhang & Kosuke Imai, 2023. "Individualized Policy Evaluation and Learning under Clustered Network Interference," Papers 2311.02467, arXiv.org, revised Mar 2025.
    9. Giovanni Cerulli & Francesco Caracciolo, 2025. "Risk-Adjusted Policy Learning and the Social Cost of Uncertainty: Theory and Evidence from CAP evaluation," Papers 2510.05007, arXiv.org.
    10. Manski, Charles F., 2023. "Probabilistic prediction for binary treatment choice: With focus on personalized medicine," Journal of Econometrics, Elsevier, vol. 234(2), pages 647-663.
    11. Yan Liu, 2022. "Policy Learning under Endogeneity Using Instrumental Variables," Papers 2206.09883, arXiv.org, revised Jan 2026.
    12. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    13. Bokelmann, Björn & Lessmann, Stefan, 2024. "Improving uplift model evaluation on randomized controlled trial data," European Journal of Operational Research, Elsevier, vol. 313(2), pages 691-707.
    14. Garbero, Alessandra & Sakos, Grayson & Cerulli, Giovanni, 2023. "Towards data-driven project design: Providing optimal treatment rules for development projects," Socio-Economic Planning Sciences, Elsevier, vol. 89(C).
    15. Ta-Wei Huang & Eva Ascarza, 2024. "Doing More with Less: Overcoming Ineffective Long-Term Targeting Using Short-Term Signals," Marketing Science, INFORMS, vol. 43(4), pages 863-884, July.
    16. Undral Byambadalai, 2022. "Identification and Inference for Welfare Gains without Unconfoundedness," Papers 2207.04314, arXiv.org.
    17. Black, Dan A. & Grogger, Jeffrey & Kirchmaier, Tom & Sanders, Koen, 2023. "Criminal charges, risk assessment and violent recidivism in cases of domestic abuse," LSE Research Online Documents on Economics 121374, London School of Economics and Political Science, LSE Library.
    18. Michael Lechner, 2023. "Causal Machine Learning and its use for public policy," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 159(1), pages 1-15, December.
    19. Yuchen Hu & Henry Zhu & Emma Brunskill & Stefan Wager, 2024. "Minimax-Regret Sample Selection in Randomized Experiments," Papers 2403.01386, arXiv.org, revised Jun 2024.
    20. Sarah Moon, 2025. "Optimal Policy Choices Under Uncertainty," Papers 2503.03910, arXiv.org, revised Aug 2025.
