Printed from https://ideas.repec.org/p/arx/papers/2305.17083.html

A Policy Gradient Method for Confounded POMDPs

Author

Listed:
  • Mao Hong
  • Zhengling Qi
  • Yanxun Xu

Abstract

In this paper, we propose a policy gradient method for confounded partially observable Markov decision processes (POMDPs) with continuous state and observation spaces in the offline setting. We first establish a novel identification result to non-parametrically estimate any history-dependent policy gradient in POMDPs using offline data. The identification enables us to solve a sequence of conditional moment restrictions and adopt a min-max learning procedure with general function approximation for estimating the policy gradient. We then provide a finite-sample non-asymptotic bound for estimating the gradient uniformly over a pre-specified policy class in terms of the sample size, the horizon length, the concentrability coefficient, and a measure of ill-posedness in solving the conditional moment restrictions. Lastly, by plugging the proposed gradient estimator into a gradient ascent algorithm, we show the global convergence of the proposed algorithm in finding the history-dependent optimal policy under some technical conditions. To the best of our knowledge, this is the first work studying the policy gradient method for POMDPs in the offline setting.
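
The abstract mentions solving conditional moment restrictions by min-max learning. The paper's estimator is not reproduced here. As a minimal sketch of the general idea only, the toy example below solves a single restriction E[Y - h(X) | Z] = 0 with linear function classes, where the inner maximization has a closed form. Everything in it is an assumption made for illustration: the function name `minimax_linear_cmr`, the quadratic penalty on the test function, the linear feature maps, and the simulated data with an unobserved confounder `u`.

```python
import numpy as np


def minimax_linear_cmr(Phi, Psi, y, ridge=1e-8):
    """Hypothetical helper: solve
        min_theta max_w  mean((y - Phi @ theta) * (Psi @ w)) - 0.5 * mean((Psi @ w) ** 2)
    with a tiny ridge term on w added for numerical stability.
    Phi holds features of X (hypothesis class); Psi holds features of Z (test class).
    """
    n = len(y)
    S = Psi.T @ Psi / n + ridge * np.eye(Psi.shape[1])  # Gram matrix of test features
    A = Psi.T @ Phi / n                                 # cross moments E[psi(Z) phi(X)^T]
    b = Psi.T @ y / n                                   # moments E[psi(Z) Y]
    # Inner max in closed form: w*(theta) = S^{-1} (b - A theta), which leaves the
    # outer problem min_theta 0.5 * (b - A theta)^T S^{-1} (b - A theta).
    S_inv_A = np.linalg.solve(S, A)
    S_inv_b = np.linalg.solve(S, b)
    # First-order condition of the outer problem: A^T S^{-1} A theta = A^T S^{-1} b.
    return np.linalg.solve(A.T @ S_inv_A, A.T @ S_inv_b)


# Simulated confounded data (assumed for illustration): u affects both x and y,
# while z shifts x but is independent of u.
rng = np.random.default_rng(0)
n = 20_000
u = rng.normal(size=n)
z = rng.normal(size=n)
x = z + u + 0.1 * rng.normal(size=n)
y = 2.0 * x + 2.0 * u + 0.1 * rng.normal(size=n)

Phi = np.column_stack([np.ones(n), x])  # h(x) = theta_0 + theta_1 * x
Psi = np.column_stack([np.ones(n), z])  # f(z) = w_0 + w_1 * z

theta_minimax = minimax_linear_cmr(Phi, Psi, y)
theta_ols = np.linalg.lstsq(Phi, y, rcond=None)[0]  # naive regression ignoring confounding

print("min-max slope:", theta_minimax[1])
print("naive OLS slope:", theta_ols[1])
```

In this simulation the true slope is 2. The naive regression slope is pulled toward 3 by the confounder, while the min-max solution should land close to 2. With linear classes the min-max problem reduces to a two-stage least squares type estimator; the general function approximation described in the abstract would replace the closed-form steps with richer function classes and numerical optimization.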

Suggested Citation

  • Mao Hong & Zhengling Qi & Yanxun Xu, 2023. "A Policy Gradient Method for Confounded POMDPs," Papers 2305.17083, arXiv.org, revised Nov 2023.
  • Handle: RePEc:arx:papers:2305.17083

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2305.17083
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ai, Chunrong & Chen, Xiaohong, 2007. "Estimation of possibly misspecified semiparametric conditional moment restriction models with different conditioning variables," Journal of Econometrics, Elsevier, vol. 141(1), pages 5-43, November.
    2. Xiaohong Chen & Andres Santos, 2018. "Overidentification in Regular Models," Econometrica, Econometric Society, vol. 86(5), pages 1771-1817, September.
    3. Richard Blundell & Joel Horowitz & Matthias Parey, 2022. "Estimation of a Heterogeneous Demand Function with Berkson Errors," The Review of Economics and Statistics, MIT Press, vol. 104(5), pages 877-889, December.
    4. Xiaohong Chen & Demian Pouzo, 2012. "Estimation of Nonparametric Conditional Moment Models With Possibly Nonsmooth Generalized Residuals," Econometrica, Econometric Society, vol. 80(1), pages 277-321, January.
    5. Arthur Lewbel, 2012. "Using Heteroscedasticity to Identify and Estimate Mismeasured and Endogenous Regressor Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 30(1), pages 67-80.
    6. Xiaohong Chen & Victor Chernozhukov & Sokbae Lee & Whitney K. Newey, 2014. "Local Identification of Nonparametric and Semiparametric Models," Econometrica, Econometric Society, vol. 82(2), pages 785-809, March.
    7. Shoya Ishimaru, 2024. "Empirical Decomposition of the IV-OLS Gap with Heterogeneous and Nonlinear Effects," The Review of Economics and Statistics, MIT Press, vol. 106(2), pages 505-520, March.
    8. Arellano, Manuel & Blundell, Richard & Bonhomme, Stéphane & Light, Jack, 2024. "Heterogeneity of consumption responses to income shocks in the presence of nonlinear persistence," Journal of Econometrics, Elsevier, vol. 240(2).
    9. Doraszelski, Ulrich & Jaumandreu, Jordi, 2006. "R&D and productivity: Estimating production functions when productivity is endogenous," MPRA Paper 1246, University Library of Munich, Germany.
    10. Shiu, Ji-Liang & Hu, Yingyao, 2013. "Identification and estimation of nonlinear dynamic panel data models with unobserved covariates," Journal of Econometrics, Elsevier, vol. 175(2), pages 116-131.
    11. Chiappori, Pierre-Andre & Komunjer, Ivana, 2008. "Correct Specification and Identification of Nonparametric Transformation Models," University of California at San Diego, Economics Working Paper Series qt4v12m2rg, Department of Economics, UC San Diego.
    12. Nir Billfeld & Moshe Kim, 2024. "Context-dependent Causality (the Non-Monotonic Case)," Papers 2404.05021, arXiv.org.
    13. Kim, Kyoo il & Petrin, Amil, 2022. "A Generalized Non-Parametric Instrumental Variable-Control Function Approach to Estimation in Nonlinear Settings," Journal of Econometric Methods, De Gruyter, vol. 11(1), pages 91-125, January.
    14. Chen, Xiaohong & Pouzo, Demian, 2009. "Efficient estimation of semiparametric conditional moment models with possibly nonsmooth residuals," Journal of Econometrics, Elsevier, vol. 152(1), pages 46-60, September.
    15. Dunker, Fabian & Hoderlein, Stefan & Kaido, Hiroaki, 2014. "Nonparametric Identification of Endogenous and Heterogeneous Aggregate Demand Models: Complements, Bundles and the Market Level," Economics Series 307, Institute for Advanced Studies.
    16. Chaohua Dong & Jiti Gao, 2014. "Specification Testing in Structural Nonparametric Cointegration," Monash Econometrics and Business Statistics Working Papers 2/14, Monash University, Department of Econometrics and Business Statistics.
    17. Xiaohong Chen & Timothy M. Christensen, 2015. "Optimal sup-norm rates, adaptivity and inference in nonparametric instrumental variables estimation," CeMMAP working papers 32/15, Institute for Fiscal Studies.
    18. Jing Nie & Juliana Malagon & Julian Williams, 2022. "The impact of high speed quoting on execution risk dynamics: Evidence from interest rate futures markets," Journal of Futures Markets, John Wiley & Sons, Ltd., vol. 42(8), pages 1434-1465, August.
    19. Breunig, Christoph & Mammen, Enno & Simoni, Anna, 2018. "Nonparametric estimation in case of endogenous selection," Journal of Econometrics, Elsevier, vol. 202(2), pages 268-285.
    20. Steven T. Berry & Philip A. Haile, 2021. "Foundations of Demand Estimation," Cowles Foundation Discussion Papers 2301, Cowles Foundation for Research in Economics, Yale University.
