
A Policy Gradient Method for Confounded POMDPs

Authors

  • Mao Hong
  • Zhengling Qi
  • Yanxun Xu

Abstract

In this paper, we propose a policy gradient method for confounded partially observable Markov decision processes (POMDPs) with continuous state and observation spaces in the offline setting. We first establish a novel identification result to non-parametrically estimate any history-dependent policy gradient under POMDPs using the offline data. The identification enables us to solve a sequence of conditional moment restrictions and adopt the min-max learning procedure with general function approximation for estimating the policy gradient. We then provide a finite-sample non-asymptotic bound for estimating the gradient uniformly over a pre-specified policy class in terms of the sample size, the length of the horizon, the concentrability coefficient and the measure of ill-posedness in solving the conditional moment restrictions. Lastly, by deploying the proposed gradient estimation in the gradient ascent algorithm, we show the global convergence of the proposed algorithm in finding the history-dependent optimal policy under some technical conditions. To the best of our knowledge, this is the first work studying the policy gradient method for POMDPs under the offline setting.
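
To make the phrase "min-max learning procedure" for a conditional moment restriction more concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the paper's estimator: it treats a single restriction E[Y - h(X) | Z] = 0 rather than a sequence of them, uses linear function classes instead of general function approximation, and runs on a simulated confounded dataset invented for illustration. The function name `minmax_linear_cmr`, the feature matrices `phi_x` and `psi_z`, and the data-generating process are all hypothetical. With linear classes and a quadratic penalty on the test function, the inner maximization has a closed form and the outer minimization reduces to a two-stage-least-squares-type solve.

```python
import numpy as np


def minmax_linear_cmr(phi_x, psi_z, y):
    """Solve, over linear classes h(x) = phi(x) @ beta and f(z) = psi(z) @ alpha,

        min_beta max_alpha  E_n[(y - h(x)) * f(z)] - 0.5 * E_n[f(z) ** 2].

    The inner maximum is attained at alpha = G^{-1} (m - C @ beta), with
    G = E_n[psi psi^T], C = E_n[psi phi^T], m = E_n[psi y], so the outer
    problem minimizes 0.5 * (m - C beta)^T G^{-1} (m - C beta).
    """
    n = len(y)
    gram_z = psi_z.T @ psi_z / n      # G
    cross = psi_z.T @ phi_x / n       # C
    moment_y = psi_z.T @ y / n        # m
    weight = np.linalg.solve(gram_z, cross)   # G^{-1} C
    lhs = cross.T @ weight                    # C^T G^{-1} C
    rhs = weight.T @ moment_y                 # C^T G^{-1} m
    return np.linalg.solve(lhs, rhs)


# Hypothetical confounded data: u is unobserved and affects both x and y.
rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)                        # observed proxy / instrument
u = rng.normal(size=n)                        # unobserved confounder
x = z + u + 0.5 * rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)    # true coefficient on x is 2.0

beta_minmax = float(minmax_linear_cmr(x[:, None], z[:, None], y)[0])
beta_ols = float(x @ y / (x @ x))             # naive regression, ignores u

print(f"min-max estimate: {beta_minmax:.3f}")
print(f"naive OLS estimate: {beta_ols:.3f}")
```

In this toy setting the naive regression absorbs the effect of the unobserved confounder, while the min-max solution only uses variation in `x` that is explained by `z`. The abstract's "measure of ill-posedness" corresponds, loosely, to how hard it is to invert the mapping from `h` to its conditional expectation given `z`; in the linear sketch that shows up as the conditioning of `cross`.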

Suggested Citation

  • Mao Hong & Zhengling Qi & Yanxun Xu, 2023. "A Policy Gradient Method for Confounded POMDPs," Papers 2305.17083, arXiv.org, revised Nov 2023.
  • Handle: RePEc:arx:papers:2305.17083

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2305.17083
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ai, Chunrong & Chen, Xiaohong, 2007. "Estimation of possibly misspecified semiparametric conditional moment restriction models with different conditioning variables," Journal of Econometrics, Elsevier, vol. 141(1), pages 5-43, November.
    2. Richard Blundell & Joel Horowitz & Matthias Parey, 2022. "Estimation of a Heterogeneous Demand Function with Berkson Errors," The Review of Economics and Statistics, MIT Press, vol. 104(5), pages 877-889, December.
    3. Arellano, Manuel & Blundell, Richard & Bonhomme, Stéphane & Light, Jack, 2024. "Heterogeneity of consumption responses to income shocks in the presence of nonlinear persistence," Journal of Econometrics, Elsevier, vol. 240(2).
    4. Shiu, Ji-Liang & Hu, Yingyao, 2013. "Identification and estimation of nonlinear dynamic panel data models with unobserved covariates," Journal of Econometrics, Elsevier, vol. 175(2), pages 116-131.
    5. Nir Billfeld & Moshe Kim, 2024. "Context-dependent Causality (the Non-Monotonic Case)," Papers 2404.05021, arXiv.org.
    6. Chen, Xiaohong & Pouzo, Demian, 2009. "Efficient estimation of semiparametric conditional moment models with possibly nonsmooth residuals," Journal of Econometrics, Elsevier, vol. 152(1), pages 46-60, September.
    7. Dunker, Fabian & Hoderlein, Stefan & Kaido, Hiroaki, 2014. "Nonparametric Identification of Endogenous and Heterogeneous Aggregate Demand Models: Complements, Bundles and the Market Level," Economics Series 307, Institute for Advanced Studies.
    8. Chaohua Dong & Jiti Gao, 2014. "Specification Testing in Structural Nonparametric Cointegration," Monash Econometrics and Business Statistics Working Papers 2/14, Monash University, Department of Econometrics and Business Statistics.
    9. Breunig, Christoph & Mammen, Enno & Simoni, Anna, 2018. "Nonparametric estimation in case of endogenous selection," Journal of Econometrics, Elsevier, vol. 202(2), pages 268-285.
    10. Jarociński, Marek & Marcet, Albert, 2019. "Priors about observables in vector autoregressions," Journal of Econometrics, Elsevier, vol. 209(2), pages 238-255.
    11. Roy Allen & John Rehbeck, 2020. "Identification of Random Coefficient Latent Utility Models," Papers 2003.00276, arXiv.org.
    12. Gayle, Wayne-Roy & Namoro, Soiliou Daw, 2013. "Estimation of a nonlinear panel data model with semiparametric individual effects," Journal of Econometrics, Elsevier, vol. 175(1), pages 46-59.
    13. Babii, Andrii, 2020. "Honest Confidence Sets in Nonparametric IV Regression and Other Ill-Posed Models," Econometric Theory, Cambridge University Press, vol. 36(4), pages 658-706, August.
    14. Rodrigo Adão & Costas Arkolakis & Sharat Ganapati, 2020. "Aggregate Implications of Firm Heterogeneity: A Nonparametric Analysis of Monopolistic Competition Trade Models," Working Papers 2020-161, Becker Friedman Institute for Research In Economics.
    15. Samuele Centorrino & Jean-Pierre Florens & Jean-Michel Loubes, 2022. "Fairness constraint in Structural Econometrics and Application to fair estimation using Instrumental Variables," Papers 2202.08977, arXiv.org.
    16. Halbert White & Karim Chalak, 2013. "Identification and Identification Failure for Treatment Effects Using Structural Systems," Econometric Reviews, Taylor & Francis Journals, vol. 32(3), pages 273-317, November.
    17. Ben Deaner, 2018. "Proxy Controls and Panel Data," Papers 1810.00283, arXiv.org, revised Nov 2023.
    18. Krief, Jerome M., 2017. "Direct instrumental nonparametric estimation of inverse regression functions," Journal of Econometrics, Elsevier, vol. 201(1), pages 95-107.
    19. Maican, Florin G., 2012. "From Boom to Bust and Back Again: A dynamic analysis of IT services," Working Papers in Economics 543, University of Gothenburg, Department of Economics.
    20. Peter C.B. Phillips & Liangjun Su, 2009. "Nonparametric Structural Estimation via Continuous Location Shifts in an Endogenous Regressor," Cowles Foundation Discussion Papers 1702, Cowles Foundation for Research in Economics, Yale University.
