Gradients can train reward models: An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model

Authors

Listed:
  • Enoch H. Kang
  • Hema Yoganarasimhan
  • Lalit Jain

Abstract

We study the problem of estimating Dynamic Discrete Choice (DDC) models, known in machine learning as offline Maximum Entropy-Regularized Inverse Reinforcement Learning (offline MaxEnt-IRL). The objective is to recover the reward or $Q^*$ functions that govern agent behavior from offline behavior data. In this paper, we propose a globally convergent gradient-based method for solving these problems without the restrictive assumption of linearly parameterized rewards. The novelty of our approach lies in introducing an Empirical Risk Minimization (ERM) based IRL/DDC framework, which circumvents the need for explicit state transition probability estimation in the Bellman equation. Furthermore, our method is compatible with non-parametric estimation techniques such as neural networks, so it has the potential to scale to high-dimensional, infinite state spaces. A key theoretical insight underlying our approach is that the Bellman residual satisfies the Polyak-Łojasiewicz (PL) condition, a property that, while weaker than strong convexity, is sufficient to ensure fast global convergence. Through a series of synthetic experiments, we demonstrate that our approach consistently outperforms benchmark methods and state-of-the-art alternatives.
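
For readers unfamiliar with the PL condition mentioned in the abstract, the following is a minimal sketch of the general idea only; it is not the paper's algorithm, loss, or experiments. A differentiable function $f$ with minimum value $f^*$ satisfies the PL condition with constant $\mu > 0$ if $\frac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,(f(x) - f^*)$ for all $x$. The toy objective $x^2 + 3\sin^2(x)$ used below is an assumption chosen for illustration (a common example of a non-convex function that still satisfies the inequality), and all function names are hypothetical.

```python
import math

F_STAR = 0.0  # minimum value of the toy objective, attained at x = 0


def f(x):
    # Toy objective (an illustrative assumption, not the paper's loss):
    # x^2 + 3 sin^2(x) is non-convex, since f''(x) = 2 + 6 cos(2x) can be negative.
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    # d/dx [x^2 + 3 sin^2(x)] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def pl_ratio(x):
    # Ratio (1/2)|f'(x)|^2 / (f(x) - f*); the PL condition asks that this
    # ratio stay above some constant mu > 0 for every non-optimal x.
    return 0.5 * grad_f(x) ** 2 / (f(x) - F_STAR)


def gradient_descent(x0, step=1.0 / 8.0, iters=200):
    # Plain gradient descent. The step 1/8 uses the smoothness bound
    # |f''(x)| <= 8, which guarantees the objective never increases.
    x = x0
    history = [f(x)]
    for _ in range(iters):
        x -= step * grad_f(x)
        history.append(f(x))
    return x, history


x_final, history = gradient_descent(x0=3.0)
print("final x:", x_final, "final objective:", history[-1])
```

Despite the lack of convexity, the iterates started at `x0=3.0` decrease the objective monotonically and reach the global minimizer at zero, which is the kind of behavior the PL condition is meant to guarantee.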

Suggested Citation

  • Enoch H. Kang & Hema Yoganarasimhan & Lalit Jain, 2025. "Gradients can train reward models: An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model," Papers 2502.14131, arXiv.org, revised Mar 2025.
  • Handle: RePEc:arx:papers:2502.14131

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2502.14131
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Rust, John, 1987. "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher," Econometrica, Econometric Society, vol. 55(5), pages 999-1033, September.
    2. Andriy Norets, 2012. "Estimation of Dynamic Discrete Choice Models Using Artificial Neural Network Approximations," Econometric Reviews, Taylor & Francis Journals, vol. 31(1), pages 84-106.
    3. Victor Aguirregabiria & Pedro Mira, 2002. "Swapping the Nested Fixed Point Algorithm: A Class of Estimators for Discrete Markov Decision Models," Econometrica, Econometric Society, vol. 70(4), pages 1519-1543, July.
    4. Thierry Magnac & David Thesmar, 2002. "Identifying Dynamic Discrete Decision Processes," Econometrica, Econometric Society, vol. 70(2), pages 801-816, March.
    5. Che‐Lin Su & Kenneth L. Judd, 2012. "Constrained Optimization Approaches to Estimation of Structural Models," Econometrica, Econometric Society, vol. 80(5), pages 2213-2230, September.
    6. V. Joseph Hotz & Robert A. Miller, 1993. "Conditional Choice Probabilities and the Estimation of Dynamic Models," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 60(3), pages 497-529.
    7. Victor Chernozhukov & Juan Carlos Escanciano & Hidehiko Ichimura & Whitney K. Newey & James M. Robins, 2022. "Locally Robust Semiparametric Estimation," Econometrica, Econometric Society, vol. 90(4), pages 1501-1535, July.
    8. Hugo Benitez-Silva & John Rust & Gunter Hitsch & Giorgio Pauletto & George Hall, 2000. "A Comparison Of Discrete And Parametric Methods For Continuous-State Dynamic Programming Problems," Computing in Economics and Finance 2000 24, Society for Computational Economics.
    9. Victor Aguirregabiria & Pedro Mira, 2007. "Sequential Estimation of Dynamic Discrete Games," Econometrica, Econometric Society, vol. 75(1), pages 1-53, January.
    10. Peter Arcidiacono & Robert A. Miller, 2011. "Conditional Choice Probability Estimation of Dynamic Discrete Choice Models With Unobserved Heterogeneity," Econometrica, Econometric Society, vol. 79(6), pages 1823-1867, November.
    11. Hiroyuki Kasahara & Katsumi Shimotsu, 2009. "Nonparametric Identification of Finite Mixture Models of Dynamic Discrete Choices," Econometrica, Econometric Society, vol. 77(1), pages 135-175, January.
    12. Khai Xiang Chiong & Alfred Galichon & Matt Shum, 2016. "Duality in dynamic discrete‐choice models," Quantitative Economics, Econometric Society, vol. 7(1), pages 83-115, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kalouptsidi, Myrto & Scott, Paul T. & Souza-Rodrigues, Eduardo, 2021. "Linear IV regression estimators for structural dynamic discrete choice models," Journal of Econometrics, Elsevier, vol. 222(1), pages 778-804.
    2. Hu Yingyao & Shum Matthew & Tan Wei & Xiao Ruli, 2017. "A Simple Estimator for Dynamic Models with Serially Correlated Unobservables," Journal of Econometric Methods, De Gruyter, vol. 6(1), pages 1-16, January.
    3. Hanming Fang & Yang Wang, 2015. "Estimating Dynamic Discrete Choice Models With Hyperbolic Discounting, With An Application To Mammography Decisions," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 56(2), pages 565-596, May.
    4. Arcidiacono, Peter & Miller, Robert A., 2020. "Identifying dynamic discrete choice models off short panels," Journal of Econometrics, Elsevier, vol. 215(2), pages 473-485.
    5. Aguirregabiria, Victor & Mira, Pedro, 2010. "Dynamic discrete choice structural models: A survey," Journal of Econometrics, Elsevier, vol. 156(1), pages 38-67, May.
    6. Sasaki, Yuya & Takahashi, Yuya & Xin, Yi & Hu, Yingyao, 2023. "Dynamic discrete choice models with incomplete data: Sharp identification," Journal of Econometrics, Elsevier, vol. 236(1).
    7. Myrto Kalouptsidi & Paul T. Scott & Eduardo Souza-Rodrigues, 2018. "Linear IV Regression Estimators for Structural Dynamic Discrete Choice Models," NBER Working Papers 25134, National Bureau of Economic Research, Inc.
    9. Khai Chiong & Alfred Galichon & Matt Shum, 2015. "Duality in Dynamic Discrete Choice Models," SciencePo Working papers Main hal-03568184, HAL.
    10. Hu, Yingyao & Xin, Yi, 2024. "Identification and estimation of dynamic structural models with unobserved choices," Journal of Econometrics, Elsevier, vol. 242(2).
    11. Khai Xiang Chiong & Alfred Galichon & Matt Shum, 2021. "Duality in dynamic discrete-choice models," Papers 2102.06076, arXiv.org, revised Feb 2021.
    12. Khai Chiong & Alfred Galichon & Matt Shum, 2015. "Duality in Dynamic Discrete Choice Models," SciencePo Working papers hal-03568184, HAL.
    13. Sebastian Galiani & Juan Pantano, 2021. "Structural Models: Inception and Frontier," NBER Working Papers 28698, National Bureau of Economic Research, Inc.
    14. Kalouptsidi, Myrto & Scott, Paul T. & Souza-Rodrigues, Eduardo, 2018. "Linear IV Regression Estimators for Structural Dynamic Discrete Choice Models," CEPR Discussion Papers 13240, C.E.P.R. Discussion Papers.
    15. Khai Chiong & Alfred Galichon & Matt Shum, 2015. "Duality in Dynamic Discrete Choice Models," Post-Print hal-03568184, HAL.
    17. Karun Adusumilli & Dita Eckardt, 2019. "Temporal-Difference estimation of dynamic discrete choice models," Papers 1912.09509, arXiv.org, revised Dec 2022.
    18. Hiroyuki Kasahara & Katsumi Shimotsu, 2012. "Sequential Estimation of Structural Models With a Fixed Point Constraint," Econometrica, Econometric Society, vol. 80(5), pages 2303-2319, September.
    19. Myrto Kalouptsidi & Paul T. Scott & Eduardo Souza‐Rodrigues, 2021. "Identification of counterfactuals in dynamic discrete choice models," Quantitative Economics, Econometric Society, vol. 12(2), pages 351-403, May.
    20. Komarova, Tatiana & Sanches, Fábio Adriano & Silva Junior, Daniel & Srisuma, Sorawoot, 2018. "Joint analysis of the discount factor and payoff parameters in dynamic discrete choice games," LSE Research Online Documents on Economics 86858, London School of Economics and Political Science, LSE Library.
    21. Manuel Arellano & Stéphane Bonhomme, 2017. "Nonlinear Panel Data Methods for Dynamic Heterogeneous Agent Models," Annual Review of Economics, Annual Reviews, vol. 9(1), pages 471-496, September.
    22. Victor Aguirregabiria & Aviv Nevo, 2010. "Recent Developments in Empirical IO: Dynamic Demand and Dynamic Games," Working Papers tecipa-419, University of Toronto, Department of Economics.
