Bayesian Learning of Noisy Markov Decision Processes
This work addresses the problem of estimating the optimal value function in a Markov Decision Process from observed state-action pairs. We adopt a Bayesian approach to inference, which allows both the model to be estimated and predictions about actions to be made in a unified framework, providing a principled approach to mimicry of a controller on the basis of observed data. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution over the optimal value function. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller.
Date of creation: 2010
Contact details of provider:
Postal: 15 Boulevard Gabriel Peri, 92245 Malakoff Cedex
Phone: 01 41 17 60 81
Web page: http://www.crest.fr
References:
- Susumu Imai & Neelam Jain, 2005. "Bayesian Estimation of Dynamic Discrete Choice Models," 2005 Meeting Papers 432, Society for Economic Dynamics.
- Susumu Imai & Neelam Jain & Andrew Ching, 2009. "Bayesian Estimation of Dynamic Discrete Choice Models," Econometrica, Econometric Society, vol. 77(6), pages 1865-1899, November.
- Susumu Imai & Neelam Jain & Andrew Ching, 2006. "Bayesian Estimation of Dynamic Discrete Choice Models," Working Papers 1118, Queen's University, Department of Economics.
- Imai, Kosuke & van Dyk, David A., 2005. "A Bayesian analysis of the multinomial probit model using marginal data augmentation," Journal of Econometrics, Elsevier, vol. 124(2), pages 311-334, February.
- Rust, John, 1987. "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher," Econometrica, Econometric Society, vol. 55(5), pages 999-1033, September.
- McCulloch, Robert E. & Polson, Nicholas G. & Rossi, Peter E., 2000. "A Bayesian analysis of the multinomial probit model with fully identified parameters," Journal of Econometrics, Elsevier, vol. 99(1), pages 173-193, November.
- V. Joseph Hotz & Robert A. Miller, 1993. "Conditional Choice Probabilities and the Estimation of Dynamic Models," Review of Economic Studies, Oxford University Press, vol. 60(3), pages 497-529.
- McCulloch, Robert & Rossi, Peter E., 1994. "An exact likelihood analysis of the multinomial probit model," Journal of Econometrics, Elsevier, vol. 64(1-2), pages 207-240.
- John F. Geweke & Michael P. Keane & David E. Runkle, 1994. "Alternative computational approaches to inference in the multinomial probit model," 170, Federal Reserve Bank of Minneapolis.
- Geweke, John & Keane, Michael P & Runkle, David, 1994. "Alternative Computational Approaches to Inference in the Multinomial Probit Model," The Review of Economics and Statistics, MIT Press, vol. 76(4), pages 609-32, November.
- Gotz, Glenn A. & McCall, John J., 1980. "Estimation in sequential decisionmaking models: A methodological note," Economics Letters, Elsevier, vol. 6(2), pages 131-136.
- Wolpin, Kenneth I, 1984. "An Estimable Dynamic Stochastic Model of Fertility and Child Mortality," Journal of Political Economy, University of Chicago Press, vol. 92(5), pages 852-74, October.
Handle: RePEc:crs:wpaper:2010-36