Tech Report CS-96-10
A Generalized Reinforcement-Learning Model: Convergence and Applications
Michael L. Littman and Csaba Szepesvári
February 1996
Abstract:
Reinforcement learning is the process by which an autonomous agent uses its experience interacting with an environment to improve its behavior. The Markov decision process (MDP) model is a popular way of formalizing the reinforcement-learning problem, but it is by no means the only way. In this paper, we show how many of the important theoretical results concerning reinforcement learning in MDPs extend to a generalized MDP model that includes MDPs, two-player games, and MDPs under a worst-case optimality criterion as special cases. The basis of this extension is a stochastic-approximation theorem that reduces asynchronous convergence to synchronous convergence.
Keywords: Reinforcement learning, Q-learning convergence, Markov games
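One way to read the generalized model described in the abstract is that the "max over next actions" in the standard Q-learning update becomes a pluggable summary operator: max for ordinary MDPs, min over an opponent's choices for games or a worst-case criterion. The sketch below is an illustrative reading of that idea on a hypothetical toy chain MDP, not the paper's exact formulation; all names and parameters are assumptions.

```python
import random

# Toy deterministic chain (hypothetical, for illustration): states 0..3,
# action 1 moves right, action 0 moves left; reaching state GOAL pays 1.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3
GAMMA, ALPHA = 0.9, 0.5

def step(s, a):
    """Deterministic transition and reward for the toy chain."""
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0)

def generalized_q_learning(summary, episodes=500, epsilon=0.1, seed=0):
    """Tabular Q-learning with a pluggable next-state summary operator.

    summary(q_values) -> float: use the built-in max for the standard MDP
    criterion; min would model a worst-case (opponent-controlled) choice.
    """
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, steps = 0, 0
        while s != GOAL and steps < 100:
            # Epsilon-greedy exploration over the current Q estimates.
            a = (rng.randrange(N_ACTIONS) if rng.random() < epsilon
                 else max(range(N_ACTIONS), key=lambda x: Q[s][x]))
            s2, r = step(s, a)
            # Update toward r + gamma * summary(Q(s', .)); with summary=max
            # this is exactly the standard Q-learning target.
            Q[s][a] += ALPHA * (r + GAMMA * summary(Q[s2]) - Q[s][a])
            s, steps = s2, steps + 1
    return Q

# Standard MDP special case: optimistic summary via max.
Q = generalized_q_learning(max)
```

Swapping `max` for `min` in the call changes only the backed-up target, which is the sense in which one convergence argument can cover MDPs, games, and the worst-case criterion at once.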