Tech Report CS-96-10

A Generalized Reinforcement-Learning Model: Convergence and Applications

Michael L. Littman and Csaba Szepesvári

February 1996

Abstract:

Reinforcement learning is the process by which an autonomous agent uses its experience interacting with an environment to improve its behavior. The Markov decision process (MDP) model is a popular way of formalizing the reinforcement-learning problem, but it is by no means the only way. In this paper, we show how many of the important theoretical results concerning reinforcement learning in MDPs extend to a generalized MDP model that includes MDPs, two-player games, and MDPs under a worst-case optimality criterion as special cases. The basis of this extension is a stochastic-approximation theorem that reduces asynchronous convergence to synchronous convergence.
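To make the "special cases" claim concrete, the following is a minimal sketch (not the paper's code) of how a single Q-learning-style update rule can be parameterized by a summary operator: max recovers ordinary Q-learning for MDPs, min over an opponent's moves corresponds to an alternating two-player game or a worst-case criterion. The environment, function names, and numbers below (generalized_q_update, toy_step, the two-state problem) are hypothetical and purely illustrative.

    # Sketch of generalized Q-learning with a pluggable summary operator.
    # The operator that replaces max in ordinary Q-learning is passed in as a
    # function, so the same asynchronous update covers MDPs (max), alternating
    # two-player games (max or min, depending on who moves), and a
    # worst-case/pessimistic criterion (min).

    import random

    def generalized_q_update(Q, s, a, r, s_next, actions, alpha, gamma, summarize):
        """One asynchronous update of a generalized Q-learning rule.

        `summarize` maps the list of Q(s_next, a') values to a single backup value;
        max gives ordinary Q-learning, min gives a worst-case backup.
        """
        backup = summarize([Q[(s_next, a2)] for a2 in actions])
        Q[(s, a)] += alpha * (r + gamma * backup - Q[(s, a)])

    # Toy two-state, two-action problem (illustrative only).
    states, actions = [0, 1], [0, 1]
    Q = {(s, a): 0.0 for s in states for a in actions}

    def toy_step(s, a):
        """Hypothetical dynamics: action 1 moves to the other state; reward 1 for landing in state 1."""
        s_next = 1 - s if a == 1 else s
        return (1.0 if s_next == 1 else 0.0), s_next

    random.seed(0)
    s = 0
    for t in range(5000):
        a = random.choice(actions)                  # exploratory behavior policy
        r, s_next = toy_step(s, a)
        generalized_q_update(Q, s, a, r, s_next, actions,
                             alpha=0.1, gamma=0.9,
                             summarize=max)         # max -> ordinary MDP Q-learning
        s = s_next

    print({k: round(v, 2) for k, v in Q.items()})

Swapping summarize=max for summarize=min in the call above would back up the worst available continuation instead of the best, which is the sense in which the worst-case criterion is a special case of the same update scheme.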

Keywords: Reinforcement learning, Q-learning convergence, Markov games

(Complete text available in PDF or gzipped PostScript.)