13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Using Reinforcement Learning for Dialogue Management Policies: Towards Understanding MDP Violations and Convergence

Peter A. Heeman, Jordan Fryer, Rebecca Lunsford, Andrew Rueckert, Ethan Selfridge

Center for Spoken Language Understanding, Oregon Health & Science University, Beaverton, OR, USA

Reinforcement learning (RL) is becoming a popular tool for building dialogue managers. This paper addresses two issues in using RL. First, we propose two methods for finding MDP violations; both work by computing Q scores while the policy is being tested. Second, we investigate how convergence happens. To do this, we use a dialogue task in which the only source of variability is the dialogue policy itself, which allows us to study how and when convergence occurs as training progresses. The work in this paper should help dialogue designers build effective policies and understand how much training is necessary.
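The abstract does not spell out how Q scores are computed during testing, so the following is only a rough sketch of the general idea, not the authors' method: train a tabular Q-learning policy, then re-estimate Q scores from the returns observed while running the greedy policy, flagging state-action pairs whose empirical return deviates from the trained Q value (a possible symptom of a Markov-property violation). The environment (`ToyDialogueEnv`), function names, and thresholds are all hypothetical.

```python
import random
from collections import defaultdict

class ToyDialogueEnv:
    """Hypothetical slot-filling task: 'ask' fills a slot (reward -1);
    'confirm' ends the dialogue (+10 if both slots filled, else -5)."""
    def reset(self):
        return 0                       # state = number of slots filled
    def actions(self, s):
        return ["ask", "confirm"]
    def step(self, s, a):
        if a == "confirm":
            return s, (10 if s >= 2 else -5), True
        return min(s + 1, 2), -1, False

def q_learning_train(env, episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Standard tabular Q-learning with epsilon-greedy exploration."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            acts = env.actions(s)
            a = (random.choice(acts) if random.random() < epsilon
                 else max(acts, key=lambda x: Q[(s, x)]))
            s2, r, done = env.step(s, a)
            best_next = 0.0 if done else max(Q[(s2, x)] for x in env.actions(s2))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

def q_consistency_check(env, Q, episodes=200, gamma=0.95, tol=0.5):
    """Run the greedy policy and compare each visited pair's average
    discounted return against its trained Q score; large gaps suggest
    the state representation is missing Markov-relevant information."""
    returns = defaultdict(list)
    for _ in range(episodes):
        s, done, traj = env.reset(), False, []
        while not done:
            a = max(env.actions(s), key=lambda x: Q[(s, x)])
            s2, r, done = env.step(s, a)
            traj.append((s, a, r))
            s = s2
        G = 0.0
        for s, a, r in reversed(traj):   # discounted return per pair
            G = r + gamma * G
            returns[(s, a)].append(G)
    return [(s, a, Q[(s, a)], sum(gs) / len(gs))
            for (s, a), gs in returns.items()
            if abs(sum(gs) / len(gs) - Q[(s, a)]) > tol]
```

In this deterministic toy task the state is fully Markov, so the consistency check should report no violations; an environment whose rewards depend on unobserved dialogue history would produce flagged pairs instead.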


Bibliographic reference. Heeman, Peter A. / Fryer, Jordan / Lunsford, Rebecca / Rueckert, Andrew / Selfridge, Ethan (2012): "Using reinforcement learning for dialogue management policies: towards understanding MDP violations and convergence", In INTERSPEECH-2012, 747-750.