A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems

Layla El Asri, Jing He, Kaheer Suleman

User simulation is essential for generating enough data to train a statistical spoken dialogue system. Previous models for user simulation suffer from several drawbacks, such as the inability to take dialogue history into account, the need of rigid structure to ensure coherent user behaviour, heavy dependence on a specific domain, the inability to output several user intentions during one dialogue turn, or the requirement of a summarized action space for tractability. This paper introduces a data-driven user simulator based on an encoder-decoder recurrent neural network. The model takes as input a sequence of dialogue contexts and outputs a sequence of dialogue acts corresponding to user intentions. The dialogue contexts include information about the machine acts and the status of the user goal. We show on the Dialogue State Tracking Challenge 2 (DSTC2) dataset that the sequence-to-sequence model outperforms an agenda-based simulator and an n-gram simulator, according to F-score. Furthermore, we show how this model can be used on the original action space and thereby models user behaviour with finer granularity.

DOI: 10.21437/Interspeech.2016-1175

Cite as

Asri, L.E., He, J., Suleman, K. (2016) A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems. Proc. Interspeech 2016, 1151-1155.

author={Layla El Asri and Jing He and Kaheer Suleman},
title={A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems},
booktitle={Interspeech 2016},