Improving Generalisation to New Speakers in Spoken Dialogue State Tracking

Iñigo Casanueva, Thomas Hain, Phil Green

Users with disabilities can greatly benefit from personalised voice-enabled environmental-control interfaces, but for users with speech impairments (e.g. dysarthria) poor ASR performance poses a challenge to successful dialogue. Statistical dialogue management has shown resilience against high ASR error rates, making it useful for improving the performance of these interfaces. However, little research has so far been devoted to personalising dialogue management to specific users. Recently, data-driven discriminative models have been shown to yield the best performance in dialogue state tracking (the inference of the user goal from the dialogue history). However, due to the unique characteristics of each speaker, training a system for a new user when user-specific data is not available can be challenging because of the mismatch between training and working conditions. This work investigates two methods to improve the performance of an LSTM-based personalised state tracker for new speakers: the use of speaker-specific acoustic and ASR-related features, and dropout regularisation. It is shown that in an environmental-control system for dysarthric speakers, the combination of both techniques yields improvements of 3.5% absolute in state tracking accuracy. Further analysis explores the effect of using different amounts of speaker-specific data to train the tracking system.
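Of the two techniques named in the abstract, dropout regularisation is the more generic one. As a minimal illustration (this sketch is not the authors' implementation, and the array shapes and dropout rate are hypothetical), inverted dropout randomly zeroes hidden units during training and rescales the survivors, so that no change is needed at test time:

```python
import numpy as np

def inverted_dropout(x, p_drop, rng, train=True):
    """Inverted dropout: at training time, zero each unit with
    probability p_drop and scale survivors by 1/(1 - p_drop);
    at test time, return the activations unchanged."""
    if not train or p_drop == 0.0:
        return x
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask

rng = np.random.default_rng(0)
h = np.ones((4, 8))  # hypothetical batch of hidden activations
h_train = inverted_dropout(h, 0.5, rng, train=True)   # units are 0.0 or 2.0
h_test = inverted_dropout(h, 0.5, rng, train=False)   # identical to h
```

In a recurrent tracker, such a mask would typically be applied to the input and output connections of the LSTM rather than to the recurrent state, which is one common way dropout is used to improve generalisation to unseen speakers.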

DOI: 10.21437/Interspeech.2016-404

Cite as

Casanueva, I., Hain, T., Green, P. (2016) Improving Generalisation to New Speakers in Spoken Dialogue State Tracking. Proc. Interspeech 2016, 2726-2730.

@inproceedings{casanueva16_interspeech,
  author={Iñigo Casanueva and Thomas Hain and Phil Green},
  title={Improving Generalisation to New Speakers in Spoken Dialogue State Tracking},
  year={2016},
  booktitle={Interspeech 2016},
  pages={2726--2730},
  doi={10.21437/Interspeech.2016-404}
}