4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
Real-world applications using speech recognition must perform well over a range of dialects. Differences in dialect between the speakers in the training database and the target users often lead to degraded recognition performance. For the BBN Hark Hidden Markov Model (HMM) based system, we have already developed a reasonably effective technique for dealing with multiple US dialects. The solution involves building a separate HMM set for each dialect from representative training speech data. This requires that training speakers be accurately classified by dialect, which is difficult to do reliably even by hand. In this paper we describe a recognition-based, pseudo-automatic scheme for partitioning a pool of US English training speakers into groups, such that the speakers within each group share the same pronunciation characteristics. Our scheme is speech-data driven, and uses transcript-level word hypotheses generated by a recognizer to partition the pool of training speakers.
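The abstract does not give the details of the partitioning step, so the following is only a minimal sketch of one way recognizer word hypotheses could be used to group speakers. It assumes that each hypothesized word token has already been tagged with the dialect of the pronunciation variant the recognizer selected, and that a speaker is assigned to the group whose tag occurs most often. The function name `partition_speakers`, the tag values, and the majority-vote rule are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def partition_speakers(hypotheses):
    """Group speakers by the dialect tag hypothesized most often for them.

    `hypotheses` is an iterable of (speaker_id, dialect_tag) pairs, one per
    recognized word token. The tagging scheme and the majority vote are
    illustrative assumptions, not the method described in the paper.
    """
    # Count, per speaker, how often each dialect tag was hypothesized.
    votes = defaultdict(Counter)
    for speaker, tag in hypotheses:
        votes[speaker][tag] += 1

    # Assign each speaker to the tag with the highest count.
    # On a tie, the tag seen first for that speaker wins.
    groups = defaultdict(list)
    for speaker in sorted(votes):
        top_tag, _ = votes[speaker].most_common(1)[0]
        groups[top_tag].append(speaker)
    return dict(groups)
```

Called with pairs such as `("spk1", "north")`, the function returns a dictionary mapping each tag to a sorted list of speaker identifiers. Each resulting group could then serve as the training pool for one dialect-specific HMM set.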
Bibliographic reference. Huggins, A. W. F. / Patel, Yogen (1996): "The use of shibboleth words for automatically classifying speakers by dialect", In ICSLP-1996, 2017-2020.