EUROSPEECH 2001 Scandinavia
This paper addresses large-vocabulary spontaneous speech recognition, focusing on acoustic modeling that takes the speaking rate into account. Using a corpus of real lecture speech collected under a priority research project in Japan, we built a baseline acoustic model and evaluated it on the automatic transcription of oral presentations by experienced speakers, obtaining a word accuracy of 58.2%. We observed that the speaking rate differs significantly from that of read speech. To handle fast and poorly articulated phone segments, we explore several extensions of the modeling: state-skipping modeling, a speaking-rate-dependent model, and syllable sub-word modeling. As a result, we reduced the word error rate by 0.8%-2.0% absolute. We also address language modeling, in particular the effective use of various large text corpora.
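As an illustration of why state skipping can help with fast phone segments, the sketch below builds a left-to-right HMM transition matrix with and without skip transitions and computes the minimum number of frames a phone model must consume. The abstract does not specify the topology used in the paper, so the three-state layout, the probabilities, and the function names `left_to_right_transitions` and `min_frames` are assumptions made for this example only.

```python
from collections import deque


def left_to_right_transitions(n_states, self_loop=0.6, skip=0.0):
    """Build a row-stochastic transition matrix for a left-to-right HMM.

    States 0..n_states-1 are emitting; index n_states is a non-emitting
    exit state. The probabilities are illustrative, not trained values.
    """
    size = n_states + 1
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(n_states):
        # A skip jumps over the next state (or straight to the exit).
        skip_p = skip if (skip > 0.0 and i + 2 < size) else 0.0
        matrix[i][i] = self_loop
        matrix[i][i + 1] = 1.0 - self_loop - skip_p
        if skip_p > 0.0:
            matrix[i][i + 2] = skip_p
    matrix[n_states][n_states] = 1.0  # exit state is absorbing
    return matrix


def min_frames(matrix):
    """Fewest emitting states visited on any path from state 0 to the exit."""
    exit_state = len(matrix) - 1
    queue = deque([(0, 1)])  # (state, frames consumed so far)
    seen = {0}
    while queue:
        state, frames = queue.popleft()
        for nxt, prob in enumerate(matrix[state]):
            if prob <= 0.0 or nxt == state or nxt in seen:
                continue
            if nxt == exit_state:
                return frames
            seen.add(nxt)
            queue.append((nxt, frames + 1))
    return None


baseline = left_to_right_transitions(3)
skipping = left_to_right_transitions(3, skip=0.2)
print(min_frames(baseline), min_frames(skipping))
```

Under these assumptions the model without skips needs at least three frames per phone, while the skipping variant can pass through in two, which is the property that lets very short segments in fast speech be aligned at all.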
Bibliographic reference. Nanjo, Hiroaki / Kato, Kazuomi / Kawahara, Tatsuya (2001): "Speaking rate dependent acoustic modeling for spontaneous lecture speech recognition", In EUROSPEECH-2001, 2531-2534.