EUROSPEECH 2001 Scandinavia
7th European Conference on Speech Communication and Technology

Aalborg, Denmark
September 3-7, 2001


Towards Automatic Transcription of Spontaneous Presentations

Takahiro Shinozaki, Chiori Hori, Sadaoki Furui

Tokyo Institute of Technology, Japan

This paper reports on various investigations into recognizing spontaneous presentation speech, conducted in connection with the "Spontaneous Speech" national project started in 1999. Recognition experiments were performed on approximately 4.5 hours of presentation speech uttered by 10 male speakers. Experimental results show that acoustic and language modeling based on an actual spontaneous speech corpus is far more effective than conventional modeling based on read speech. Recognition accuracy varies widely from speaker to speaker, depending on the speaking rate, the number of fillers, the number of repairs, etc. Unsupervised speaker adaptation of the acoustic models was confirmed to be effective in improving recognition accuracy. The recognition accuracy for spontaneous speech is, however, still rather low, and a large number of research issues remain.

Bibliographic reference. Shinozaki, Takahiro / Hori, Chiori / Furui, Sadaoki (2001): "Towards automatic transcription of spontaneous presentations", In EUROSPEECH-2001, 491-494.