INTERSPEECH 2006 - ICSLP
Collecting a Taiwanese text corpus with phonetic transcription is hampered by multiple pronunciation variation. By augmenting the text with speech and running automatic speech recognition over a sausage search net, constructed from the multiple pronunciations of the text corresponding to each speech utterance, we are able to reduce the effort required for phonetic transcription. Using the multiple-pronunciation lexicon, a transcription error rate of 13.94% was achieved. Further improvement can be achieved by adapting the pronunciation lexicon with pronunciation variation (PV) rules derived from a manually corrected speech corpus. The PV rules fall into two kinds: knowledge-based and data-driven. By incorporating the PV rules, an error rate reduction of 13.63% could be achieved. Although the technique was developed for Taiwanese speech, it could easily be adapted to other similar "minority" Chinese spoken languages.
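As an illustration of the sausage search net idea, the following is a minimal sketch and not the authors' implementation. It assumes a toy lexicon with placeholder word and pronunciation labels (`W1`, `p1a`, and so on), and it replaces the acoustic scoring of a real recognizer with a hypothetical per-slot score table. All function names are invented for this example.

```python
from typing import Dict, List

# Hypothetical multiple-pronunciation lexicon. The word and pronunciation
# labels are placeholders, not real Taiwanese entries.
LEXICON: Dict[str, List[str]] = {
    "W1": ["p1a", "p1b"],
    "W2": ["p2a"],
    "W3": ["p3a", "p3b", "p3c"],
}


def build_sausage(words: List[str], lexicon: Dict[str, List[str]]) -> List[List[str]]:
    """Build one slot per word; each slot lists that word's candidate pronunciations."""
    return [list(lexicon[w]) for w in words]


def count_paths(sausage: List[List[str]]) -> int:
    """Number of distinct pronunciation sequences the net allows."""
    total = 1
    for slot in sausage:
        total *= len(slot)
    return total


def best_path(sausage: List[List[str]], scores: List[Dict[str, float]]) -> List[str]:
    """Pick the highest-scoring candidate in each slot.

    Assumption: scores are independent per slot. A real recognizer would
    score whole paths against the audio (e.g. with Viterbi decoding).
    """
    return [
        max(slot, key=lambda p: scores[i].get(p, float("-inf")))
        for i, slot in enumerate(sausage)
    ]


def error_rate(hyp: List[str], ref: List[str]) -> float:
    """Fraction of slots where the hypothesis differs from the reference."""
    errors = sum(h != r for h, r in zip(hyp, ref))
    return errors / len(ref)
```

Because the net is constrained to the known text, the recognizer only has to choose among the listed pronunciations in each slot rather than search an open vocabulary, which is what makes the automatic transcription a useful first pass before manual correction.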
Bibliographic reference. Liang, Min-Siong / Lyu, Ren-Yuan / Chiang, Yuang-Chin (2006): "Using speech recognition technique for constructing a phonetically transcribed Taiwanese (Min-Nan) text corpus", In INTERSPEECH-2006, paper 1442-Mon1CaP.11.