12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Morpheme Conversion for Connecting Speech Recognizer and Language Analyzers in Unsegmented Languages

Kenji Imamura, Tomoko Izumi, Kugatsu Sadamitsu, Kuniko Saito, Satoshi Kobashikawa, Hirokazu Masataki

NTT Corporation, Japan

Connecting automatic speech recognizers (ASRs) and language analyzers is difficult because they may be based on different part-of-speech (POS) systems, so the latter cannot directly analyze the outputs of the former. In addition, in unsegmented languages such as Japanese, the ASR outputs are likely to have a different word segmentation from that expected by the language analyzers, because the two are developed independently.

A conventional approach is to generate raw text from the ASR outputs and re-analyze it using a morphological analyzer. However, if the ASR outputs contain recognition errors, the morphological analyzer is likely to misanalyze the text, including the parts that were recognized correctly.

To avoid this problem, we propose a morpheme conversion method that directly converts ASR outputs into morpheme sequences suitable for the language analyzers. Our experiments show that morpheme conversion is more robust than the conventional approach against recognition errors.
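
To make the segmentation mismatch concrete, the following Python sketch groups two segmentations of the same character string into minimal chunks whose boundaries coincide. It is only an illustration of the mismatch, not the conversion method of the paper; the function name `align_segmentations` and the example segmentations are hypothetical.

```python
def align_segmentations(asr_tokens, analyzer_tokens):
    """Group two segmentations of the same string into minimal chunks
    whose start and end boundaries coincide in both segmentations.

    Illustration only: this is NOT the paper's morpheme conversion method.
    """
    if any(not t for t in asr_tokens + analyzer_tokens):
        raise ValueError("tokens must be non-empty")
    if "".join(asr_tokens) != "".join(analyzer_tokens):
        raise ValueError("both segmentations must spell the same string")

    chunks = []
    i = j = 0
    while i < len(asr_tokens) and j < len(analyzer_tokens):
        # Start a new chunk with one token from each side.
        a_group, b_group = [asr_tokens[i]], [analyzer_tokens[j]]
        a_len, b_len = len(asr_tokens[i]), len(analyzer_tokens[j])
        i += 1
        j += 1
        # Extend the shorter side until both cover the same characters.
        while a_len != b_len:
            if a_len < b_len:
                a_group.append(asr_tokens[i])
                a_len += len(asr_tokens[i])
                i += 1
            else:
                b_group.append(analyzer_tokens[j])
                b_len += len(analyzer_tokens[j])
                j += 1
        chunks.append((a_group, b_group))
    return chunks


# Hypothetical segmentations of the same Japanese string.
asr = ["東京", "都", "に", "住む"]
analyzer = ["東京都", "に", "住", "む"]
for asr_part, analyzer_part in align_segmentations(asr, analyzer):
    print(asr_part, "<->", analyzer_part)
```

Chunks in which both sides hold a single identical token need no resegmentation. The remaining chunks mark the places where a converter would have to merge or split ASR words, and map their POS tags, to match the analyzer's conventions.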


Bibliographic reference. Imamura, Kenji / Izumi, Tomoko / Sadamitsu, Kugatsu / Saito, Kuniko / Kobashikawa, Satoshi / Masataki, Hirokazu (2011): "Morpheme conversion for connecting speech recognizer and language analyzers in unsegmented languages", In INTERSPEECH-2011, 1405-1408.