Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Automatic Syllable-Pattern Induction in Statistical Thai Text-to-Phone Transcription

Ausdang Thangthai, Chatchawarn Hansakunbuntheung, Rungkarn Siricharoenchai, Chai Wutiwiwatchai

NECTEC, Thailand

This paper proposes a technique for automatic syllable-pattern induction in statistical Thai text-to-phone transcription. A statistical text-to-phone transcription system is generally built by first defining a set of rules describing syllable patterns, which are used for syllabification. Given an input text, the syllabification process generates all possible syllable sequences, which are then scored by a statistical model so that the best one can be selected. Updating the handcrafted rule set of syllable patterns is time-consuming and requires expert linguists. In place of this manual process, automatic induction of new syllable patterns from a large raw text is proposed. A process that can work with raw text is particularly needed for Thai, as segmenting Thai text is a very tedious task. Experiments show that the proposed Thai text-to-phone transcription system achieves approximately 2% improvement after a large raw text is used for syllable-pattern induction. A comparison with other Thai text-to-phone transcription models and error analyses are also given in the paper.
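
The generate-then-score process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pattern set is reduced to a set of literal toy strings in Latin letters, the statistical model is assumed to be a simple unigram model over syllables, and every name and probability below (`PATTERNS`, `LOGPROB`, `segmentations`, `best_segmentation`) is hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Optional, Set

# Hypothetical toy "syllable patterns": literal strings standing in for a rule set.
PATTERNS: Set[str] = {"ta", "tak", "kla", "la"}

# Hypothetical unigram log-probabilities for each syllable (made-up values).
LOGPROB: Dict[str, float] = {
    "ta": math.log(0.4),
    "kla": math.log(0.1),
    "tak": math.log(0.2),
    "la": math.log(0.3),
}


def segmentations(text: str, patterns: Set[str]) -> List[List[str]]:
    """Enumerate every way to split `text` into units that match a pattern."""
    if not text:
        return [[]]  # one valid segmentation of the empty string: no syllables
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in patterns:
            for tail in segmentations(text[end:], patterns):
                results.append([head] + tail)
    return results


def score(seq: List[str], logprob: Dict[str, float], floor: float = -20.0) -> float:
    """Sum unigram log-probabilities; unseen syllables get a fixed floor value."""
    return sum(logprob.get(syl, floor) for syl in seq)


def best_segmentation(
    text: str, patterns: Set[str], logprob: Dict[str, float]
) -> Optional[List[str]]:
    """Return the highest-scoring candidate, or None if no pattern sequence fits."""
    candidates = segmentations(text, patterns)
    if not candidates:
        return None
    return max(candidates, key=lambda seq: score(seq, logprob))


if __name__ == "__main__":
    for candidate in segmentations("takla", PATTERNS):
        print(candidate, round(score(candidate, LOGPROB), 3))
    print("best:", best_segmentation("takla", PATTERNS, LOGPROB))
```

In this toy setting the input `takla` has two candidate syllable sequences, and the model picks between them. An input that no sequence of patterns covers yields no candidate at all, which is the situation that adding new syllable patterns is meant to reduce.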

