This paper presents the development and evaluation of an HMM-based TTS system for the modern Hanoi dialect of Northern Vietnamese, a tonal language. Specific phonetic and prosodic features of Hanoi Vietnamese are studied, and their consequences for the design of an HMM-based TTS system are derived. Using this knowledge, a TTS system called VTed is then developed on the Mary TTS platform. The second part of the paper is devoted to perceptual evaluations of Vietnamese speech synthesis. Three kinds of evaluation are considered necessary to assess quality for this tonal language: a general MOS assessment, an utterance-level intelligibility test, and a tone-level intelligibility test. All three are conducted on the VTed system under a "natural speech reference" condition. The results show a difference of 1.21 points between natural and synthetic speech in the MOS test, a difference of 0.2% to 0.9% in the utterance-level intelligibility test, and a difference of 23% on average in the tone-level intelligibility test, ranging from 0% to 37% depending on the tone type. These results demonstrate the need for more specific work at the tonal/prosodic level to improve automatic synthesis of Vietnamese and other tonal languages.
Bibliographic reference. Nguyen, Thi Thu Trang / D'Alessandro, Christophe / Rilliard, Albert / Tran, Do Dat (2013): "HMM-based TTS for Hanoi Vietnamese: issues in design and evaluation", In INTERSPEECH-2013, 2311-2315.