Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Constrained Structural Maximum a posteriori Linear Regression for Average-Voice-Based Speech Synthesis

Yuji Nakano, Makoto Tachibana, Junichi Yamagishi, Takao Kobayashi

Tokyo Institute of Technology, Japan

This paper proposes a constrained structural maximum a posteriori linear regression (CSMAPLR) algorithm to further improve speaker adaptation performance in HMM-based speech synthesis. The algorithm applies the concept of structural maximum a posteriori (SMAP) adaptation to the estimation of the transformation matrices of constrained MLLR (CMLLR): the transformation matrices are estimated recursively under a MAP criterion, from the root node to the lower nodes of a context decision tree. We incorporate the algorithm into an HSMM-based speech synthesis system. Objective evaluation results show that CSMAPLR adaptation combines the advantages of both CMLLR and SMAPLR adaptation, and subjective evaluation results show that it produces synthetic speech more similar to the target speaker than either CMLLR or SMAPLR adaptation.
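The root-to-leaf recursion mentioned in the abstract can be illustrated with a minimal sketch. It is not the estimation procedure of the paper: the abstract gives no formulas, so the update used here is an assumed toy rule that interpolates between the parent's transform (acting as the prior) and a node-local estimate, weighted by the amount of adaptation data. The names `Node`, `interpolate`, `propagate`, and `prior_weight` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

Matrix = List[List[float]]


@dataclass
class Node:
    """One node of a (hypothetical) context decision tree."""
    count: float                        # amount of adaptation data at this node
    local_estimate: Optional[Matrix]    # data-driven estimate, None if no data
    children: List["Node"] = field(default_factory=list)
    transform: Optional[Matrix] = None  # filled in by propagate()


def interpolate(prior: Matrix, local: Matrix,
                count: float, prior_weight: float) -> Matrix:
    """Toy MAP-style update (an assumption, not the paper's formula):
    a weighted average of the prior and the local estimate."""
    total = prior_weight + count
    return [[(prior_weight * p + count * l) / total
             for p, l in zip(prior_row, local_row)]
            for prior_row, local_row in zip(prior, local)]


def propagate(node: Node, prior: Matrix, prior_weight: float) -> None:
    """Estimate transforms from the root down; each node's result
    becomes the prior for its children."""
    if node.local_estimate is None or node.count == 0:
        # No data at this node: fall back to the prior from the parent.
        node.transform = [row[:] for row in prior]
    else:
        node.transform = interpolate(prior, node.local_estimate,
                                     node.count, prior_weight)
    for child in node.children:
        propagate(child, node.transform, prior_weight)


# Small illustrative tree: one data-rich leaf and one leaf with no data.
identity = [[1.0, 0.0], [0.0, 1.0]]
leaf_rich = Node(count=9.0, local_estimate=[[1.0, 0.0], [0.0, 1.0]])
leaf_empty = Node(count=0.0, local_estimate=None)
root = Node(count=1.0, local_estimate=[[3.0, 0.0], [0.0, 3.0]],
            children=[leaf_rich, leaf_empty])

propagate(root, identity, prior_weight=1.0)
```

In this toy tree the data-rich leaf ends up close to its own local estimate, while the leaf without data simply inherits the transform of its parent. That is the qualitative behaviour a structural prior is meant to provide when adaptation data are sparse.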


Bibliographic reference.  Nakano, Yuji / Tachibana, Makoto / Yamagishi, Junichi / Kobayashi, Takao (2006): "Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis", In INTERSPEECH-2006, paper 1784-Thu1BuP.10.