13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Histogram-based Spectral Equalization for HMM-based Speech Synthesis Using Mel-LSP

Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima, Masami Akamine

Knowledge Media Laboratory, Corporate Research & Development Center, Toshiba Corporation, Japan

We propose a statistical spectral parameter emphasis technique for HMM-based speech synthesis. In the proposed method, a cumulative distribution function (CDF) is calculated from the histogram of the spectral parameters extracted from the training speech data. In the same manner, a CDF is constructed for the spectral parameters generated from the HMMs. The emphasis rule is then trained by relating the CDF of the training data to that of the generated parameters. After an arbitrary spectral parameter sequence is generated from the HMMs, it is emphasized by a conversion that brings the histogram of the generated spectral parameters closer to that of the spectral parameters in the training data. Subjective experimental results demonstrate that the proposed method can improve speech quality.
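
The CDF-matching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_cdf`, `train_emphasis_rule`, `emphasize`), the bin count, the linear interpolation between bin edges, and the treatment of a single scalar parameter track (which could be applied to each mel-LSP dimension separately) are all assumptions made for illustration.

```python
import numpy as np


def fit_cdf(values, n_bins=64):
    """Histogram a 1-D sample and return (bin_edges, cdf).

    cdf has one entry per bin edge, starting at 0 and ending at 1.
    """
    counts, edges = np.histogram(values, bins=n_bins)
    cdf = np.concatenate(([0.0], np.cumsum(counts))) / counts.sum()
    return edges, cdf


def train_emphasis_rule(train_values, generated_values, n_bins=64):
    """Hypothetical 'emphasis rule': the pair of CDFs to be related.

    train_values     : parameter values extracted from training speech
    generated_values : values of the same parameter generated from the HMMs
    """
    src_edges, src_cdf = fit_cdf(generated_values, n_bins)
    tgt_edges, tgt_cdf = fit_cdf(train_values, n_bins)
    # Drop repeated CDF values (empty bins) so the inverse lookup
    # interpolates over strictly increasing x-coordinates.
    tgt_cdf_u, idx = np.unique(tgt_cdf, return_index=True)
    return src_edges, src_cdf, tgt_cdf_u, tgt_edges[idx]


def emphasize(x, rule):
    """Map generated values so their histogram approaches the training one."""
    src_edges, src_cdf, tgt_cdf_u, tgt_edges_u = rule
    # Step 1: cumulative probability of each value under the generated-data CDF.
    p = np.interp(x, src_edges, src_cdf)
    # Step 2: value with the same cumulative probability under the training CDF.
    return np.interp(p, tgt_cdf_u, tgt_edges_u)
```

In this sketch the rule is trained once from the two sets of parameters and then applied to any newly generated sequence with `emphasize(sequence, rule)`. Because both steps are monotone, the ordering of values within a parameter track is preserved. Values outside the range seen when fitting the generated-data CDF are clipped to its end points by `np.interp`.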

Index Terms: speech synthesis, hidden Markov model, parameter emphasis, mel-LSP, histogram equalization

Bibliographic reference. Ohtani, Yamato / Tamura, Masatsune / Morita, Masahiro / Kagoshima, Takehiko / Akamine, Masami (2012): "Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP", In INTERSPEECH-2012, 1155-1158.