DNN-based Speech Synthesis considering Dialogue-Act Information and its Evaluation with Respect to Illocutionary Act Naturalness

Nobukatsu Hojo, Yusuke Ijima, Hiroaki Sugiyama, Noboru Miyazaki, Takahito Kawanishi, Kunio Kashino


This study aimed to improve how naturally the speech synthesized by a text-to-speech (TTS) system in a spoken dialogue system conveys the system's intention to the hearer. We call this measure of naturalness "illocutionary act naturalness". To achieve this aim, we used dialogue-act (DA) information as an auxiliary feature for a deep neural network (DNN)-based speech synthesis system. First, we constructed a speech database with DA tags. Second, we used the database to build the speech synthesis system. Third, we evaluated the method by comparing its performance with that of a DNN using conventional linguistic features and that of hidden Markov models (HMMs) supplemented with DAs. We conducted a listening test designed to evaluate illocutionary act naturalness. The results show that the proposed method improves illocutionary act naturalness compared with the conventional method. We also found that the illocutionary act naturalness score depended on certain features of the test sentence as well as on the DA and the speech synthesis method. This finding shows that a test set designed with these features in mind will improve the reproducibility of the illocutionary act naturalness evaluation.
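As an illustration of what using DA information as an auxiliary feature could look like, the following minimal sketch appends a one-hot DA vector to every frame of a linguistic feature sequence before it would be fed to a DNN. The tag inventory `DA_TAGS`, the function names, and the feature dimensions are all hypothetical; the abstract does not specify the tag set or the encoding, so this should be read as one plausible arrangement rather than the authors' implementation.

```python
# Hypothetical DA tag inventory (an assumption; the actual tag set is not given).
DA_TAGS = ["greeting", "question", "inform", "apology"]


def one_hot_da(tag, tags=DA_TAGS):
    """Encode a DA tag as a one-hot vector over the assumed inventory."""
    if tag not in tags:
        raise ValueError(f"unknown DA tag: {tag!r}")
    return [1.0 if t == tag else 0.0 for t in tags]


def add_da_feature(frames, tag, tags=DA_TAGS):
    """Append the utterance-level DA vector to each frame's linguistic features."""
    da = one_hot_da(tag, tags)
    return [list(frame) + da for frame in frames]


# Five frames of 10-dimensional placeholder linguistic features.
frames = [[0.0] * 10 for _ in range(5)]
augmented = add_da_feature(frames, "question")
print(len(augmented), len(augmented[0]))
```

Each frame keeps its original linguistic features and gains one extra dimension per DA tag, so the same DA vector is repeated across all frames of an utterance.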


DOI: 10.21437/SpeechProsody.2020-199

Cite as: Hojo, N., Ijima, Y., Sugiyama, H., Miyazaki, N., Kawanishi, T., Kashino, K. (2020) DNN-based Speech Synthesis considering Dialogue-Act Information and its Evaluation with Respect to Illocutionary Act Naturalness. Proc. 10th International Conference on Speech Prosody 2020, 975-979, DOI: 10.21437/SpeechProsody.2020-199.


@inproceedings{Hojo2020,
  author={Nobukatsu Hojo and Yusuke Ijima and Hiroaki Sugiyama and Noboru Miyazaki and Takahito Kawanishi and Kunio Kashino},
  title={{DNN-based Speech Synthesis considering Dialogue-Act Information and its Evaluation with Respect to Illocutionary Act Naturalness}},
  year=2020,
  booktitle={Proc. 10th International Conference on Speech Prosody 2020},
  pages={975--979},
  doi={10.21437/SpeechProsody.2020-199},
  url={http://dx.doi.org/10.21437/SpeechProsody.2020-199}
}