Third ESCA/COCOSDA Workshop on Speech Synthesis
November 26-29, 1998
Diphone concatenation has the advantages of simplicity and a relatively small speech database when compared to other concatenative synthesis methods. However, diphone concatenation faces two notable problems. The first is coarticulation, which extends beyond the scope of a single diphone and entails some degree of contextual mismatch for virtually any diphone in at least some concatenation contexts. The second problem, which stems from the first, is computational: selecting, from a specific speech corpus, an optimal instance of each diphone so as to achieve the least temporal and spectral distortion in the broadest set of concatenation contexts.
We present a variant of diphone synthesis that addresses both problems by (a) allowing multiple tokens of a diphone where needed to accommodate the effects of coarticulation, and (b) postponing diphone selection until synthesis, when optimization can be constrained by known contextual factors. This method, termed Biphone Constrained Concatenation (BCC), has been implemented for use in the ModelTalker TtS system. Comparisons of speech synthesized using BCC with speech synthesized using pure diphone concatenation indicate clear improvements in naturalness for the BCC method. However, our listening experiments also demonstrated some increase in consonant confusions for the BCC method, due to uncontrolled durational factors.
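The idea of keeping several tokens per diphone and choosing among them at synthesis time can be illustrated with a minimal sketch. It is not the BCC algorithm as implemented in ModelTalker. The `Token` record, the `context_cost` mismatch count, the `#` boundary symbol, and the toy inventory are all hypothetical, and the sketch assumes each stored token is labeled with the phones that surrounded it in the recording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One recorded instance of a diphone (hypothetical record layout)."""
    diphone: str   # e.g. "k-ae"
    left: str      # phone that preceded the diphone in the recording
    right: str     # phone that followed the diphone in the recording
    token_id: int


def context_cost(token, left, right):
    """Assumed cost: number of neighboring phones that do not match (0, 1, or 2)."""
    return (token.left != left) + (token.right != right)


def select_tokens(phones, inventory):
    """Pick, for each diphone in the phone string, the stored token whose
    recorded context best matches the context known at synthesis time.
    Ties go to the first candidate; a missing diphone raises KeyError."""
    chosen = []
    for i in range(len(phones) - 1):
        name = f"{phones[i]}-{phones[i + 1]}"
        left = phones[i - 1] if i > 0 else "#"
        right = phones[i + 2] if i + 2 < len(phones) else "#"
        candidates = inventory[name]
        chosen.append(min(candidates, key=lambda t: context_cost(t, left, right)))
    return chosen


# Toy inventory: two diphones have more than one token.
inventory = {
    "#-k": [Token("#-k", "#", "ae", 0)],
    "k-ae": [Token("k-ae", "s", "n", 1), Token("k-ae", "#", "t", 2)],
    "ae-t": [Token("ae-t", "k", "#", 3), Token("ae-t", "b", "s", 4)],
    "t-#": [Token("t-#", "ae", "#", 5)],
}

selected = select_tokens(["#", "k", "ae", "t", "#"], inventory)
selected_ids = [t.token_id for t in selected]
```

With a single token per diphone the selection is trivial and this reduces to pure diphone concatenation. The choice only matters where extra tokens have been stored, which corresponds to point (a) in the abstract. A realistic cost would presumably also weigh spectral and temporal mismatch at the joins, which this sketch omits.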
Full Paper (with 2 sound examples linked from within the paper)
Bibliographic reference. Bunnell, H. Timothy / Hoskins, Steven R. / Yarrington, Debra M. (1998): "A Biphone Constrained Concatenation Method for Diphone Synthesis", In SSW3-1998, 171-176.