Second European Conference on Speech Communication and Technology

Genova, Italy
September 24-26, 1991


An Automatic Diphone Segmentation System

Georg E. Ottesen

SINTEF DELAB, Trondheim, Norway

This paper discusses the requirements for an automatic diphone recording and segmentation system, and presents a PC-based system. The level and speech rate are controlled for each test word at recording time. A set of Norwegian test words is segmented by two different methods: 1) a speaker-independent Hidden Markov Model (HMM), and 2) a Dynamic Time Warping (DTW) procedure adapted to one speaker. Norwegian diphones are then extracted. The best performance is obtained with the DTW procedure, which gives a satisfactory segmentation for about 99 percent of the diphones.

Keywords:
- Automatic segmentation
- Diphone synthesis
- PSOLA synthesis
- Dynamic time warping
- Hidden Markov Model
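The abstract does not describe how the speaker-adapted DTW procedure works. As an illustration only, the sketch below shows one common way DTW can be used for segmentation: a reference utterance of the same word, from the same speaker and with known boundary frames, is aligned to a new recording, and each boundary is carried across along the warping path. The per-frame feature vectors, the Euclidean local distance, the symmetric (1,0)/(0,1)/(1,1) step pattern, and the names `dtw_path` and `transfer_boundaries` are all assumptions for this example, not details taken from the paper.

```python
import math


def dtw_path(ref, test):
    """Return the optimal DTW alignment path between two feature sequences.

    Assumed setup (not from the paper): ref and test are lists of
    equal-length feature vectors, the local distance is Euclidean, and
    the allowed steps are (1,0), (0,1) and (1,1).
    """
    n, m = len(ref), len(test)
    # cost[i][j] = minimal accumulated distance aligning ref[:i] with test[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the last frame pair to the first one.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def transfer_boundaries(ref, test, ref_boundaries):
    """Map boundary frame indices in ref onto frame indices in test.

    Hypothetical helper: each reference boundary frame is mapped to the
    first test frame aligned with it on the DTW path.
    """
    first_match = {}
    for r, t in dtw_path(ref, test):
        first_match.setdefault(r, t)
    return [first_match[b] for b in ref_boundaries]
```

For example, with one-dimensional toy "features", a reference `[[0.0], [0.0], [1.0], [1.0]]` whose second segment starts at frame 2, aligned to a slower test utterance `[[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]`, the boundary is moved to test frame 3. Because the reference recording comes from the same speaker as the test words, the alignment only has to absorb timing differences and small spectral variation. This may be one reason a speaker-adapted DTW can outperform a speaker-independent model on this task.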


Bibliographic reference: Ottesen, Georg E. (1991): "An automatic diphone segmentation system", In EUROSPEECH-1991, 713-716.