This paper discusses the requirements for an automatic diphone recording and segmentation system, and presents a PC-based system. The level and speech rate are controlled for each test word at recording time. A set of Norwegian test words is segmented by two different methods: 1) a speaker-independent Hidden Markov Model (HMM), and 2) a Dynamic Time Warping (DTW) procedure adapted to one speaker. Norwegian diphones are then extracted from the segmented words. The best performance is obtained with the DTW procedure, which gives a satisfactory segmentation for about 99 percent of the diphones.

Keywords: automatic segmentation, diphone synthesis, PSOLA synthesis, dynamic time warping, Hidden Markov Model
Bibliographic reference. Ottesen, Georg E. (1991): "An automatic diphone segmentation system", In EUROSPEECH-1991, 713-716.
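
The abstract does not describe how the DTW procedure works internally, so the following is only a generic sketch of DTW-based segmentation, not the author's method. It assumes a reference recording whose segment boundaries are already known, aligns a new recording to it with textbook DTW, and carries the boundaries across the alignment path. The function names `dtw_path` and `transfer_boundaries`, the Euclidean frame distance, and the toy one-dimensional features are all assumptions made for illustration.

```python
import math


def dtw_path(ref, test):
    """Align two feature sequences (lists of equal-length vectors) with
    textbook DTW and return the warping path as (ref_frame, test_frame) pairs."""
    n, m = len(ref), len(test)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning ref[:i] with test[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(ref[i - 1], test[j - 1])  # assumed frame distance
            acc[i][j] = local + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Backtrack from the end of both sequences to the start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = (acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
        k = min(range(3), key=steps.__getitem__)  # first minimum wins ties
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path


def transfer_boundaries(path, ref_boundaries):
    """Map each boundary frame in the reference to the first test frame
    that the warping path aligns with it."""
    first_match = {}
    for r, t in path:
        first_match.setdefault(r, t)
    return [first_match[b] for b in ref_boundaries]


# Toy example: the reference changes sound at frame 2,
# the slower test utterance changes at frame 3.
ref = [[0.0], [0.0], [1.0], [1.0]]
test = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]
path = dtw_path(ref, test)
boundaries = transfer_boundaries(path, [2])
```

In this toy case the boundary at reference frame 2 lands on test frame 3. A practical system would use spectral feature vectors per frame and would likely constrain the warping path, but those details are not given in the abstract.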