4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
In this paper, we propose a new method of phoneme segmentation using a multi-layer perceptron (MLP). The proposed segmenter consists of three parts: a preprocessor, an MLP-based phoneme segmenter, and a postprocessor. The preprocessor extracts a sequence of 44 feature parameters for each frame of speech, based on acoustic-phonetic knowledge. The MLP has one hidden layer and an output layer. The feature parameters of four consecutive inter-frame features (4 × 44 = 176 parameters) serve as its input data. The output value decides whether the current frame is a phoneme boundary or not. In postprocessing, we decide the positions of phoneme boundaries using the output of the MLP. In an open test, we obtained 84% for 5 msec accuracy and 87% for 15 msec accuracy, with an insertion rate of 9%. By adjusting the threshold value of the MLP output, we achieved higher accuracy. When we decreased the threshold by 0.4, we obtained a 5 msec accuracy of 92% with an insertion rate of 3.4%, counting only the insertions that are more than 15 msec apart from phoneme boundaries.
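The three-stage structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives only the input size (four 44-parameter feature sets, 176 values) and the existence of one hidden layer and a thresholded output. The function names, the sigmoid activations, the hidden-layer size, the default threshold of 0.5, and the local-maximum rule in the postprocessor are all assumptions made for illustration.

```python
import math

N_FEATS = 44                  # feature parameters per frame (from the abstract)
CONTEXT = 4                   # consecutive feature sets fed to the MLP (from the abstract)
N_INPUT = N_FEATS * CONTEXT   # 176 input values


def stack_frames(frames, t):
    """Concatenate CONTEXT consecutive feature sets, starting at index t, into one input vector."""
    window = frames[t:t + CONTEXT]
    return [x for frame in window for x in frame]


def sigmoid(z):
    # Assumed activation; the abstract does not name one.
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass through one hidden layer and a single output unit."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def pick_boundaries(scores, threshold=0.5):
    """Hypothetical postprocessor: keep local maxima of the MLP output above the threshold."""
    picks = []
    for t, s in enumerate(scores):
        left = scores[t - 1] if t > 0 else float("-inf")
        right = scores[t + 1] if t + 1 < len(scores) else float("-inf")
        if s > threshold and s >= left and s > right:
            picks.append(t)
    return picks
```

In this sketch, lowering `threshold` in `pick_boundaries` admits more candidate boundaries, which mirrors the threshold adjustment mentioned in the abstract; the actual decision rule used in the paper may differ.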
Bibliographic reference. Suh, Youngjoo / Lee, Youngjik (1996): "Phoneme segmentation of continuous speech using multi-layer perceptron", In ICSLP-1996, 1297-1300.