4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
In this paper we present a hybrid statistical and rule-based segmentation system that takes the phonetic variation of German into account. The input to the system is the orthographic representation and the speech signal of the utterance to be segmented. The output is the transcription (SAM-PA) with the highest overall likelihood and the corresponding segmentation of the speech signal. The system consists of three main stages. In the first stage, the orthographic representation is converted into a linear string of phonetic units by lexicon lookup; phonetic rules are then applied, yielding a graph that contains the canonical form and presumed variants. In the second, HMM-based stage, the speech signal of the utterance is time-aligned by a Viterbi search that is constrained by the graph from the first stage. The outcome of this stage is a string of phonetic labels and the corresponding segment boundaries. In the third stage, the segment boundaries are refined by rules based on phonetic knowledge.
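The first two stages can be illustrated with a minimal sketch, which is not the authors' implementation. It makes several simplifying assumptions: the variant graph is expanded into an explicit list of phone sequences, each phone is a single state with a self-loop rather than a full HMM, and the per-frame log-likelihoods are invented toy values. The function names (`apply_rules`, `align`, `segment`), the schwa-deletion rule, and the example word are hypothetical and chosen only for illustration.

```python
import math

NEG_INF = float("-inf")

def apply_rules(canonical, rules):
    """Return the canonical phone sequence plus all variants reachable by
    optionally applying the rewrite rules (assumed never to lengthen a sequence)."""
    variants = {tuple(canonical)}
    frontier = [tuple(canonical)]
    while frontier:
        seq = frontier.pop()
        for pattern, replacement in rules:
            n = len(pattern)
            for i in range(len(seq) - n + 1):
                if seq[i:i + n] == pattern:
                    new = seq[:i] + replacement + seq[i + n:]
                    if new not in variants:
                        variants.add(new)
                        frontier.append(new)
    return sorted(variants)

def align(phones, loglik):
    """Viterbi forced alignment of one phone sequence to the frames.
    loglik[t][p] is the log-likelihood of frame t under phone p.
    Returns (total log-likelihood, start frame of each phone)."""
    T, N = len(loglik), len(phones)
    if N > T:  # every phone needs at least one frame
        return NEG_INF, []
    score = [[NEG_INF] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = loglik[0][phones[0]]
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else NEG_INF
            best, prev = (stay, j) if stay >= move else (move, j - 1)
            if best > NEG_INF:
                score[t][j] = best + loglik[t][phones[j]]
                back[t][j] = prev
    # Trace back from the last phone in the last frame to recover boundaries.
    starts = [0] * N
    j = N - 1
    for t in range(T - 1, 0, -1):
        prev = back[t][j]
        if prev != j:
            starts[j] = t
            j = prev
    return score[T - 1][N - 1], starts

def segment(canonical, rules, loglik):
    """Pick the variant with the highest alignment likelihood."""
    results = [(align(v, loglik), v) for v in apply_rules(canonical, rules)]
    (score, starts), phones = max(results, key=lambda r: r[0][0])
    return phones, starts, score

# Toy example (all values are assumptions): "haben" with optional schwa deletion.
PHONES = ["h", "a:", "b", "@", "n"]
RULES = [(("@", "n"), ("n",))]

def toy_frame(best):
    """One frame whose likelihood mass is concentrated on a single phone."""
    return {p: math.log(0.8 if p == best else 0.05) for p in PHONES}

FRAMES = [toy_frame(p) for p in ["h", "a:", "a:", "b", "n"]]
```

Calling `segment(PHONES, RULES, FRAMES)` on this toy input selects the reduced variant `h a: b n` over the canonical form, because no frame supports the schwa. The rule-based boundary refinement of the third stage is not covered by this sketch.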
Bibliographic reference. Kipp, Andreas / Wesenick, Maria-Barbara / Schiel, Florian (1996): "Automatic detection and segmentation of pronunciation variants in German speech corpora", In ICSLP-1996, 106-109.