Third International Conference on Spoken Language Processing (ICSLP 94)
One of the most important issues in large-vocabulary continuous speech recognition is the modeling of subword units. To model context-dependent acoustic-phonetic variations, a large number of units, such as triphones, is typically used. Given a finite amount of training data, many triphones are underrepresented and remain undertrained or even untrained. This paper proposes an algorithm for mapping underrepresented triphones to adequately represented ones that are phonetically similar. First, all triphones are categorized according to their place and manner of articulation. Each triphone that needs to be mapped is compared to other triphones, and the candidates are ranked according to whether the left context, the right context, or both are in the same phonetic class, as determined by acoustic-phonetic variations due to context. Second, if no good candidate has been found, each candidate triphone is represented as phonological feature vectors, and the similarity ranking is determined by the dot product of the vectors. The best candidate for mapping is chosen on the basis of both phonetic similarity and the frequency of occurrence of the candidate triphones. In a recognition test on the Resource Management task, using this phonological mapping reduces the word error rate significantly.
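The two-stage procedure described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the phone classes in `PHONE_CLASS`, the binary features in `FEATURES`, the `MIN_COUNT` threshold, the restriction that candidates share the target's center phone, and the rule that a match on both contexts counts as a "good candidate".

```python
# Hypothetical place/manner classes; the paper's actual inventory is not given here.
PHONE_CLASS = {
    "p": "labial_stop", "b": "labial_stop",
    "t": "alveolar_stop", "d": "alveolar_stop",
    "s": "alveolar_fricative", "z": "alveolar_fricative",
    "m": "nasal", "n": "nasal",
}

# Hypothetical binary features: (voiced, nasal, continuant, labial, coronal).
FEATURES = {
    "p": (0, 0, 0, 1, 0), "b": (1, 0, 0, 1, 0),
    "t": (0, 0, 0, 0, 1), "d": (1, 0, 0, 0, 1),
    "s": (0, 0, 1, 0, 1), "z": (1, 0, 1, 0, 1),
    "m": (1, 1, 0, 1, 0), "n": (1, 1, 0, 0, 1),
}

MIN_COUNT = 50  # assumed threshold for "adequately represented"


def class_score(a, b):
    """Number of contexts (left, right) that fall in the same phonetic class."""
    return sum(PHONE_CLASS[x] == PHONE_CLASS[y]
               for x, y in ((a[0], b[0]), (a[2], b[2])))


def context_vector(tri):
    """Concatenate the feature vectors of the left and right contexts."""
    return FEATURES[tri[0]] + FEATURES[tri[2]]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def map_triphone(target, counts, min_count=MIN_COUNT):
    """Pick a well-trained triphone (left, center, right) to stand in for target."""
    # Assumption: candidates share the center phone and have enough training data.
    candidates = [t for t, c in counts.items()
                  if c >= min_count and t[1] == target[1] and t != target]
    if not candidates:
        return None
    # Stage 1: prefer candidates whose left AND right contexts match by class,
    # breaking ties by frequency of occurrence.
    full = [t for t in candidates if class_score(target, t) == 2]
    if full:
        return max(full, key=lambda t: counts[t])
    # Stage 2: fall back to the dot product of feature vectors, then frequency.
    tv = context_vector(target)
    return max(candidates, key=lambda t: (dot(tv, context_vector(t)), counts[t]))


# Made-up training counts for illustration only.
COUNTS = {
    ("p", "iy", "t"): 3, ("p", "iy", "n"): 2,
    ("b", "iy", "d"): 120, ("b", "iy", "s"): 400, ("m", "iy", "n"): 90,
}

print(map_triphone(("p", "iy", "t"), COUNTS))  # ('b', 'iy', 'd'), found in stage 1
print(map_triphone(("p", "iy", "n"), COUNTS))  # ('m', 'iy', 'n'), found in stage 2
```

In this sketch, frequency acts only as a tie-breaker after phonetic similarity; the abstract says both factors are used but does not say how they are weighted, so a different combination may be closer to the original method.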
Bibliographic reference. Wong, Maurice K. (1994): "Clustering triphones by phonological mapping", In ICSLP-1994, 1939-1942.