This article analyzes the phonetic decoding performance obtained with different choices of linguistic units. The longer-term goal is to use such an approach to support communication with deaf people and to run it on an embedded decoder on a portable terminal, which introduces constraints on the model size. As a first step, this paper presents and analyzes the performance of various approaches. Two baseline systems are considered: one relying on a large vocabulary speech recognizer, and another relying on a phonetic n-gram language model. Then syllable-based lexicons and language models are investigated. Various lexicon sizes are studied by setting thresholds on the syllables' frequencies of occurrence in the training data. Evaluations are conducted on the ESTER and ETAPE speech corpora. Keeping only the most frequent syllables leads to a limited-size lexicon and language model, which nevertheless provides good phonetic decoding performance. The phone error rate is only 4% (absolute) worse than that obtained with the large vocabulary recognizer, and much better than that obtained with the phone n-gram language model.
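The frequency-threshold selection mentioned in the abstract can be sketched in a few lines. This is an illustrative reconstruction only: the function name, the toy syllables, and the choice of a simple count cutoff are assumptions, not the authors' actual tooling.

```python
from collections import Counter

def build_syllable_lexicon(training_syllables, min_count):
    """Keep only syllables seen at least min_count times in training.

    Hypothetical sketch of frequency-threshold lexicon selection;
    how rarer syllables are backed off to (e.g., to phones) is not
    shown here and would depend on the actual system design.
    """
    counts = Counter(training_syllables)
    return {syl for syl, c in counts.items() if c >= min_count}

# Toy syllabified training data (hypothetical syllables):
train = ["la", "bo", "la", "ra", "la", "bo", "twa", "ra", "la"]
lexicon = build_syllable_lexicon(train, min_count=2)
# Raising min_count shrinks both the lexicon and the n-gram
# language model subsequently trained on it.
```

The trade-off studied in the paper is exactly this: a higher threshold yields a smaller lexicon and language model (attractive for an embedded decoder) at the cost of some phonetic decoding accuracy.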
Bibliographic reference. Orosanu, Luiza / Jouvet, Denis (2013): "Comparison of approaches for an efficient phonetic decoding", In INTERSPEECH-2013, 1154-1158.