INTERSPEECH 2006 - ICSLP
We propose a new approach to automatic speech recognition based on word detection and knowledge-based verification. We first design a collection of word detectors, one for each lexical item in the vocabulary, and apply them to a given utterance. Pruning strategies are then used to eliminate unlikely word candidates, and the remaining detected words are combined into word strings. The proposed approach differs from the conventional maximum a posteriori decoding method. It is a critical component in building a bottom-up, detection-based speech recognition system in which knowledge of acoustics, speech and language can easily be incorporated to prune unlikely word hypotheses and to rescore the remaining ones. The proposed approach was evaluated on a connected digit task using phone models trained on the TIMIT corpus. Compared with state-of-the-art connected digit recognition algorithms, the proposed detection-based framework works well even though no digit samples were used for training the detectors and recognizers. Other knowledge-based constraints, such as manner and place of articulation detectors, can be incorporated into this detection-based approach to improve the robustness and performance of the overall system.
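The detect, prune and combine steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Detection` record, the score threshold in `prune`, and the use of weighted interval scheduling in `combine` are all assumptions made for illustration, since the abstract does not specify how candidates are scored, pruned or assembled.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical output of one word detector (not from the paper)."""
    word: str
    start: int    # first frame, inclusive
    end: int      # last frame, exclusive
    score: float  # assumed detector confidence, higher is better


def prune(detections, min_score):
    """Assumed pruning rule: drop candidates below a confidence threshold."""
    return [d for d in detections if d.score >= min_score]


def combine(detections):
    """Assumed combination rule: choose non-overlapping detections with the
    highest total score (weighted interval scheduling) and return their words
    in time order."""
    dets = sorted(detections, key=lambda d: d.end)
    ends = [d.end for d in dets]
    # best[i] holds (total score, word list) using only the first i detections.
    best = [(0.0, [])]
    for i, d in enumerate(dets):
        # Count of earlier detections that end at or before this one starts.
        j = bisect_right(ends, d.start, 0, i)
        take = (best[j][0] + d.score, best[j][1] + [d.word])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1][1]
```

In a fuller system, knowledge-based verification (for example, manner and place of articulation evidence) would presumably adjust or replace the single `score` field before the combination step.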
Bibliographic reference. Ma, Chengyuan / Tsao, Yu / Lee, Chin-Hui (2006): "A study on detection based automatic speech recognition", In INTERSPEECH-2006, paper 2053-Thu1CaP.13.