13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Phone Recognition in Critical Bands Using Sub-band Temporal Modulations

Feipeng Li, Sri Harish Mallidi, Hynek Hermansky

Center for Language and Speech Processing, Human Language Technology Center of Excellence, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA

Research on human speech perception indicates that the temporal envelopes of the speech signal are the main carriers of linguistic information. In automatic speech recognition (ASR), short-time spectral envelopes are typically used in place of the long-term temporal envelopes of sub-band signals to characterize the linguistic information in the speech signal. Past studies have repeatedly shown that temporal fluctuations of the spectral trajectory outside the range of [1, 12] Hz can be harmful to speech recognition. This study investigates the significance of temporal modulations for phoneme identification in a machine system. Both long-term temporal envelopes and short-term spectral envelopes are used as front-end features. Results indicate that, with long-term analysis, temporal modulations above 16 Hz contribute significantly to phoneme identification in both clean and noisy conditions, whereas with short-term analysis, modulations above 16 Hz are not robust.
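To make the notion of sub-band temporal modulation concrete, the following is a minimal sketch, not the authors' front-end. It assumes a single synthetic sub-band signal (a 1 kHz carrier with 4 Hz and 40 Hz amplitude modulations), extracts its temporal envelope as the magnitude of the analytic signal, and splits the envelope at a 16 Hz modulation cutoff. The function names, the FFT-based filtering, and the test signal are all illustrative assumptions; critical-band filtering and the phone classifier are omitted.

```python
import numpy as np


def analytic_envelope(x):
    """Temporal envelope as the magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist bin once
        gain[1:n // 2] = 2.0           # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))


def split_modulations(envelope, fs, cutoff_hz=16.0):
    """Split an envelope into modulation components at or below, and above, cutoff_hz."""
    spectrum = np.fft.rfft(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    low = np.fft.irfft(np.where(freqs <= cutoff_hz, spectrum, 0.0), n=len(envelope))
    return low, envelope - low


def modulation_amplitude(envelope, fs, mod_hz):
    """Amplitude of the modulation component nearest to mod_hz."""
    spectrum = np.fft.rfft(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - mod_hz)))
    return 2.0 * np.abs(spectrum[k]) / len(envelope)


# Hypothetical sub-band signal: one second at 8 kHz, 1 kHz carrier,
# slow (4 Hz) and fast (40 Hz) amplitude modulations.
fs = 8000
t = np.arange(fs) / fs
true_envelope = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t) + 0.3 * np.sin(2 * np.pi * 40 * t)
subband = true_envelope * np.cos(2 * np.pi * 1000 * t)

envelope = analytic_envelope(subband)
slow, fast = split_modulations(envelope, fs, cutoff_hz=16.0)

print("4 Hz amplitude in slow part:", modulation_amplitude(slow, fs, 4.0))
print("40 Hz amplitude in fast part:", modulation_amplitude(fast, fs, 40.0))
```

In this toy setting the slow part retains only the 4 Hz modulation and the fast part only the 40 Hz modulation, which is the kind of separation needed to test whether modulations above 16 Hz carry useful information.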

Index Terms: multistream, temporal modulations, phone recognition

Bibliographic reference. Li, Feipeng / Mallidi, Sri Harish / Hermansky, Hynek (2012): "Phone recognition in critical bands using sub-band temporal modulations", In INTERSPEECH-2012, 1816-1819.