We propose an improved Tandem system for tonal language speech recognition. Three types of features (cepstral, spectro-temporal, and pitch) are integrated to model tone and phoneme variation simultaneously. Tonal phonemes (or tonemes) are used as targets for MLP posterior estimation, and tonal acoustic units are used for HMM recognition. In experiments on Mandarin broadcast news, a 19.3% relative CER reduction was achieved over the conventional MFCC Tandem baseline. By training with different acoustic units, we also analyze the complementarity of the three feature types in tone, phoneme, and toneme classification.
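For readers unfamiliar with the Tandem approach, the following is a minimal sketch of how Tandem features are commonly built: frame-level MLP posteriors are log-compressed, decorrelated with PCA, and appended to the base acoustic features before HMM training. It is an illustration of the general technique only, not the authors' implementation. The function name `tandem_features`, the dimensions (39 base features, 60 toneme classes, 25 retained components), and the random stand-in data are all assumptions made for the example.

```python
import numpy as np


def tandem_features(base_feats, posteriors, n_components, eps=1e-10):
    """Append PCA-reduced log posteriors to the base features, frame by frame.

    base_feats:  (n_frames, n_base) array, e.g. MFCCs (hypothetical input).
    posteriors:  (n_frames, n_classes) array of MLP class posteriors.
    """
    # Log compression makes the skewed posterior distribution more Gaussian-like.
    log_post = np.log(posteriors + eps)
    centered = log_post - log_post.mean(axis=0)
    # PCA via SVD. In practice the projection would be estimated on training
    # data; here it is estimated on the same batch for brevity.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    return np.hstack([base_feats, projected])


# Random stand-in data; every size below is an assumption for illustration.
rng = np.random.default_rng(0)
n_frames, n_base, n_tonemes = 200, 39, 60
mfcc = rng.standard_normal((n_frames, n_base))
logits = rng.standard_normal((n_frames, n_tonemes))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

feats = tandem_features(mfcc, posteriors, n_components=25)
print(feats.shape)  # (200, 64)
```

In a system like the one described here, the MLP input would combine the cepstral, spectro-temporal, and pitch features, and its output classes would be tonemes; the sketch leaves the MLP itself out and starts from its posteriors.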
Bibliographic reference. Li, Shang-wen / Wang, Yow-bang / Sun, Liang-che / Lee, Lin-shan (2011): "Improved tonal language speech recognition by integrating spectro-temporal evidence and pitch information with properly chosen tonal acoustic units", In INTERSPEECH-2011, 2293-2296.