13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Improve the Implementation of Pitch Features for Mandarin Digit String Recognition Task

Pei Ding, Liqiang He

Toshiba (China) Research and Development Center, Beijing, China

Mandarin digit string recognition (MDSR) is a difficult task in automatic speech recognition (ASR), and using pitch features can significantly improve performance. Conventional pitch feature extraction methods commonly output a random value as the pitch in unvoiced (UV) frames, which causes serious statistical confusion between voiced (V) and UV units and produces abnormal likelihoods in decoding. In this paper we propose to normalize the distribution of the random values assigned to UV frames, which avoids these side effects and introduces extra discriminative information into the statistics. In addition, the voice level (VL), an intermediate parameter used in pitch estimation for the V/UV decision, is adopted to expand the acoustic feature stream. VL features indicate how strongly periodic a speech frame is and provide complementary information for ASR. In the experiments, the proposed methods significantly improve the accuracy of MDSR tasks, achieving sentence error reduction rates (ERR) of 13.3% and 15.1% over the baseline on the free-length and 6-digit test sets, respectively.
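The two ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not specify the details. The sketch assumes that VL is the peak normalized autocorrelation of a frame, that the V/UV decision is a fixed threshold on VL, that the pitch feature in voiced frames is log F0, and that "normalizing" the UV random values means drawing them from one fixed Gaussian. All function names, parameters and default values (`vl_threshold`, `uv_mean`, `uv_std`, the F0 search range) are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def voicing_level(frame: Sequence[float], min_lag: int, max_lag: int) -> Tuple[float, int]:
    """Return (peak normalized autocorrelation, lag of that peak) for one frame.

    Assumption: this peak stands in for the paper's voice level (VL).
    """
    if sum(x * x for x in frame) == 0.0:
        return 0.0, 0  # silent frame: no periodicity at all
    best_vl, best_lag = 0.0, 0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        n = len(frame) - lag
        num = sum(frame[i] * frame[i + lag] for i in range(n))
        e1 = sum(frame[i] ** 2 for i in range(n))
        e2 = sum(frame[i + lag] ** 2 for i in range(n))
        denom = math.sqrt(e1 * e2)
        vl = num / denom if denom > 0.0 else 0.0
        if vl > best_vl:
            best_vl, best_lag = vl, lag
    return best_vl, best_lag


def pitch_and_vl_features(
    frames: Sequence[Sequence[float]],
    sample_rate: int = 8000,        # assumed sampling rate
    min_f0: float = 60.0,           # assumed F0 search range in Hz
    max_f0: float = 400.0,
    vl_threshold: float = 0.5,      # assumed V/UV decision threshold
    uv_mean: float = 0.0,           # assumed UV distribution parameters
    uv_std: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[Tuple[float, float]]:
    """Return one (pitch feature, VL) pair per frame.

    Voiced frames get log F0. Unvoiced frames get a random value drawn
    from a single fixed Gaussian, so UV statistics stay well behaved.
    """
    rng = rng or random.Random()
    min_lag = int(sample_rate / max_f0)
    max_lag = int(sample_rate / min_f0)
    features = []
    for frame in frames:
        vl, lag = voicing_level(frame, min_lag, max_lag)
        if vl >= vl_threshold and lag > 0:
            pitch = math.log(sample_rate / lag)   # voiced: log F0
        else:
            pitch = rng.gauss(uv_mean, uv_std)    # unvoiced: controlled random value
        features.append((pitch, vl))
    return features


# Usage: one periodic frame (200 Hz tone) and one silent frame.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(240)]
silence = [0.0] * 240
feats = pitch_and_vl_features([tone, silence], min_f0=150.0, rng=random.Random(0))
```

In this sketch each frame contributes two values to the feature stream: the pitch feature and the VL. Appending the VL lets the acoustic model see how periodic the frame was, rather than only the hard V/UV outcome.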

Index Terms: Mandarin digit string recognition, automatic speech recognition, pitch feature extraction


Bibliographic reference. Ding, Pei / He, Liqiang (2012): "Improve the implementation of pitch features for Mandarin digit string recognition task", in INTERSPEECH-2012, 2618-2621.