EUROSPEECH 2001 Scandinavia
This paper proposes a new approach to extracting a corpus-based database of residual signal segments, which are used as excitations of a production model to replay MFCC-encoded speech with a natural sound. No information beyond the MFCCs (such as F0 or a voiced/unvoiced flag) is needed, nor is any modification or extension of the MFCC computation algorithm. The MFCC algorithm is assumed to be in its commonly accepted form, as implemented, for example, in the HTK software. Because of these restrictions, we do not aim at exact reconstruction of the original signal; instead, we seek to replay the speech in an intelligible and as natural as possible way. In addition, we offer a 'low-demanding' solution based on pulse/noise excitation, which employs a new method for making the voiced/unvoiced decision from the MFCC vector alone.
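As a rough illustration of the pulse/noise excitation idea mentioned above, the following minimal sketch builds an excitation signal frame by frame: an impulse train for voiced frames and white noise for unvoiced ones. It is not the authors' implementation. The function name, the frame length, the fixed pitch period, and the assumption that per-frame voiced/unvoiced flags are already available are all hypothetical; the MFCC-based voiced/unvoiced decision itself is not reproduced here.

```python
import random


def pulse_noise_excitation(voiced_flags, frame_len=160, pitch_period=80, seed=0):
    """Build a pulse/noise excitation signal, one frame per flag.

    voiced_flags : iterable of booleans, one per frame. They are assumed to
                   come from some voiced/unvoiced decision made elsewhere.
    frame_len    : samples per frame (hypothetical value).
    pitch_period : spacing between pulses in samples (hypothetical, fixed).
    """
    rng = random.Random(seed)
    excitation = []
    next_pulse = 0  # absolute sample index of the next pulse

    for flag in voiced_flags:
        start = len(excitation)
        if flag:
            # Voiced frame: impulse train. Pulse phase is carried over from
            # the previous voiced frame so the spacing stays regular.
            next_pulse = max(next_pulse, start)
            frame = [0.0] * frame_len
            while next_pulse < start + frame_len:
                frame[next_pulse - start] = 1.0
                next_pulse += pitch_period
        else:
            # Unvoiced frame: zero-mean, unit-variance white noise.
            frame = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
            next_pulse = 0  # restart the pulse train at the next voiced frame
        excitation.extend(frame)

    return excitation


# Two voiced frames followed by one unvoiced frame.
signal = pulse_noise_excitation([True, True, False])
```

In a complete system, such an excitation would drive a synthesis filter whose spectral envelope is derived from the MFCCs; that step is outside the scope of this sketch.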
Bibliographic reference. Tychtl, Zbyněk / Psutka, Josef (2001): "Corpus-based database of residual excitations used for speech reconstruction from MFCCs", In EUROSPEECH-2001, 2259-2262.