DiSS-LPSS Joint Workshop 2010

The 5th Workshop on Disfluency in Spontaneous Speech
The 2nd International Symposium on Linguistic Patterns in Spontaneous Speech

Tokyo, Japan, September 25-26, 2010

Investigating the COG Ratio as Feature for Speaker Verification on High-Effort Speech

Corinna Harwardt

Fraunhofer FKIE, Command and Control Information Systems, Germany

Vocal effort mismatch between training and test data severely degrades the performance of speaker recognition systems. The acoustic changes that raised vocal effort induces in a speech signal are complex and, despite several studies, not yet fully understood.
   For automatic speaker recognition, understanding these differences is not enough: it is essential to find features that remain relatively stable under changing vocal effort and still carry speaker-specific information. In this study we investigate the center of gravity (COG) ratio for high and mid frequency bands as a feature for speaker recognition. We find that vocal effort mismatch raises the equal error rate (EER) of a standard MFCC-based GMM-UBM system more than sixfold. For the COG ratio we observe a much smaller degradation of around 25%.
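   As a rough illustration of the feature discussed here, the following sketch computes a spectral center of gravity in two frequency bands of a single frame and takes their ratio. The abstract does not give the band edges, the windowing, or the exact definition used in the paper, so the function names, the Hann window, and the band limits (1-3 kHz and 3-8 kHz) are assumptions made only for this example.

```python
import numpy as np

def band_cog(frame, sample_rate, f_lo, f_hi):
    """Power-weighted mean frequency (Hz) of one frame within [f_lo, f_hi)."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    mask = (freqs >= f_lo) & (freqs < f_hi)
    total = power[mask].sum()
    if total == 0.0:
        return float("nan")  # no energy in the band
    return float((freqs[mask] * power[mask]).sum() / total)

def cog_ratio(frame, sample_rate,
              mid_band=(1000.0, 3000.0),    # assumed band edges, not from the paper
              high_band=(3000.0, 8000.0)):  # assumed band edges, not from the paper
    """Ratio of the high-band COG to the mid-band COG for one frame."""
    high = band_cog(frame, sample_rate, *high_band)
    mid = band_cog(frame, sample_rate, *mid_band)
    return high / mid

# Usage: a synthetic frame with one tone in each band.
sample_rate = 16000
t = np.arange(512) / sample_rate
frame = np.sin(2 * np.pi * 2000 * t) + np.sin(2 * np.pi * 5000 * t)
ratio = cog_ratio(frame, sample_rate)
```

   For this synthetic frame the two band COGs fall close to the tone frequencies, so the ratio is close to 5000/2000. On real speech the value would be computed per frame and then modeled like any other feature stream.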
   When the UBM is adapted with additional high-effort speech data, the EER of the COG ratio in the mismatched condition becomes even lower than in the matched condition. Combining MFCCs and the COG ratio gives the best results, with an overall improvement of 16% over the standard MFCC-based system.
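   The results above are reported as equal error rates. As a reminder of what that metric measures, here is a minimal sketch that estimates the EER from verification scores by sweeping a decision threshold and finding the point where the false acceptance and false rejection rates are closest. The function name and the toy scores are hypothetical and are not taken from the paper's evaluation.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER; a trial is accepted when its score >= threshold."""
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, impostor]))
    # False acceptance rate: impostor trials that are accepted.
    far = np.array([(impostor >= th).mean() for th in thresholds])
    # False rejection rate: target trials that are rejected.
    frr = np.array([(target < th).mean() for th in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)

# Toy scores (hypothetical): one impostor scores above one target.
eer = equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 1.5, 2.5])
```

   With these toy scores, one of four impostor trials is accepted and one of four target trials is rejected at the crossing threshold, so the estimate is 0.25.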

Index Terms. vocal effort, speaker recognition, center of gravity ratio

Bibliographic reference.  Harwardt, Corinna (2010): "Investigating the COG ratio as feature for speaker verification on high-effort speech", In DiSS-LPSS-2010, 35-38.