DiSS-LPSS Joint Workshop 2010
The 5th Workshop on Disfluency in Spontaneous Speech
Vocal effort mismatch between training and test data leads to severe degradation of speaker recognition systems. The changes in the acoustics of a speech signal induced by raised vocal effort are complex and, despite several studies by various authors, not yet completely understood.
For automatic speaker recognition, rather than merely gaining knowledge about these differences, it is essential to discover features that remain relatively stable under changing vocal effort conditions and that contain speaker-specific information. In this study we investigate the center of gravity (COG) ratio for high and mid frequency bands as a feature for speaker recognition. We find that vocal effort mismatch leads to an equal error rate (EER) more than six times higher for a standard MFCC-based GMM-UBM system. For the COG ratio we observe a much smaller degradation of around 25%.
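The abstract does not define the COG ratio precisely. The sketch below assumes that the COG of a band is the power-weighted mean frequency of a windowed frame's spectrum inside that band, and that the feature is the high-band COG divided by the mid-band COG. The band edges (1 to 3 kHz for mid, 3 to 8 kHz for high), the Hann window, and the function names `band_cog` and `cog_ratio` are hypothetical choices for illustration, not the settings used in the study.

```python
import numpy as np


def band_cog(freqs, power, f_lo, f_hi):
    """Power-weighted mean frequency (Hz) of the spectrum inside [f_lo, f_hi).

    Returns NaN if the band contains no energy.
    """
    mask = (freqs >= f_lo) & (freqs < f_hi)
    p = power[mask]
    total = p.sum()
    if total <= 0:
        return float("nan")
    return float((freqs[mask] * p).sum() / total)


def cog_ratio(frame, sr, mid_band=(1000.0, 3000.0), high_band=(3000.0, 8000.0)):
    """Hypothetical COG ratio of one frame: COG(high band) / COG(mid band).

    Band edges are illustrative assumptions; the high band up to 8 kHz
    presumes a sampling rate of at least 16 kHz.
    """
    windowed = frame * np.hanning(len(frame))          # taper to reduce leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2          # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)     # bin centre frequencies in Hz
    return band_cog(freqs, power, *high_band) / band_cog(freqs, power, *mid_band)


# Usage: a synthetic frame with one tone in each band.
sr = 16000
n = np.arange(512)
frame = np.sin(2 * np.pi * 2000 * n / sr) + np.sin(2 * np.pi * 5000 * n / sr)
ratio = cog_ratio(frame, sr)  # close to 5000 / 2000
```

Applied frame by frame, this yields one scalar per frame that could be modeled on its own or stacked with MFCC vectors; how the study actually computed and combined the features is not stated in this abstract.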
When the UBM is adapted with additional high-effort speech data, the COG ratio achieves an even lower EER in the mismatch condition than in the matched condition. Combining MFCC and the COG ratio leads to the best results, with an overall improvement of 16% compared to the standard MFCC-based system.
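All results above are reported as EERs, the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). The following is a minimal sketch of how an EER is commonly estimated from target (genuine) and impostor trial scores by sweeping the decision threshold; the function name `equal_error_rate` and the averaging of FAR and FRR at the closest crossing are illustrative assumptions, not the scoring tool used in the study.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Estimate the EER from verification scores (higher score = more target-like).

    Sweeps every observed score as a threshold and returns the mean of FAR and
    FRR at the threshold where the two rates are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostor trials wrongly accepted
        frr = np.mean(genuine < t)    # target trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)


# Usage: perfectly separated scores versus overlapping scores.
eer_separated = equal_error_rate([2, 3, 4], [0, 1])
eer_overlap = equal_error_rate([1, 2, 3, 4], [0, 1, 2, 3])
```

Read as a ratio, an EER "more than six times higher" means the mismatched EER exceeds six times the matched EER, whereas a degradation of around 25% corresponds to a mismatched EER of roughly 1.25 times the matched one.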
Index Terms. vocal effort, speaker recognition, center of gravity ratio
Bibliographic reference. Harwardt, Corinna (2010): "Investigating the COG ratio as feature for speaker verification on high-effort speech", In DiSS-LPSS-2010, 35-38.