INTERSPEECH 2006 - ICSLP
We propose a new algorithm for voiced/unvoiced classification of speech at the phoneme or sample level. The algorithm is inspired by auditory-based approaches and combines two cues: one based on the energy distribution of the signal, the other on its harmonicity. To extract the harmonicity, we apply a Gammatone filterbank to the signal and calculate a histogram of the zero crossings of the filter channels. A measure similar to the variance of the zero crossings yields the harmonicity cue. The performance of the algorithm was measured on several minutes of read and spontaneous speech from various speakers. An algorithm proposed by Mustafa et al. served as the benchmark. The results show that our algorithm performs significantly better on both read and spontaneous speech and, in particular, seems better able to cope with different speaking styles.
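The harmonicity cue can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the filterbank parameters, the histogram, or the exact variance-like measure, so everything below is an assumption made for illustration. The sketch uses four hypothetical channel centre frequencies, a truncated fourth-order Gammatone impulse response with the common ERB approximation, and, in place of the histogram, the squared coefficient of variation of the intervals between upward zero crossings in each channel, averaged over channels. All function names are hypothetical. Low values indicate regular, periodic zero crossings.

```python
import math
import random


def gammatone_fir(fc, fs, duration=0.04, order=4):
    """Truncated Gammatone impulse response (assumed parameters, not from the paper)."""
    erb = 24.7 + 0.108 * fc      # common ERB approximation in Hz
    b = 1.019 * erb              # bandwidth parameter of the Gammatone envelope
    return [
        (n / fs) ** (order - 1)
        * math.exp(-2 * math.pi * b * n / fs)
        * math.cos(2 * math.pi * fc * n / fs)
        for n in range(int(duration * fs))
    ]


def filter_valid(x, h):
    """FIR filtering that keeps only outputs where h fully overlaps x."""
    m = len(h)
    hr = h[::-1]
    return [
        sum(a * c for a, c in zip(hr, x[n - m + 1:n + 1]))
        for n in range(m - 1, len(x))
    ]


def upward_crossings(y):
    """Times (in samples, linearly interpolated) where y crosses zero from below."""
    times = []
    for n in range(1, len(y)):
        if y[n - 1] < 0.0 <= y[n]:
            times.append(n - 1 + y[n - 1] / (y[n - 1] - y[n]))
    return times


def channel_irregularity(y):
    """Squared coefficient of variation of the zero-crossing intervals of y."""
    t = upward_crossings(y)
    intervals = [later - earlier for earlier, later in zip(t, t[1:])]
    if len(intervals) < 2:
        return 1.0               # assumption: too few crossings counts as irregular
    mean = sum(intervals) / len(intervals)
    var = sum((d - mean) ** 2 for d in intervals) / len(intervals)
    return var / mean ** 2


def irregularity_cue(x, fs, centers=(200.0, 400.0, 800.0, 1600.0)):
    """Mean channel irregularity; lower values suggest a more harmonic signal."""
    values = [channel_irregularity(filter_valid(x, gammatone_fir(fc, fs)))
              for fc in centers]
    return sum(values) / len(values)


def demo_signals(fs, n, f0=100.0, seed=0):
    """Synthetic test inputs: a harmonic complex and white Gaussian noise."""
    n_harm = int((fs / 2 - 1) // f0)      # harmonics kept below the Nyquist frequency
    voiced = [sum(math.cos(2 * math.pi * f0 * k * i / fs)
                  for k in range(1, n_harm + 1)) for i in range(n)]
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return voiced, noise


if __name__ == "__main__":
    fs = 8000
    voiced, noise = demo_signals(fs, 1600)
    print("harmonic complex:", irregularity_cue(voiced, fs))
    print("white noise:     ", irregularity_cue(noise, fs))
```

On these synthetic inputs the harmonic complex should yield a clearly lower value than the noise. A real voiced/unvoiced decision would still need the energy cue, framing, and a decision rule, none of which are described in the abstract.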
Bibliographic reference. Heckmann, Martin / Moebus, Marco / Joublin, Frank / Goerick, Christian (2006): "Speaker independent voiced-unvoiced detection evaluated in different speaking styles", In INTERSPEECH-2006, paper 1249-Wed1FoP.5.