This paper proposes using a hidden Markov model (HMM) to model a speech signal in terms of its speech class (voiced, unvoiced or nonspeech) and, for voiced speech, its fundamental frequency. The states of the HMM represent unvoiced speech and nonspeech, together with multiple voiced states that model different fundamental frequencies. The transition matrix of the HMM models temporal changes in speech class and the time-varying fundamental frequency contour. The model is applied to voicing and fundamental frequency estimation by extracting acoustic features from a speech signal and then applying Viterbi decoding to find the most likely state sequence. Experimental results are presented that investigate the estimation accuracy of the proposed system, and a comparison is made against conventional methods.
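To make the decoding step concrete, the following is a minimal Python sketch of Viterbi decoding over a state space of the kind described above. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: a single state each for nonspeech and unvoiced speech, the number and spacing of the voiced states, the fundamental frequency range, the hand-built transition matrix (`build_log_transitions`), and the per-frame log-likelihoods `log_B`, which a real system would compute from acoustic features.

```python
import numpy as np

def build_states(f0_min=60.0, f0_max=400.0, n_voiced=8):
    """Hypothetical state layout: 0 = nonspeech, 1 = unvoiced, 2.. = voiced.

    Returns an array giving the f0 (Hz) of each state, with 0.0 for the
    two states that carry no fundamental frequency.
    """
    f0 = np.geomspace(f0_min, f0_max, n_voiced)  # log-spaced f0 bins (assumed)
    return np.concatenate(([0.0, 0.0], f0))

def build_log_transitions(n_voiced, stay=0.9, sigma=1.0):
    """Hand-built transition matrix (an assumption, not learned from data).

    Nonspeech/unvoiced states mostly self-loop. Voiced states prefer nearby
    f0 bins, which favours a smooth f0 contour, and keep a small probability
    of leaving voicing.
    """
    n = n_voiced + 2
    A = np.zeros((n, n))
    for i in range(2):
        A[i, :] = (1.0 - stay) / (n - 1)
        A[i, i] = stay
    idx = np.arange(n_voiced)
    for i in range(n_voiced):
        w = np.exp(-0.5 * ((idx - i) / sigma) ** 2)
        A[2 + i, 2:] = stay * w / w.sum()
        A[2 + i, :2] = (1.0 - stay) / 2.0
    return np.log(A)

def viterbi(log_pi, log_A, log_B):
    """Most likely state path, computed in the log domain.

    log_pi: (N,) initial log-probabilities
    log_A:  (N, N) transition log-probabilities, log_A[i, j] = log P(j | i)
    log_B:  (T, N) per-frame emission log-likelihoods
    """
    T, n = log_B.shape
    delta = log_pi + log_B[0]
    psi = np.zeros((T, n), dtype=int)  # best predecessor of each state
    for t in range(1, T):
        scores = delta[:, None] + log_A  # scores[i, j]: arrive in j from i
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):  # backtrace
        path[t - 1] = psi[t, path[t]]
    return path

# Toy usage with synthetic emission scores standing in for real features.
n_voiced = 8
f0_per_state = build_states(n_voiced=n_voiced)
log_A = build_log_transitions(n_voiced)
log_pi = np.log(np.full(n_voiced + 2, 1.0 / (n_voiced + 2)))
rng = np.random.default_rng(0)
log_B = rng.normal(loc=-5.0, scale=1.0, size=(20, n_voiced + 2))
path = viterbi(log_pi, log_A, log_B)
f0_contour = f0_per_state[path]  # 0.0 wherever the frame is not voiced
voiced_mask = path >= 2
```

The decoded path yields both outputs at once: the state index gives the speech class of each frame, and for voiced frames the state's associated frequency gives the fundamental frequency estimate. Working in the log domain avoids numerical underflow on long utterances.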
Bibliographic reference. Taylor, John H. / Milner, Ben (2013): "Modelling and estimation of the fundamental frequency of speech using a hidden Markov model", In INTERSPEECH-2013, 1926-1930.