INTERSPEECH 2006 - ICSLP
This paper describes a system that compares user renditions of short sung clips with the original versions of those clips. The F0 of both recordings was estimated and the two F0 contours were then Viterbi-aligned with each other. The total difference in pitch after alignment was used as a distance metric and transformed into a rating out of ten, to indicate to the user how close he or she was to the original singer. An existing corpus of sung speech was used for initial design and optimisation of the system. We then collected further development and evaluation corpora; these recordings were judged for closeness to an original recording by two human judges. The rankings assigned by those judges were used to design and optimise the system. The design was then implemented and deployed as part of a telephone-based entertainment application.
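The pipeline summarised above (align two F0 contours, sum the pitch difference along the alignment, map the result to a score out of ten) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the F0 tracks are already available as lists of Hz values with 0 marking unvoiced frames, and it uses plain dynamic time warping as a stand-in for the Viterbi alignment. The function names, the semitone scale, the `worst_semitones` parameter and the linear mapping to a rating are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=55.0):
    """Convert voiced F0 values (Hz) to semitones above ref_hz.

    Frames with F0 <= 0 are treated as unvoiced and dropped (an assumption).
    """
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz if f > 0]


def aligned_pitch_distance(a, b):
    """Total absolute pitch difference along the best DTW alignment of a and b.

    DTW is used here as a stand-in for the Viterbi alignment in the abstract.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both contours need at least one voiced frame")
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: vertical, horizontal, or diagonal step.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def rating_out_of_ten(user_f0_hz, original_f0_hz, worst_semitones=6.0):
    """Map the aligned distance to an integer rating from 0 to 10.

    Hypothetical mapping: the distance is averaged over the longer contour
    and scaled linearly, so an average error of worst_semitones or more
    scores 0 and a perfect match scores 10.
    """
    user = hz_to_semitones(user_f0_hz)
    original = hz_to_semitones(original_f0_hz)
    total = aligned_pitch_distance(user, original)
    mean_error = total / max(len(user), len(original))
    return int(round(10.0 * max(0.0, 1.0 - mean_error / worst_semitones)))
```

For example, `rating_out_of_ten([220, 220, 247, 247], [220, 247])` gives the top rating, because the alignment absorbs the difference in tempo and only pitch errors are penalised. A deployed system would presumably tune the mapping against human judgements, as the abstract describes.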
Bibliographic reference. Lal, Partha (2006): "A comparison of singing evaluation algorithms", In INTERSPEECH-2006, paper 1119-Thu1BuP.13.