Retrieval of Textual Song Lyrics from Sung Inputs

Anna M. Kruspe

Retrieving the lyrics of a sung recording from a database of text documents is a research topic that has not yet received attention. Such a retrieval system has many practical applications, e.g. in karaoke or for indexing large song databases by their lyrical content.

In this paper, we present such a lyrics retrieval system. In the first step, phoneme posteriorgrams are extracted from sung recordings using various acoustic models trained on TIMIT, on a variation thereof, and on subsets of a large database of recordings of unaccompanied singing (DAMP). In parallel, we generate binary templates from the available textual lyrics. Since these lyrics carry no temporal information, we then employ an approach based on Dynamic Time Warping (DTW) to retrieve the most likely lyrics document for each recording.
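The matching step can be illustrated with a minimal sketch. It assumes the lyrics have already been converted into phoneme sequences (e.g. via a pronunciation dictionary, not shown) and uses several choices that are illustrative assumptions rather than the configuration from the paper: a hypothetical five-phoneme inventory, one one-hot template column per phoneme, a negative-log-posterior local cost, and normalization of the DTW cost by `n + m`. The helper names (`binary_template`, `dtw_cost`, `retrieve`, `toy_frame`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy phoneme inventory; a real system would use e.g. the TIMIT set.
PHONEMES = ["a", "b", "k", "s", "t"]
INDEX = {p: i for i, p in enumerate(PHONEMES)}


def binary_template(phoneme_seq: Sequence[str]) -> List[List[int]]:
    """Assumed template layout: one one-hot vector per lyrics phoneme, no timing."""
    template = []
    for p in phoneme_seq:
        row = [0] * len(PHONEMES)
        row[INDEX[p]] = 1
        template.append(row)
    return template


def local_cost(frame: Sequence[float], column: Sequence[int], eps: float = 1e-6) -> float:
    """Assumed cost: negative log of the posterior mass on the template phoneme."""
    mass = sum(p * t for p, t in zip(frame, column))
    return -math.log(max(mass, eps))


def dtw_cost(posteriorgram: Sequence[Sequence[float]],
             template: Sequence[Sequence[int]]) -> float:
    """Classic DTW (vertical, horizontal, diagonal steps), normalized by n + m."""
    n, m = len(posteriorgram), len(template)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = local_cost(posteriorgram[i - 1], template[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)


def retrieve(posteriorgram: Sequence[Sequence[float]],
             lyrics: Dict[str, Sequence[str]]) -> List[str]:
    """Rank lyrics documents by ascending DTW cost (best match first)."""
    scores = {doc: dtw_cost(posteriorgram, binary_template(seq))
              for doc, seq in lyrics.items()}
    return sorted(scores, key=lambda doc: scores[doc])


def toy_frame(phoneme: str, conf: float = 0.8) -> List[float]:
    """Hypothetical posterior vector peaked on one phoneme (for demonstration)."""
    rest = (1.0 - conf) / (len(PHONEMES) - 1)
    return [conf if q == phoneme else rest for q in PHONEMES]


if __name__ == "__main__":
    # Nine "sung" frames: each phoneme held for three frames.
    sung = [toy_frame(p) for p in "aaabbbkkk"]
    docs = {"song_1": ["a", "b", "k"], "song_2": ["k", "s", "t"]}
    print(retrieve(sung, docs))  # ['song_1', 'song_2']
```

Because the templates contain only one column per phoneme, DTW absorbs the variable durations of the sung phonemes by letting several frames map to the same column. The length normalization is one simple way to keep scores comparable across lyrics of different lengths; other normalizations or subsequence DTW would be equally plausible choices.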

The approach is tested on a different subset of the unaccompanied singing database, comprising 601 recordings of 301 different songs (12,000 lines of lyrics). It is evaluated on both a song-wise and a line-wise scale.

The results are highly encouraging and could serve as a basis for automatic lyrics alignment and keyword spotting in large song databases.

DOI: 10.21437/Interspeech.2016-1272

Cite as

Kruspe, A.M. (2016) Retrieval of Textual Song Lyrics from Sung Inputs. Proc. Interspeech 2016, 2140-2144.
