12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Unsupervised Audio Patterns Discovery Using HMM-Based Self-Organized Units

Man-hung Siu, Herbert Gish, Steve Lowe, Arthur Chan

Raytheon BBN Technologies, USA

In our previous work [1, 2], we trained an HMM-based speech recognizer without transcriptions or any other knowledge or resources. The trained HMM recognizer was used to transcribe audio into self-organized units (SOUs), and we evaluated its performance on the task of topic identification. In this paper, we report our work on applying SOUs to discover audio patterns in spoken documents without supervision. Because the audio is recognized into SOUs, which are sound-like units, the discovery of common audio patterns can be carried out extremely efficiently over a large corpus, without the dynamic programming comparisons proposed in earlier work [3]. Experiments were performed on Mandarin conversational telephone speech using both one-best SOU token sequences and SOU consensus networks. We show that, using SOUs as keys to audio patterns, we can discover frequently spoken words with good purity.
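
The idea of using SOU sequences as keys can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-best SOU transcripts are already available as lists of token labels, and it treats a fixed-length SOU n-gram as the lookup key. The token labels (`u12`, `u7`, ...), the document ids, the function names and the choice of `n` are all hypothetical. Each n-gram is hashed once, so repeated patterns are found in a single pass over the corpus rather than by pairwise dynamic programming alignment.

```python
from collections import defaultdict


def index_ngrams(transcripts, n=3):
    """Map each SOU n-gram to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, tokens in transcripts.items():
        for i in range(len(tokens) - n + 1):
            index[tuple(tokens[i:i + n])].append((doc_id, i))
    return index


def frequent_patterns(index, min_count=2):
    """Keep n-grams seen at least min_count times, most frequent first."""
    hits = {key: occ for key, occ in index.items() if len(occ) >= min_count}
    return sorted(hits.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical one-best SOU transcripts (labels are made up for illustration).
transcripts = {
    "call_a": ["u12", "u7", "u33", "u5", "u12", "u7", "u33"],
    "call_b": ["u9", "u12", "u7", "u33", "u2"],
}

patterns = frequent_patterns(index_ngrams(transcripts, n=3), min_count=2)
for key, occurrences in patterns:
    print(" ".join(key), occurrences)
```

In this toy corpus only one trigram recurs, and its occurrence list points back to the audio regions that would be grouped as one candidate pattern. Handling SOU consensus networks instead of one-best sequences would require indexing alternative tokens per slot, which this sketch does not attempt.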


  1. H. Gish, M. Siu, A. Chan and W. Belfield, “Unsupervised training of an HMM-based speech recognition system for topic classification,” in Interspeech, 2009.
  2. M. Siu, H. Gish, A. Chan and W. Belfield, “Improved topic classification and keyword discovery using an HMM-based speech recognizer trained without supervision,” in Interspeech, 2010.
  3. Y. Zhang and J. Glass, “Towards multi-speaker unsupervised speech pattern discovery,” in ICASSP, 2010.

Bibliographic reference.  Siu, Man-hung / Gish, Herbert / Lowe, Steve / Chan, Arthur (2011): "Unsupervised audio patterns discovery using HMM-based self-organized units", In INTERSPEECH-2011, 2333-2336.