13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Automatic Intelligibility Assessment of Pathologic Speech in Head and Neck Cancer Based on Auditory-inspired Spectro-temporal Modulations

Xinhui Zhou (1), Daniel Garcia-Romero (1), Nima Mesgarani (1), Maureen Stone (2), Carol Espy-Wilson (1), Shihab Shamma (1)

(1) Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, USA
(2) Departments of Neural and Pain Sciences and Orthodontics, University of Maryland Dental School, Baltimore, MD, USA

Oral, head and neck cancer accounts for 3% of all cancers in the United States and is the sixth most common cancer worldwide. Depending on tumor size, location, and staging, patients are treated by radical surgery, radiotherapy, chemotherapy, or a combination of these treatments. As a result, the anatomical structures used for speech are impaired, which negatively affects speech intelligibility. As part of the INTERSPEECH 2012 Speaker Trait Challenge Pathology Sub-Challenge, this study explored the use of auditory-inspired spectro-temporal modulation features for automatic intelligibility assessment of such pathologic speech. The averaged spectro-temporal modulations of speech labeled as either intelligible or non-intelligible in the challenge database were analyzed, and the modulation amplitude peaks of non-intelligible speech were found to shift toward a smaller rate and scale. Variants of spectro-temporal modulation features were tested with SVM and GMM classifiers on the challenge problem, and the resulting performance on both the development and test datasets is comparable to the baseline.
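To make the notion of a modulation peak in rate and scale concrete, here is a minimal sketch. It does not reproduce the auditory model used in the paper. As a simplifying assumption, it swaps in a plain 2D Fourier transform of a log-spectrogram, which yields temporal modulation (rate, in Hz) along one axis and spectral modulation (scale, in cycles per octave) along the other. All function names, the frame rate, and the channel density are hypothetical choices for illustration.

```python
import numpy as np

def modulation_spectrum(log_spec, frame_rate_hz, channels_per_octave):
    """2D Fourier magnitude of a (n_frames, n_channels) log-spectrogram.

    Simplified stand-in for a rate-scale analysis, not the paper's model.
    """
    centered = log_spec - log_spec.mean()  # remove the DC component
    magnitude = np.abs(np.fft.fft2(centered))
    # Axis 0 is time (rate in Hz); axis 1 is log-frequency (scale in cyc/oct).
    rates = np.fft.fftfreq(log_spec.shape[0], d=1.0 / frame_rate_hz)
    scales = np.fft.fftfreq(log_spec.shape[1], d=1.0 / channels_per_octave)
    return rates, scales, magnitude

def peak_rate_scale(log_spec, frame_rate_hz, channels_per_octave):
    """Return (|rate|, |scale|) at the largest modulation magnitude."""
    rates, scales, magnitude = modulation_spectrum(
        log_spec, frame_rate_hz, channels_per_octave
    )
    i, j = np.unravel_index(np.argmax(magnitude), magnitude.shape)
    return abs(rates[i]), abs(scales[j])

def make_ripple(rate_hz, scale_cyc_per_oct, n_frames=100, n_channels=48,
                frame_rate_hz=100.0, channels_per_octave=12.0):
    """Synthetic spectro-temporal ripple used only as test input (hypothetical)."""
    t = np.arange(n_frames)[:, None] / frame_rate_hz          # seconds
    x = np.arange(n_channels)[None, :] / channels_per_octave  # octaves
    return np.cos(2.0 * np.pi * (rate_hz * t + scale_cyc_per_oct * x))

if __name__ == "__main__":
    fast = peak_rate_scale(make_ripple(4.0, 1.0), 100.0, 12.0)
    slow = peak_rate_scale(make_ripple(2.0, 0.5), 100.0, 12.0)
    print("peak (rate Hz, scale cyc/oct):", fast, slow)
```

Under these assumptions, comparing the peak location of averaged modulation spectra for two groups of recordings would show whether one group's energy sits at a smaller rate and scale, which is the kind of shift the abstract reports for non-intelligible speech.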

Index Terms: Oral, head and neck cancer, speech pathology, speech intelligibility, spectro-temporal modulation, support vector machine (SVM), Gaussian mixture model (GMM)


Bibliographic reference.  Zhou, Xinhui / Garcia-Romero, Daniel / Mesgarani, Nima / Stone, Maureen / Espy-Wilson, Carol / Shamma, Shihab (2012): "Automatic intelligibility assessment of pathologic speech in head and neck cancer based on auditory-inspired spectro-temporal modulations", In INTERSPEECH-2012, 542-545.