SLaTE 2015 - Workshop on Speech and Language Technology in Education

Leipzig, Germany
September 4-5, 2015

ASR Technology to Empower Partial and Synchronized Caption for L2 Listening Development

Maryam Sadat Mirzaei, Tatsuya Kawahara

Graduate School of Informatics, Kyoto University, Sakyo, Kyoto, Japan

This study introduces partial and synchronized caption (PSC), a tool for training second language (L2) listening skills. PSC uses an automatic speech recognition (ASR) system to align text and speech at the word level, and consults corpora to select a subset of words to include in the caption. The selection criteria are based on three features that contribute to L2 listening difficulty: speech rate, word frequency, and specificity. Our findings show that PSC in its current state leads to the same level of comprehension as full captions. PSC, however, outperforms full captions in preparing learners to listen without any textual clues, as in real-life situations. Enhancing the system requires incorporating further features, but the relationship between such features and their contribution to listening difficulty is complex. This study therefore conducts a root cause analysis of ASR errors to better understand the underlying features that make recognition difficult for such systems, and compares these features with the factors that influence L2 listening. Our preliminary analysis reveals a similarity between the features that lead to L2 listening difficulty and those that result in ASR errors. These findings point to directions for future improvement of PSC.
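
The word selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `AlignedWord` structure, the syllables-per-second measure of speech rate, the thresholds `max_rank` and `max_rate`, the treatment of specificity as membership in a term list, and the underscore masking of hidden words are all assumptions made for illustration. The sketch keeps a word in the caption when any one of the three features marks it as likely to be difficult, and masks it otherwise.

```python
from dataclasses import dataclass


@dataclass
class AlignedWord:
    """One word with its word-level timing (hypothetical structure)."""
    text: str
    start: float      # start time in seconds
    end: float        # end time in seconds
    syllables: int    # syllable count, assumed to be known in advance


def speech_rate(word: AlignedWord) -> float:
    """Syllables per second for one word (an assumed rate measure)."""
    duration = max(word.end - word.start, 1e-6)  # guard against zero length
    return word.syllables / duration


def select_for_caption(words, freq_rank, specific_terms,
                       max_rank=2000, max_rate=6.0):
    """Return one boolean per word: True means show it in the caption.

    freq_rank maps a word to its corpus frequency rank (1 = most frequent);
    words missing from the map are treated as rare. Both thresholds are
    illustrative values, not taken from the paper.
    """
    shown = []
    for w in words:
        key = w.text.lower()
        rare = freq_rank.get(key, float("inf")) > max_rank
        fast = speech_rate(w) > max_rate
        specific = key in specific_terms
        shown.append(rare or fast or specific)
    return shown


def render(words, shown, mask="_"):
    """Build the partial caption, masking each hidden word character by character."""
    return " ".join(w.text if s else mask * len(w.text)
                    for w, s in zip(words, shown))


# Toy input: timings, ranks, and the term list are invented for illustration.
words = [
    AlignedWord("the", 0.0, 0.2, 1),
    AlignedWord("mitochondria", 0.2, 1.2, 5),
    AlignedWord("is", 1.2, 1.3, 1),
    AlignedWord("small", 1.3, 1.8, 1),
]
freq_rank = {"the": 1, "is": 5, "small": 500}
caption = render(words, select_for_caption(words, freq_rank, {"mitochondria"}))
print(caption)
```

Combining the three features with a logical OR is the simplest possible rule; weighting the features or adapting the thresholds to a learner's proficiency would be equally plausible designs.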

Bibliographic reference. Mirzaei, Maryam Sadat / Kawahara, Tatsuya (2015): "ASR technology to empower partial and synchronized caption for L2 listening development", In SLaTE-2015, 65-70.