4th Workshop on Spoken Language Technologies for Under-Resourced Languages

St. Petersburg, Russia
14-16 May 2014

Contributed Papers

Query-by-example spoken term detection evaluation on low-resource languages
Xavier Anguera, Luis J. Rodriguez-Fuentes, Igor Szöke, Andi Buzo, Florian Metze, Mikel Penagarikano

Features for factored language models for code-switching speech
Heike Adel, Katrin Kirchhoff, Dominic Telaar, Ngoc Thang Vu, Tim Schlippe, Tanja Schultz

Adapting multilingual neural network hierarchy to a new language
Frantisek Grézl, Martin Karafiát

Recent progress in developing grapheme-based speech recognition for Indonesian ethnic languages: Javanese, Sundanese, Balinese and Bataks
Sakriani Sakti, Satoshi Nakamura

Speech alignment and recognition experiments for Luxembourgish
Martine Adda-Decker, Lori Lamel, Gilles Adda

On using intrinsic spectral analysis for low-resource languages
Reza Sahraeian, Dirk Van Compernolle, Febe de Wet

Semi-supervised G2P bootstrapping and its application to ASR for a very under-resourced language: Iban
Sarah Samson Juan, Laurent Besacier, Solange Rossato

Towards automatic speech recognition without pronunciation dictionary, transcribed speech and text resources in the target language using cross-lingual word-to-phoneme alignment
Felix Stahlberg, Tim Schlippe, Stephan Vogel, Tanja Schultz

Rescoring n-best lists for Russian speech recognition using factored language models
Irina Kipyatkova, Vasilisa Verkhodanova, Alexey Karpov

On Mirandese language resources for text-to-speech
José Pedro Ferreira, Cristiano Chesi, Hyongsil Cho, Daan Baldewijns, Daniela Braga, Miguel Dias

HMM-based speech synthesiser for the Urdu language
Zeeshan Ahmed, Joao P. Cabral

Intonation issues in HMM-based speech synthesis for Vietnamese
Thi Thu Trang Nguyen, Do Dat Tran, Albert Rilliard, Christophe d’Alessandro, Thi Ngoc Yen Pham

High quality speech synthesis using a small speech dataset
Pavel Chistikov, Andrey Talanov

Cross-word sub-word units for low-resource keyword spotting
William Hartmann, Lori Lamel, Jean-Luc Gauvain

Recent improvements in Estonian LVCSR
Tanel Alumäe

Unsupervised acoustic model training using multiple seed ASR systems
Horia Cucu, Andi Buzo, Corneliu Burileanu

A bilingual study on the prediction of morph-based improvement
Balázs Tarján, Tibor Fegyó, Péter Mihajlik

Combining grapheme-to-phoneme converter outputs for enhanced pronunciation generation in low-resource scenarios
Tim Schlippe, Wolf Quaschningk, Tanja Schultz

Development of a Korean speech recognition system with little annotated data
Antoine Laurent, Lori Lamel

Towards the automatic processing of Yongning Na (Sino-Tibetan): developing a ‘light’ acoustic model of the target language and testing ‘heavyweight’ models from five national languages
Thi-Ngoc-Diep Do, Alexis Michaud, Eric Castelli

Exploring pronunciation variants for Romanian speech-to-text transcription
Ioana Vasilescu, Bianca Vieru, Lori Lamel

Cross-language mapping for small-vocabulary ASR in under-resourced languages: investigating the impact of source language choice
Anjana Vakil, Alexis Palmer

Speech-to-text development for Slovak, a low-resourced language
Cong-Thanh Do, Lori Lamel, Jean-Luc Gauvain

Sequence memoizer based language model for Russian speech recognition
Daria Vazhenina, Konstantin Markov

Code-switching speech recognition for closely related languages
Tetyana Lyudovyk, Valeriy Pylypenko

The NCHLT speech corpus of the South African languages
Etienne Barnard, Marelie H. Davel, Charl van Heerden, Febe de Wet, Jaco Badenhorst

Community-based resource building and data collection
Kristiina Jokinen, Graham Wilcock

Automatic detection of anglicisms for the pronunciation dictionary generation: a case study on our German IT corpus
Sebastian Leidig, Tim Schlippe, Tanja Schultz

A robust diacritics restoration system using unreliable raw text data
Lucian Petrică, Horia Cucu, Andi Buzo, Corneliu Burileanu

“STC spoofing” database for text-dependent speaker recognition evaluation
Konstantin Simonchik, Vadim Shchemelinin

Modeling code-switching speech on under-resourced languages for language identification
Koena Ronny Mabokela, Madimetja Jonas Manamela, Mabu Manaileng

Towards low-resource prosodic boundary detection
Bogdan Ludusan, Emmanuel Dupoux

Speech data collection in an under-resourced language within a multilingual context
Raymond Molapo, Etienne Barnard, Febe de Wet

The development of new corpora for under-resourced languages using data available for well-resourced ones
Pavel Skrelin, Nina Volskaya, Karina Evgrafova, Riikka Ullakonoja

Web lexicography for and by non-tech people
Dmitri Dmitriev

Phonetic tool for the Tunisian Arabic
Abir Masmoudi, Yannick Estève, Mariem Ellouze Khmekhem, Fethi Bougares, Lamia Hadrich Belguith

Grapheme to phoneme conversion: an Arabic dialect case
S. Harrat, Karima Meftouh, M. Abbas, K. Smaili

Sounds and symbols: an overview of different types of methods dealing with letters-to-sounds relationships in a wide range of languages in automatic speech recognition
Maria Goudi, Pascal Nocera



