ISCA Archive Interspeech 2006 Sessions Booklet

Interspeech 2006

Pittsburgh, PA, USA
17-21 September 2006

General Chair: Richard Stern
doi: 10.21437/Interspeech.2006


Multilingual and Multi-Accent Processing


The 2006 RWTH parliamentary speeches transcription system
J. Lööf, M. Bisani, Ch. Gollan, G. Heigold, Björn Hoffmeister, Ch. Plahl, Ralf Schlüter, Hermann Ney

Multilingual non-native speech recognition using phonetic confusion-based acoustic model modification and graphemic constraints
G. Bouselmi, D. Fohr, I. Illina, Jean-Paul Haton

Automatic speech recognition of Cantonese-English code-mixing utterances
Joyce Y. C. Chan, P. C. Ching, Tan Lee, Houwei Cao

The ICSI+ multilingual sentence segmentation system
M. Zimmerman, Dilek Hakkani-Tür, J. Fung, N. Mirghafori, L. Gottlieb, Elizabeth Shriberg, Yang Liu

Cross-language evaluation of voice-to-phoneme conversions for voice-tag application in embedded platforms
Yan Ming Cheng, Changxue Ma, Lynette Melnar

A multi-space distribution (MSD) approach to speech recognition of tonal languages
Huanliang Wang, Yao Qian, Frank K. Soong, Jian-Lai Zhou, Jiqing Han

Comparison of acoustic modeling techniques for Vietnamese and Khmer ASR
Viet Bac Le, Laurent Besacier

Multi-accent Chinese speech recognition
Yi Liu, Pascale Fung

Comparative analysis of formants of British, American and Australian accents
Seyed Ghorshi, Saeed Vaseghi, Qin Yan

Automatic initial/final generation for dialectal Chinese speech recognition
Linquan Liu, Thomas Fang Zheng, Wenhu Wu

Maximum entropy modeling for diacritization of Arabic text
Ruhi Sarikaya, Ossama Emam, Imed Zitouni, Yuqing Gao

Comparison of Slovak and Czech speech recognition based on grapheme and phoneme acoustic models
Slavomír Lihan, Jozef Juhár, Anton Cizmár


Corpora, Annotation, and Assessment Metrics I, II


Integrating Festival and Windows
Rhys James Jones, Ambrose Choy, Briony Williams

Measuring the acceptable word error rate of machine-generated webcast transcripts
Cosmin Munteanu, Gerald Penn, Ron Baecker, Elaine Toms, David James

Analyzing reusability of speech corpus based on statistical multidimensional scaling method
Goshu Nagino, Makoto Shozakai

Redundancy and productivity in the speech technology lexicon - can we do better?
Susan Fitt, Korin Richmond

Word intelligibility estimation of noise-reduced speech
Takeshi Yamada, Masakazu Kumakura, Nobuhiko Kitawaki

Exploring the unknown - collecting 1000 speakers over the internet for the Ph@ttSessionz database of adolescent speakers
Christoph Draxler

A new single-ended measure for assessment of speech quality
Timothy Murphy, Dorel Picovici, Abdulhussain E. Mahdi

Speech technology for minority languages: the case of Irish (Gaelic)
Ailbhe Ní Chasaide, John Wogan, Brian Ó Raghallaigh, Áine Ní Bhriain, Eric Zoerner, Harald Berthelsen, Christer Gobl

Further investigations on the relationship between objective measures of speech quality and speech recognition rates in noisy environments
Francisco José Fraga, Carlos Alberto Ynoguti, André Godoi Chiovato

Non-intrusive speech quality assessment with low computational complexity
Volodya Grancharov, David Y. Zhao, Jonas Lindblom, W. Bastiaan Kleijn

Using speech recognition technique for constructing a phonetically transcribed Taiwanese (Min-nan) text corpus
Min-Siong Liang, Ren-Yuan Lyu, Yuang-Chin Chiang

SloParl - Slovenian parliamentary speech and text corpus for large vocabulary continuous speech recognition
Andrej Zgank, Tomas Rotovnik, Matej Grasic, Marko Kos, Damjan Vlaj, Zdravko Kacic

An annotation scheme for agreement analysis
Siew Leng Toh, Fan Yang, Peter A. Heeman

Conversational quality estimation model for wideband IP-telephony services
Hitoshi Aoki, Atsuko Kurashima, Akira Takahashi

The Vocal Joystick data collection effort and vowel corpus
Kelley Kilanski, Jonathan Malkin, Xiao Li, Richard Wright, Jeff A. Bilmes

Comparison of the ITU-T P.85 standard to other methods for the evaluation of text-to-speech systems
Dmitry Sityaev, Katherine Knill, Tina Burrows

An annotation scheme for complex disfluencies
Peter A. Heeman, Andy McMillin, J. Scott Yaruss

Automatic phonetic transcription of large speech corpora: a comparative study
Christophe Van Bael, Lou Boves, Henk van den Heuvel, Helmer Strik

Examining knowledge sources for human error correction
Yongmei Shi, Lina Zhou


Speech Enhancement I, II


A novel environment-dependent speech enhancement method with optimized memory footprint
Suhadi Suhadi, Sorel Stan, Tim Fingscheidt

Weighted codebook mapping for noisy speech enhancement using harmonic-noise model
Esfandiar Zavarehei, Saeed Vaseghi, Qin Yan

MMSE estimation of complex-valued discrete Fourier coefficients with generalized gamma priors
J. Jensen, R. C. Hendriks, J. S. Erkelens, R. Heusdens

Automatic removal of typed keystrokes from speech signals
Amarnag Subramanya, Michael L. Seltzer, Alex Acero

Lattice LP filtering for noise reduction in speech signals
Erhard Rank, Gernot Kubin

Speech enhancement using modified phase opponency model
Om D. Deshmukh, Carol Y. Espy-Wilson

Single channel speech enhancement by frequency domain constrained optimization and temporal masking
Wen Jin, Michael Scordilis

Speech enhancement based on residual noise shaping
Jong Won Shin, Seung Yeol Lee, Hwan Sik Yun, Nam Soo Kim

Quality improvement of telephone speech by artificial bandwidth expansion - listening tests in three languages
Hannu Pulakka, Laura Laaksonen, Paavo Alku

Role of phase estimation in speech enhancement
Benjamin J. Shannon, Kuldip K. Paliwal

Speech enhancement based on spectral estimation from higher-lag autocorrelation
Benjamin J. Shannon, Kuldip K. Paliwal, Climent Nadeu

Noise update modeling for speech enhancement: when do we do enough?
Nitish Krishnamurthy, John H. L. Hansen

Mapping neural networks for bandwidth extension of narrowband speech
A. Shahina, B. Yegnanarayana

Decision directed constrained iterative speech enhancement
Amit Das, John H. L. Hansen

Adaptive filtering for attenuating musical noise caused by spectral subtraction
Takahiro Murakami, Yoshihisa Ishida

Evaluation of objective measures for speech enhancement
Yi Hu, Philipos C. Loizou

Performance analysis of various single channel speech enhancement algorithms for automatic speech recognition
Myung-Suk Song, Chang-Heon Lee, Hong-Goo Kang


ASR Other I, II


Computer-assisted closed-captioning of live TV broadcasts in French
G. Boulianne, J.-F. Beaumont, M. Boisvert, J. Brousseau, P. Cardinal, C. Chapdelaine, M. Comeau, Pierre Ouellet, F. Osterrath

On the use of morphological analysis for dialectal Arabic speech recognition
Mohamed Afify, Ruhi Sarikaya, Hong-Kwang Jeff Kuo, Laurent Besacier, Yuqing Gao

Recognition of classroom lectures in European Portuguese
Isabel Trancoso, Ricardo Nunes, Luís Neves, Céu Viana, Helena Moniz, Diamantino Caseiro, Ana Isabel Mata

Investigating automatic decomposition for ASR in less represented languages
Thomas Pellegrini, Lori Lamel

Automatic transcription of Somali language
Abdillahi Nimaan, Pascal Nocéra, Jean-François Bonastre

Analysis of overlaps in meetings by dialog factors, hot spots, speakers, and collection site: insights for automatic speech recognition
Özgür Çetin, Elizabeth Shriberg

Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation
Ryu Takeda, Shun'ichi Yamamoto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Missing-feature reconstruction for band-limited speech recognition in spoken document retrieval
Wooil Kim, John H. L. Hansen

Incremental learning of MAP context-dependent edit operations for spoken phone number recognition in an embedded platform
Hahn Koo, Yan Ming Cheng

Development and evaluation of speech database in automotive environments for practical speech recognition systems
Yasunari Obuchi, Nobuo Hataoka

An effective and efficient utterance verification technology using word n-gram filler models
Dong Yu, Yun-Cheng Ju, Alex Acero

An efficient bispectrum phase entropy-based algorithm for VAD
J. M. Górriz, Javier Ramírez, C. G. Puntonet, José C. Segura

Two-step unsupervised speaker adaptation based on speaker and gender recognition and HMM combination
Petr Cerva, Jan Nouza, Jan Silovsky

CENSREC2: corpus and evaluation environments for in car continuous digit speech recognition
Satoshi Nakamura, Masakiyo Fujimoto, Kazuya Takeda

Detection of word fragments in Mandarin telephone conversation
Cheng-Tao Chu, Yun-Hsuan Sung, Yuan Zhao, Daniel Jurafsky

A DTW-based dissimilarity measure for left-to-right hidden Markov models and its application to word confusability analysis
Qiang Huo, Wei Li

Multi-flow block interleaving applied to distributed speech recognition over IP networks
Angel M. Gómez, Juan J. Ramos-Muñoz, Antonio M. Peinado, Victoria Sánchez

Moving speech recognition from software to silicon: the In Silico Vox project
Edward C. Lin, Kai Yu, Rob A. Rutenbar, Tsuhan Chen

A study on detection based automatic speech recognition
Chengyuan Ma, Yu Tsao, Chin-Hui Lee

Novel time domain multi-class SVMs for landmark detection
Rahul Chitturi, Mark Hasegawa-Johnson


Front-End Methods for ASR


Feature combination using linear discriminant analysis and its pitfalls
Ralf Schlüter, András Zolnay, Hermann Ney

Discriminant linear processing of time-frequency plane
Fabio Valente, Hynek Hermansky

Automatic speech recognition experiments with articulatory data
Esmeralda Uraga, Thomas Hain

Speech recognition with phonological features: some issues to attend
Frederik Stouten, Jean-Pierre Martens

Multi-source far-distance microphone selection and combination for automatic transcription of lectures
Matthias Wölfel, Christian Fügen, Shajith Ikbal, John W. McDonough

Statistical analysis and performance of DFT domain noise reduction filters for robust speech recognition
Colin Breithaupt, Rainer Martin

Normalization of the inter-frame information using smoothing filtering
L. García, José C. Segura, Carmen Benítez, Javier Ramírez, Ángel de la Torre

Comparative study on contributions of pitch-synchronization and peak-amplitude towards robustness issue of ASR
Muhammad Ghulam, Junsei Horikawa, Tsuneo Nitta

Phoneme recognition based on Fisher weight map to higher-order local auto-correlation
Yasuo Ariki, Shunsuke Kato, Tetsuya Takiguchi

Data-driven design of front-end filter bank for Lombard speech recognition
Hynek Boril, Petr Fousek, Petr Pollák

Optimization of class weights for LDA feature transformations
Andrej Ljolje

LDA based feature estimation methods for LVCSR
Janne Pylkkönen

Robust feature extraction based on spectral peaks of group delay and autocorrelation function and phase domain analysis
G. Farahani, S.M. Ahadi, M. Mehdi Homayounpour

Frequency warping by linear transformation of standard MFCC
Sankaran Panchapagesan


Spoken Dialog Systems I, II


Dynamic extension of a grammar-based dialogue system: constructing an all-recipes knowing robot
Petra Gieselmann, Alex Waibel

Scalable and portable web-based multimodal dialogue interaction with geographical databases
Alexander Gruenstein, Stephanie Seneff, Chao Wang

System- versus user-initiative dialog strategy for driver information systems
Chantal Ackermann, Marion Libossek

Have we met? MDP based speaker ID for robot dialogue
Filip Krsmanovic, Curtis Spencer, Daniel Jurafsky, Andrew Y. Ng

Prominent words as anchors for TRP projection
Rob J. J. H. van Son, Wieneke Wesseling, Louis C. W. Pols

Learning multi-goal dialogue strategies using reinforcement learning with reduced state-action spaces
Heriberto Cuayáhuitl, Steve Renals, Oliver Lemon, Hiroshi Shimodaira

Pitch range and pause duration as markers of discourse hierarchy: perception experiments
Jörg Mayer, Ekaterina Jasinskaja, Ulrike Kölsch

Radiobot-CFF: a spoken dialogue system for military training
Antonio Roque, Anton Leuski, Vivek Rangarajan, Susan Robinson, Ashish Vaswani, Shrikanth Narayanan, David Traum

Is voice quality enough? - study on how the situation and user’s awareness influence the utterance features
Shinya Yamada, Toshihiko Itoh, Kenji Araki

Development of Slovak GALAXY/VoiceXML based spoken language dialogue system to retrieve information from the internet
Jozef Juhár, Stanislav Ondas, Anton Cizmár, Milan Rusko, Gregor Rozinaj, Roman Jarina

LINTest: a development tool for testing dialogue systems
Lars Degerstedt, Arne Jönsson

A user simulator based on VoiceXML for evaluation of spoken dialog systems
Akinori Ito, Keisuke Shimada, Motoyuki Suzuki, Shozo Makino

User expectations and real experience on a multimodal interactive system
Kristiina Jokinen, Topi Hurtig

Detecting anger in automated voice portal dialogs
F. Burkhardt, J. Ajmera, Roman Englert, J. Stegmann, W. Burleson

Evaluation of a spoken dialogue system with usability tests and long-term pilot studies: similarities and differences
Markku Turunen, Jaakko Hakulinen, Anssi Kainulainen

CHAT: a conversational helper for automotive tasks
Fuliang Weng, Sebastian Varges, Badri Raghunathan, Florin Ratiu, Heather Pon-Barry, Brian Lathrop, Qi Zhang, Harry Bratt, Tobias Scheideck, Kui Xu, Matthew Purver, Rohit Mishra, Annie Lien, M. Raya, S. Peters, Y. Meng, J. Russell, Lawrence Cavedon, Elizabeth Shriberg, H. Schmidt, R. Prieto

User simulation for spoken dialogue systems: learning and evaluation
Kallirroi Georgila, James Henderson, Oliver Lemon


Speaker Characterization and Recognition I-IV


Improving the characterization of the alternative hypothesis via kernel discriminant analysis for likelihood ratio-based speaker verification
Yi-Hsiang Chao, Wei-Ho Tsai, Hsin-Min Wang, Ruei-Chuan Chang

A discriminative method for speaker verification using the difference information
Zhenchun Lei, Yingchun Yang, Zhaohui Wu

A multiclass framework for speaker verification within an acoustic event sequence system
Nicolas Scheffer, Jean-François Bonastre

Speaker cluster based GMM tokenization for speaker recognition
Bin Ma, Donglai Zhu, Rong Tong, Haizhou Li

Intra-speaker variability compensation in speaker verification with limited enrolling data
Claudio Garreton, Nestor Becerra Yoma, Carlos Molina, Fernando Huenupan

Speaking faces for face-voice speaker identity verification
Girija Chetty, Michael Wagner

Significance of formants from difference spectrum for speaker identification
Kishore Prahallad, Varanasi Sudhakar, Veluru Ranganatham, Krishna M. Bharat, S. Roy Debashish

Using genetic algorithms to weight acoustic features for speaker recognition
Maider Zamalloa, Germán Bordel, Luis Javier Rodríguez, Mikel Penagarikano, Juan Pedro Uribe

Missing feature theory with soft spectral subtraction for speaker verification
Michael T. Padilla, Thomas F. Quatieri, Douglas A. Reynolds

Prosodic features for speaker verification
Leena Mary, B. Yegnanarayana

Unsupervised learning of HMM topology for text-dependent speaker verification
Ming Liu, Thomas S. Huang

On the use of Jacobian adaptation in real speaker verification applications
Jan Anguita, Javier Hernando

A novel framework of text-independent speaker verification based on utterance transform and iterative cohort modeling
Ming Liu, Huazhong Ning, Thomas S. Huang, Zhengyou Zhang

A cohort - UBM approach to mitigate data sparseness for in-set/out-of-set speaker recognition
Vinod Prakash, John H. L. Hansen

Analysis of Lombard effect under different types and levels of noise with application to in-set speaker ID systems
Vaishnevi S. Varadarajan, John H. L. Hansen

Reducing speech coding distortion for speaker identification
Alan McCree

A text-prompted distributed speaker verification system implemented on a cellular phone and a mobile terminal
Tsuneo Kato, Hisashi Kawai

Automatic detection of irregular phonation in continuous speech
Srikanth Vishnubhotla, Carol Y. Espy-Wilson

Highly noise robust text-dependent speaker recognition based on hypothesized Wiener filtering
V. Ramasubramanian, Deepak Vijaywargiay, Kumar V. Praveen

Speaker identification under noisy environments by using harmonic structure extraction and reliable frame weighting
Hiromasa Fujihara, Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Enhancing the performance of a GMM-based speaker identification system in a multi-microphone setup
Andreas Stergiou, Aristodemos Pnevmatikakis, Lazaros C. Polymenakos

Discriminative adaptation for speaker verification
C. Longworth, M. J. F. Gales

Within-class covariance normalization for SVM-based speaker recognition
Andrew O. Hatch, Sachin Kajarekar, Andreas Stolcke

A new set of features for text-independent speaker identification
Carol Y. Espy-Wilson, Sandeep Manocha, Srikanth Vishnubhotla

Detection of a third speaker in telephone conversations
Uchechukwu O. Ofoegbu, Ananth N. Iyer, Robert E. Yantorno, Stanley J. Wenndt

Improvement speaker clustering using global similarity features
Konstantin Biatov, Joachim Köhler

Voting for two speaker segmentation
Balakrishnan Narayanaswamy, Rashmi Gangadharaiah, Richard M. Stern

Unsupervised model adaptation for speaker verification
Alexandre Preti, Jean-François Bonastre

A quality measure method using Gaussian mixture models and divergence measure for speaker identification
Rong Zheng, Shuwu Zhang, Bo Xu

Gammatone auditory filterbank and independent component analysis for speaker identification
Yushi Zhang, Waleed H. Abdulla

Study on speaker verification on emotional speech
Wei Wu, Thomas Fang Zheng, Ming-Xing Xu, Huan-Jun Bao

On the fusion of prosody, voice spectrum and face features for multimodal person verification
M. Farrús, A. Garde, P. Ejarque, J. Luque, Javier Hernando

An MRI based study of the acoustic effects of sinus cavities and its application to speaker recognition
Tarun Pruthi, Carol Y. Espy-Wilson

Speaker verification with non-audible murmur segments
Mariko Kojima, Tomoko Matsui, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano

Automatic recognition of speakers’ age and gender on the basis of empirical studies
Christian Müller

Text-independent speaker identification in birds
E. J. S. Fox, J. D. Roberts, M. Bennamoun

Automatic acoustic identification of insects inspired by the speaker recognition paradigm
Ilyas Potamitis, Todor Ganchev, Nikos Fakotakis


Acoustic Signal Segmentation and Classification


Automatic English stop consonants classification using wavelet analysis and hidden Markov models
Marco Kühne, Roberto Togneri

Single frame selection for phoneme classification
Tingyao Wu, Dirk Van Compernolle, Jacques Duchateau, Hugo Van hamme

On the relation between maximum spectral transition positions and phone boundaries
Sorin Dusan, Lawrence Rabiner

Objective estimation of suicidal risk using vocal output characteristics
T. Yingthawornsuk, H. Kaymaz Keskinpala, D. France, D. M. Wilkes, R. G. Shiavi, R. M. Salomon

A wavelet-based parameterization for speech/music segmentation
E. Didiot, I. Illina, O. Mella, D. Fohr, Jean-Paul Haton

Distance measure between Gaussian distributions for discriminating speaking styles
Goshu Nagino, Makoto Shozakai

Bayesian networks for phonetic classification using time-scale features
Franz Pernkopf, Tuan Van Pham

Fast and effective retraining on contrastive vocal characteristics with bidirectional long short-term memory nets
Nicole Beringer

Exploiting dendritic autocorrelogram structure to identify spectro-temporal regions dominated by a single sound source
Ning Ma, Phil Green, André Coy

Locating phone boundaries from acoustic discontinuities using a two-staged approach
Pairote Leelaphattarakij, Proadpran Punyabukkana, Atiwong Suchato

Investigation on rescoring using minimum verification error (MVE) detectors
Qiang Fu, Biing-Hwang Juang

Generalization of the minimum classification error (MCE) training based on maximizing generalized posterior probability (GPP)
Qiang Fu, Antonio Moreno-Daniel, Biing-Hwang Juang, Jian-Lai Zhou, Frank K. Soong

Unsupervised detection of whispered speech in the presence of normal phonation
Michael A. Carlin, Brett Y. Smolenski, Stanley J. Wenndt

Friends and enemies: a novel initialization for speaker diarization
Xavier Anguera, Chuck Wooters, Javier Hernando


Linguistics, Phonology, and Phonetics I, II


Acoustic cues for the classification of regular and irregular phonation
Kushan Surana, Janet Slifka

Realizations and representations of Thai tones in monomoraic syllables
Rattima Nitisaroj

Measuring and comparing vowel qualities in a Dutch spontaneous speech corpus
Irene Jacobi, Louis C. W. Pols, Jan Stroop

Phonetic research on accented Chinese in three dialectal regions: Shanghai, Wuhan and Xiamen
Aijun Li, Qiang Fang, Ziyu Xiong

Pronunciation variation modeling for Mandarin with accent
Chi Zhang, Ji Wu, Xi Xiao, Zuoying Wang

Specificity and generalizability of spontaneous phonetic imitation
Kuniko Y. Nielsen

On the sufficiency of automatic phonetic transcriptions for pronunciation variation research
Christophe Van Bael, Hans van Halteren

Automatic detection of voice onset time contrasts for use in pronunciation assessment
Abe Kazemzadeh, Joseph Tepperman, Jorge Silva, Hong You, Sungbok Lee, Abeer Alwan, Shrikanth Narayanan

Unfilled pauses in Japanese sentences read aloud by non-native learners
Hiroko Hirano, Goh Kawai, Keikichi Hirose, Nobuaki Minematsu

Detection of quotations and inserted clauses and its application to dependency structure analysis in spontaneous Japanese
Ryoji Hamabe, Kiyotaka Uchimoto, Tatsuya Kawahara, Hitoshi Isahara

Chinese input method based on reduced Mandarin phonetic alphabet
Chun-Han Tseng, Chia-Ping Chen

Thesaurus expansion using similar word pairs from patent documents
Yoshimi Suzuki, Fumiyo Fukumoto

Low-resource autodiacritization of abjads for speech keyword search
Patrick Schone

A model of the regularities underlying speaker variation: evidence from hybrid synthesis
Susan R. Hertz

Pauses as a tool to ensure rhythmic wellformedness
Augustin Speyer

Factors affecting speakers’ choice of fillers in Japanese presentations
Michiko Watanabe, Yasuharu Den, Keikichi Hirose, Shusaku Miwa, Nobuaki Minematsu

Developing consistent pronunciation models for phonemic variants
Marelie Davel, Etienne Barnard

Grapheme-to-phoneme conversion using automatically extracted associative rules for Korean TTS system
Jinsik Lee, Seungwon Kim, Gary Geunbae Lee

Example-based grapheme-to-phoneme conversion for Thai
Paisarn Charoenpornsawat, Tanja Schultz


Speech Perception I, II


An information theoretic tool for investigating speech perception
Bryce Lobdell, Jont B. Allen

An adaptive sampling procedure for speech perception experiments
Geoffrey Stewart Morrison

Disentangling gestural and auditory contrast accounts of compensation for coarticulation
Navin Viswanathan, James S. Magnuson, Carol A. Fowler

The role of positional probability in the segmentation of Cantonese speech
Michael C. W. Yip

Nasality perception of vowels in different language background
Shahina Haque, Tomio Takara

Steady-state suppression in reverberation: a comparison of native and nonnative speech perception
Nao Hodoshima, Dawn Behne, Takayuki Arai

Effect of dynamic information of formants on discrimination of English vowels in consonantal contexts by Japanese listeners
Akiyo Joto

Native and nonnative audio-visual perception of English fricatives in quiet and cafe-noise backgrounds
Yue Wang, Dawn Behne, Haisheng Jiang, Chad Danyluck

Perceptive and acoustic measurement of average speaking pitch of female and male speakers in German radio news
Sven Grawunder, Ines Bose, Birgit Hertha, Franziska Trauselt, Lutz Christian Anders

Effects of frequency shifts on perceived naturalness and gender information in speech
Peter F. Assmann, Sophia Dembling, Terrance M. Nearey

Influence of pause length on listeners’ impressions in simultaneous interpretation
Hitomi Tohyama, Shigeki Matsubara

New measures to chart toddlers’ speech perception and language development: a test of the lexical restructuring hypothesis
Iris-Corinna Schwarz, Denis Burnham

Perception of fundamental frequency in cochlear implant patients
Ángel de la Torre, Cristina Roldán, Manuel Sainz

Effects of featural similarity and overlap position on lexical confusions and overt similarity judgments
Sarah C. Creel, Delphine Dahan, Daniel Swingley

Word structure and tone perception in Mandarin
Hansjörg Mixdorff, Yu Hu

Identification of regional accents in French: perception and categorization
Cecile Woehrling, Philippe Boula de Mareüil

Consonant and vowel confusions in speech-weighted noise
Sandeep Phatak, Jont B. Allen

Accident - execute: increased activation in nonnative listening
Mirjam Broersma

Estimation of the quality dimension "directness/frequency content" for the instrumental assessment of speech quality
Kirstin Scholz, Marcel Waltermann, Lu Huo, Alexander Raake, Sebastian Möller, Ulrich Heute


Speech Production, Physiology, and Pathology I, II


Effects of word frequency on the acoustic durations of affixes
Mark Pluymaekers, Mirjam Ernestus, R. Harald Baayen

A noninvasive, low-cost device to study the velopharyngeal port during speech and some preliminary results
Xiaochuan Niu, Alexander B. Kain, Jan P. H. van Santen

Characterization of cued speech vowels from the inner lip contour
Noureddine Aboutabit, Denis Beautemps, Laurent Besacier

Modelling aspiration noise during phonation using the LF voice source model
Christer Gobl

A simulation based parameter optimization for a coarticulation model
Jianguo Wei, Xugang Lu, Jianwu Dang

Multivariate analysis of frame-based acoustic cues of dysperiodicities in connected speech
A. Kacha, Francis Grenez, Jean Schoentgen

Effects of midline tongue piercing on spectral centroid frequencies of sibilants
Tom Kovacs, Donald S. Finan

Assessment of articulatory sub-systems of dysarthric speech using an isolated-style phoneme recognition system
P. Vijayalakshmi, M. R. Reddy, Douglas O’Shaughnessy

Respiratory/laryngeal interactions during sustained vowel production in children
Donald S. Finan, Carol A. Boliek

Acoustic characterization of children with speech delay
H. Timothy Bunnell, James B. Polikoff

Study of time and frequency variability in pathological speech and error reduction methods for automatic speech recognition
Oscar Saz, Antonio Miguel, Eduardo Lleida, Alfonso Ortega, Luis Buera

Voice source correlates of prosodic features in American English: a pilot study
Markus Iseli, Yen-Liang Shue, Melissa A. Epstein, Patricia Keating, Jody Kreiman, Abeer Alwan

On speech variation and word type differentiation by articulatory feature representations
Louis ten Bosch, R. Harald Baayen, Mirjam Ernestus

A study of emotional speech articulation using a fast magnetic resonance imaging technique
Sungbok Lee, Erik Bresch, Jason Adams, Abe Kazemzadeh, Shrikanth Narayanan

Reconstructing tongue movements from audio and video
Hedvig Kjellström, Olov Engwall, Olle Bälter

New considerations for vowel nasalization based on separate mouth-nose recording
Gang Feng, Cyril Kotenkoff

An acoustic and articulatory study of Lombard speech: global effects on the utterance
Maeva Garnier, Lucie Bailly, Marion Dohen, Pauline Welby, Helene Loevenbruck


Robustness and Adaptation for ASR


An integrated solution for error concealment in DSR systems over wireless channels
Antonio M. Peinado, Angel M. Gómez, Victoria Sánchez, José L. Pérez-Córdoba, Antonio J. Rubio

Interleaving and MMSE estimation with VQ replicas for distributed speech recognition over lossy packet networks
Angel M. Gómez, Antonio M. Peinado, Victoria Sánchez, José L. Carmona, Antonio J. Rubio

Noise-robust speech recognition of conversational telephone speech
Gang Chen, Hesham Tolba, Douglas O’Shaughnessy

Lost speech reconstruction method using speech recognition based on missing feature theory and HMM-based speech synthesis
Shingo Kuroiwa, Satoru Tsuge, Fuji Ren

Speaker adaptation using evolutionary-based linear transform
Sid-Ahmed Selouani, Douglas O’Shaughnessy

A speaker adaptation algorithm using principal curves in noisy environments
Jingying Wang, Zuoying Wang

Limitations of MLLR adaptation with Spanish-accented English: an error analysis
Constance Clarke, Daniel Jurafsky

Issues with uncertainty decoding for noise robust speech recognition
H. Liao, M. J. F. Gales

Vector Taylor series based joint uncertainty decoding
Haitian Xu, Luca Rigazio, David Kryze

A maximum likelihood training approach to irrelevant variability compensation based on piecewise linear transformations
Qiang Huo, Donglai Zhu

Speaker clustered regression-class trees for MLLR adaptation
Arindam Mandal, Mari Ostendorf, Andreas Stolcke

Robust speech recognition over mobile networks using combined weighted Viterbi decoding and subvector based error concealment
Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg

Speaker adaptation of trajectory HMMs using feature-space MLLR
Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda, Tadashi Kitamura

Feature and model space speaker adaptation with full covariance Gaussians
Daniel Povey, George Saon


Multimodal, Translation and Information Retrieval


Linguistic tuple segmentation in n-gram-based statistical machine translation
Adrià de Gispert, José B. Mariño

Sentence boundary detection using sequential dependency analysis combined with CRF-based chunking
Takanobu Oba, Takaaki Hori, Atsushi Nakamura

Sequence classification for machine translation
Srinivas Bangalore, Patrick Haffner, Stephan Kanthak

Two-stage vocabulary-free spoken document retrieval - subword identification and re-recognition of the identified sections
Yoshiaki Itoh, Takayuki Otake, Kohei Iwata, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee

Design and performance analysis of a factoid question answering system for spontaneous speech transcriptions
Mihai Surdeanu, David Dominguez-Sal, Pere R. Comas

Performance improvement of dialog speech translation by rejecting unreliable utterances
Toshiyuki Takezawa, Tohru Shimizu

Cross-lingual dialog model for speech to speech translation
Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth Narayanan

A robust fusion method for multilingual spoken document retrieval systems employing tiered resources
Murat Akbacak, John H. L. Hansen

Recent advances of IBM’s handheld speech translation system
Weizhong Zhu, Bowen Zhou, Charles Prosser, Pavel Krbec, Yuqing Gao

QASR: question answering using semantic roles for speech interface
Svetlana Stenchikova, Dilek Hakkani-Tür, Gokhan Tur

Towards a multimodal topic tracking system for a mobile robot
Jan F. Maas, Britta Wrede, Gerhard Sagerer

Edge-splitting in a cumulative multimodal system, for a no-wait temporal threshold on information fusion, combined with an under-specified display
Edward C. Kaiser, Paulo Barthelmess

Joint interpretation of input speech and pen gestures for multimodal human-computer interaction
Pui-Yu Hui, Helen M. Meng


Text-to-Speech I, II


Expressive prosody for unit-selection speech synthesis
Volker Strom, Robert A. J. Clark, Simon King

Cues for hesitation in speech synthesis
Rolf Carlson, Kjell Gustafson, Eva Strangert

Multi-domain text-to-speech synthesis by automatic text classification
Francesc Alías, Joan Claudi Socoró, Xavier Sevillano, Ignasi Iriondo, Xavier Gonzalvo

Phrase break prediction using logistic generalized linear model
Lifu Yi, Jian Li, Xiaoyan Lou, Jie Hao

Joint prosodic and segmental unit selection speech synthesis
Robert A. J. Clark, Simon King

Phonetically enriched labeling in unit selection TTS synthesis
Yeon-Jun Kim, Ann K. Syrdal, Alistair Conkie, Mark C. Beutnagel

Further developments in LSM-based boundary training for unit selection TTS
Jerome R. Bellegarda

A style control technique for speech synthesis using multiple regression HSMM
Takashi Nose, Junichi Yamagishi, Takao Kobayashi

Acoustic model training based on linear transformation and MAP modification for HSMM-based speech synthesis
Katsumi Ogata, Makoto Tachibana, Junichi Yamagishi, Takao Kobayashi

Improving Arabic HMM based speech synthesis quality
Ossama Abdel-Hamid, Sherif Mahdy Abdou, Mohsen Rashwan

Farsbayan: a unit selection based Farsi speech synthesizer
M. Mehdi Homayounpour, Majid Namnabat

Amharic speech synthesis using cepstral method with stress generation rule
Tadesse Anberbir, Tomio Takara

Automatic syllable-pattern induction in statistical Thai text-to-phone transcription
Ausdang Thangthai, Chatchawarn Hansakunbuntheung, Rungkarn Siricharoenchai, Chai Wutiwiwatchai

Development of prototype text-to-speech systems for Northern Sotho
H. J. Oosthuizen, S. T. Phihlela, M. J. D. Manamela

Identify language origin of personal names with normalized appearance number of web pages
Jiali You, Yining Chen, Min Chu, Yong Zhao, Jinlin Wang

Conditional random fields for hierarchical segment selection in text-to-speech synthesis
Christian Weiss, Wolfgang Hess

Corpus design based on the Kullback-Leibler divergence for text-to-speech synthesis application
Aleksandra Krul, Géraldine Damnati, François Yvon, Thierry Moudenc

HMM-based unit selection using frame sized speech segments
Zhen-Hua Ling, Ren-Hua Wang

The target cost formulation in unit selection speech synthesis
Paul Taylor

Unit selection and its relation to symbolic prosody: a new approach
Daniel Tihelka, Jindrich Matousek

Minimum generation error criterion for tree-based clustering of context dependent HMMs
Yi-Jian Wu, Wu Guo, Ren-Hua Wang

Selective-LPC based representation of STRAIGHT spectrum and its applications in spectral smoothing
Heng Kang, Wenju Liu

Towards a comprehensive investigation of factors relevant to peak alignment using a unit selection corpus
Matthias Jilka, Bernd Möbius

Six approaches to limited domain concatenative speech synthesis
Robert J. Utama, Ann K. Syrdal, Alistair Conkie

From pre-recorded prompts to corporate voices: on the migration of interactive voice response applications
V. Fischer, S. Kunzmann

Automatic speech segmentation with multiple statistical models
Seung Seop Park, Jong Won Shin, Nam Soo Kim

Evaluation of perceptual quality of control point reduction in rule-based synthesis
Kimmo Pärssinen, Marko Moberg

Segment connection networks for corpus-based speech synthesis
Geert Coorman


Special Populations - Learners, Aged, Challenged


Observations of the spoken language acquisition process based on a multimodal infant behavior corpus
Ryo Tsuji, Tomohiko Kasami, Shogo Ishikawa, Shinya Kiriyama, Yoichi Takebayashi, Shigeyoshi Kitazawa

Infants' ability to extract verbs from continuous speech
Ellen Marklund, Francisco Lacerda

Category formation and the role of spectral quality in the perception and production of English front vowels
Ricardo A.H. Bion, Paola Escudero, Andréia S. Rauber, Barbara O. Baptista

Productions in bilinguism, early foreign language learning and monolinguism: a prosodic comparison
Ranka Bijeljac-Babic, Christelle Dodane, Sabine Metta, Claire Gérard

Training native English speakers to identify Japanese vowel length with fast rate sentences
Yukari Hirata, Elizabeth Whitehurst, Emily Cullings, Jacob Whiton, Carol Glenn

Formant-based English vowel assessment for Chinese in Taiwan
Jiang-Chun Chen, Wei-Tang Hsu, J.-S. Roger Jang, Ren-Yuan Lyu, Yuang-Chin Chiang

Substitute sounds for ventriloquism and speech disorders
Jörg Metzner, Marcel Schmittfull, Karl Schnell

Automatic Mandarin pronunciation scoring for native learners with dialect accent
Si Wei, Qing-Sheng Liu, Yu Hu, Ren-Hua Wang

Quick individual fitting methods of simplified hearing compensation for elderly people
Kengo Fujita, Tsuneo Kato, Hisashi Kawai

An online adaptive filtering algorithm for the vocal joystick
Xiao Li, Jonathan Malkin, Susumu Harada, Jeff A. Bilmes, Richard Wright, James Landay

Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech
Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

A Spanish speech to sign language translation system for assisting deaf-mute people
R. San-Segundo, R. Barra, L. F. D’Haro, J. M. Montero, R. Córdoba, J. Ferreiros

Potential relevance of audio-visual integration in mammals for computational modeling
Eeva Klintfors, Francisco Lacerda

Finding the gaps: applying a connectionist model of word segmentation to noisy phone-recognized speech data
C. Anton Rytting




Acoustic Modeling IV


Manifold HLDA and its application to robust speech recognition
Toshiaki Kubo, Tetsuji Ogawa, Tetsunori Kobayashi

Time-dependent cross-probability model for multi-environment model based linear normalization
Luis Buera, Eduardo Lleida, Juan A. Nolazco-Flores, Antonio Miguel, Alfonso Ortega

SPAM and full covariance for speech recognition
Daniel Povey

The use of Bayesian network for incorporating accent, gender and wide-context dependency information
Sakriani Sakti, Konstantin Markov, Satoshi Nakamura

Integrating phonetic boundary discrimination explicitly into HMM systems
Yu Wang, Eric Fosler-Lussier

Robust acoustic-based syllable detection
Zhimin Xie, Partha Niyogi

A tone recognition framework for continuous Mandarin speech
Lei He, Jie Hao

Pronunciation variant-based multi-path HMMs for syllables
Annika Hämäläinen, Louis ten Bosch, Lou Boves

A new state-dependent phonetic tied-mixture model with head-body-tail structured HMM for real-time continuous phoneme recognition system
Junho Park, Hanseok Ko

Conversion from phoneme based to grapheme based acoustic models for speech recognition
Andrej Zgank, Zdravko Kacic

Phone vector DHMM to decode a phone recognizer's output
Bong-Wan Kim, Dae-Lim Choi, Yongnam Um, Yong-Ju Lee

Combining multiple-sized sub-word units in a speech recognition system using baseform selection
T. Nagarajan, P. Vijayalakshmi, Douglas O'Shaughnessy

Local transformation models for speech recognition
Antonio Miguel, Eduardo Lleida, Alfons Juan, Luis Buera, Alfonso Ortega, Oscar Saz


Large Vocabulary Speech Recognition


Online speech detection and dual-gender speech recognition for captioning broadcast news
Toru Imai, Shoei Sato, Akio Kobayashi, Kazuo Onoe, Shinichi Homma

Automatic alignment and error correction of human generated transcripts for long speech recordings
Timothy J. Hazen

Improving speech recognition accuracy with multi-confidence thresholding
Shuangyu Chang

Conceptual decoding from word lattices: application to the spoken dialogue corpus MEDIA
Christophe Servan, Christian Raymond, Frédéric Béchet, Pascal Nocéra

Improving the performance of out-of-vocabulary word rejection by using support vector machines
Shilei Huang, Xiang Xie, Jingming Kuang

Robust phone lattice decoding
Kris Demuynck, Dirk Van Compernolle, Hugo Van hamme

Imperfect transcript driven speech recognition
Benjamin Lecouteux, Georges Linarès, Pascal Nocéra, Jean-François Bonastre

New improvements in decoding speed and latency for automatic captioning
Jian Xue, Rusheng Hu, Yunxin Zhao

Colloquial Iraqi ASR for speech translation
Shirin Saleem, Rohit Prasad, Prem Natarajan

Reducing computation on parallel decoding using frame-wise confidence scores
Tomohiro Hakamata, Akinobu Lee, Yoshihiko Nankaku, Keiichi Tokuda

Posterior based keyword spotting with a priori thresholds
Hamed Ketabdar, Jithendra Vepa, Samy Bengio, Hervé Bourlard

A multi-pass error detection and correction framework for Mandarin LVCSR
Zhengyu Zhou, Helen M. Meng, Wai Kit Lo

Continual on-line monitoring of Czech spoken broadcast programs
Jan Nouza, Jindrich Zdansky, Petr Cerva, Jan Kolorenc







Modeling Speaker Emotional State


Synthesizing breathiness in natural speech with sinusoidal modelling
Brett Matthews, Raimo Bakis, Ellen Eide

Voice GMM modelling for FESTIVAL/MBROLA emotive TTS synthesis
Mauro Nicolao, Carlo Drioli, Piero Cosi

Emovoice: a system to generate emotions in speech
João P. Cabral, Luís C. Oliveira

Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar
Zhiyong Wu, Shen Zhang, Lianhong Cai, Helen M. Meng

Modeling the acoustic correlates of expressive elements in text genres for expressive text-to-speech synthesis
Hongwu Yang, Helen M. Meng, Lianhong Cai

Automatic emotion recognition of speech signal in Mandarin
Sheng Zhang, P. C. Ching, Fanrang Kong

Feature analysis for emotion recognition from Mandarin speech considering the special characteristics of Chinese language
Yi-hao Kao, Lin-shan Lee

Timing levels in segment-based speech emotion recognition
Björn Schuller, Gerhard Rigoll

Analyzing dialogue data for real-world emotional speech classification
Ryuichi Nisimura, Souji Omae, Hideki Kawahara, Toshio Irino

Evolving emotional prosody
Cecilia Ovesdotter Alm, Xavier Llorà

Vocal emotion recognition with cochlear implants
Xin Luo, Qian-Jie Fu, John J. Galvin III

Emotion detection in infants' cries based on a maximum likelihood approach
S. Matsunaga, S. Sakaguchi, M. Yamashita, S. Miyahara, S. Nishitani, K. Shinohara

Yeah right: sarcasm recognition for spoken dialogue systems
Joseph Tepperman, David Traum, Shrikanth Narayanan

Identification of confusion and surprise in spoken dialog using prosodic features
Rohit Kumar, Carolyn P. Rosé, Diane J. Litman

Analysis and detection of speech under sleep deprivation
Tin Lay Nwe, Haizhou Li, Minghui Dong

Language, gender, speaking style and language proficiency as factors influencing the autonomous vocalic filler production in spontaneous speech
Ioana Vasilescu, Martine Adda-Decker



Spoken Language Understanding


A spoken language understanding approach using successive learners
Wei-Lin Wu, Ru-Zhan Lu, Hui Liu, Feng Gao

Conversational help desk: vague callers and context switch
Osamuyimen Stewart, Juan Huerta, Ea-Ee Jan, Cheng Wu, Xiang Li, David Lubensky

Integrating spoken dialog and question answering: the Ritel project
Sophie Rosset, Olivier Galibert, Gabriel Illouz, Aurélien Max

Rapid simulation-driven reinforcement learning of multimodal dialog strategies in human-robot interaction
Thomas Prommer, Hartwig Holzapfel, Alex Waibel

Software architectures for incremental understanding of human speech
Gregory Aist, James Allen, Ellen Campana, Lucian Galescu, Carlos A. Gómez Gallo, Scott C. Stoness, Mary Swift, Michael Tanenhaus

Lingua machinae - an unorthodox proposal
Florian Schiel, Christoph Draxler, Marion Libossek

Evaluation of content presentation strategies for an in-car spoken dialogue system
Heather Pon-Barry, Fuliang Weng, Sebastian Varges

On designing context sensitive language models for spoken dialog systems
Vaibhava Goel, Ramesh Gopinath

Using SVM and error-correcting codes for multiclass dialog act classification in meeting corpus
Yang Liu

A multilingual expectations model for contextual utterances in mixed-initiative spoken dialogue
Hartwig Holzapfel, Alex Waibel

Dynamic help generation by estimating user's mental model in spoken dialogue systems
Yuichiro Fukubayashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Dialog act tagging with support vector machines and hidden Markov models
Dinoj Surendran, Gina-Anne Levow





Multichannel Speech Enhancement/Speech Perception


Improved hybrid microphone array post-filter by integrating a robust speech absence probability estimator for speech enhancement
Junfeng Li, Masato Akagi, Yôiti Suzuki

Soft decision combining for dual channel noise reduction
Timo Gerkmann, Rainer Martin

An improved affine projection algorithm based crosstalk resistant adaptive noise canceller
Guo Chen, Vijay Parsa

An optimum microphone array post-filter for speech applications
Stamatis Leukimmiatis, Dimitrios Dimitriadis, Petros Maragos

Multi-microphone periodicity function for robust F0 estimation in real noisy and reverberant environments
Federico Flego, Maurizio Omologo

A new dual-microphone speech enhancement method for oriented noises
H. R. Abutalebi, M. Pourahmadi, M.R. Aghabozorgi

50 years late: repeating Miller-Nicely 1955
Andrew Lovitt, Jont B. Allen

New 20-word lists for word intelligibility test in Japanese
Shuichi Sakamoto, Tadahiro Yoshikawa, Shigeaki Amano, Yôiti Suzuki, Tadahisa Kondo

Sparseness and speech perception in noise
Guoping Li, Mark E. Lutman

An assessment of automatic speech recognition as speech intelligibility estimation in the context of additive noise
Wei M. Liu, John S. D. Mason, Nicholas W. D. Evans, Keith A. Jellyman

Underlying quality dimensions of modern telephone connections
Marcel Wältermann, Kirstin Scholz, Alexander Raake, Ulrich Heute, Sebastian Möller

An ERB loudness pattern based objective speech quality measure
Guo Chen, Vijay Parsa, Susan Scollie




Voice Morphing


Improving the performance of HMM-based voice conversion using context clustering decision tree and appropriate regression matrix format
Long Qin, Yi-Jian Wu, Zhen-Hua Ling, Ren-Hua Wang

MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training
Chung-Han Lee, Chung-Hsien Wu

Novel method for data clustering and mode selection with application in voice conversion
Jani Nurminen, Jilei Tian, Victor Popa

Text-independent cross-language voice conversion
David Sündermann, Harald Höge, Antonio Bonafonte, Hermann Ney, Julia Hirschberg

Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

Improving body transmitted unvoiced speech with statistical voice conversion
Mikihiro Nakagiri, Tomoki Toda, Hideki Kashioka, Kiyohiro Shikano

An HMM-based singing voice synthesis system
Keijiro Saino, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Voice conversion based on mixtures of factor analyzers
Yosuke Uto, Yoshihiko Nankaku, Tomoki Toda, Akinobu Lee, Keiichi Tokuda

Efficient Gaussian mixture model evaluation in voice conversion
Jilei Tian, Jani Nurminen, Victor Popa

Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis
Yuji Nakano, Makoto Tachibana, Junichi Yamagishi, Takao Kobayashi

Frequency warping based on mapping formant parameters
Zhi-Wei Shuang, Raimo Bakis, Slava Shechtman, Dan Chazan, Yong Qin

Automatic phonetic segmentation by using a SPM-based approach for a Mandarin singing voice corpus
Cheng-Yuan Lin, J.-S. Roger Jang

A comparison of singing evaluation algorithms
Partha Lal







Advances in Noisy ASR


Exploiting polynomial-fit histogram equalization and temporal average for robust speech recognition
Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen

Missing data mask models with global frequency and temporal constraints
Sébastien Demange, Christophe Cerisara, Jean-Paul Haton

Multi-stream ASR: an oracle perspective
Hemant Misra, Jithendra Vepa, Hervé Bourlard

A weight estimation method using LDA for multi-band speech recognition
Koji Iwano, Kaname Kojima, Sadaoki Furui

Powered cepstral normalization (p-CN) for robust features in speech recognition
Chang-wen Hsu, Lin-shan Lee

Robust automatic speech recognition for accented Mandarin in car environments
Pei Ding, Lei He, Xiang Yan, Jie Hao

A robust feature extraction based on the MTF concept for speech recognition in reverberant environment
Xugang Lu, Masashi Unoki, Masato Akagi

Clean speech feature estimation based on soft spectral masking
Young Joon Kim, Woohyung Lim, Nam Soo Kim

Robust speech recognition by modifying clean and telephone feature vectors using bidirectional neural network
Mansoor Vali, Seyyed Ali Seyyed Salehi, Kazem Karimi

Silence energy normalization for robust speech recognition in additive noise environment
Chung-fu Tai, Jeih-weih Hung

Handling convolutional noise in missing data automatic speech recognition
Maarten Van Segbroeck, Hugo Van hamme

Noisy speech recognition based on selection of multiple noise suppression methods using noise GMMs
Norihide Kitaoka, Souta Hamaguchi, Seiichi Nakagawa

Using posterior-based features in template matching for speech recognition
Guillermo Aradilla, Jithendra Vepa, Hervé Bourlard

Hypothesis-based feature combination of multiple speech inputs for robust speech recognition in automotive environments
Yasunari Obuchi, Nobuo Hataoka


Source Separation and Localization


Continuous time-frequency masking method for blind speech separation with adaptive choice of threshold parameter using ICA
Zbynek Koldovsky, Jan Nouza, Jan Kolorenc

Multistage convolutive blind source separation for speech mixture
Yanxue Liang, Ichiro Hagiwara

Detection and separation of speech events in meeting recordings
Futoshi Asano, Jun Ogata

Audio person tracking in a smart-room environment
Alberto Abad, Carlos Segura, Dušan Macho, Javier Hernando, Climent Nadeu

Tracking and beamforming for multiple simultaneous speakers with probabilistic data association filters
Tobias Gehrig, Ulrich Klee, John W. McDonough, Shajith Ikbal, Matthias Wölfel, Christian Fügen

Modeling the precedence effect for binaural sound source localization in noisy and echoic environments
Martin Heckmann, Tobias Rodemann, Bjorn Scholling, Frank Joublin, Christian Goerick

Using a differential microphone array to estimate the direction of arrival of two acoustic sources
Fotios Talantzis, Anthony G. Constantinides, Lazaros C. Polymenakos

Speaker localization based on oriented global coherence field
Alessio Brutti, Maurizio Omologo, Piergiorgio Svaizer

Performance evaluation of three features for model-based single channel speech separation problem
M. H. Radfar, R. M. Dansereau, A. Sayadiyan

Single-channel speech separation using sparse non-negative matrix factorization
Mikkel N. Schmidt, Rasmus K. Olsson

Adaptive speech enhancement for speech separation in diffuse noise
Rong Hu, Yunxin Zhao

A probabilistic graphical model for microphone array source separation using rich pre-trained source models
H. T. Attias

Geometrically constrained permutation-free source separation in an undercomplete speech unmixing scenario
Erik Visser

Highly directional multi-beam audio loudspeaker
Dirk Olszewski, Klaus Linhard



Language Modeling for Spoken Dialog Systems

Feature Enhancement for Robust ASR

Dialog and Discourse

The Speech Separation Challenge

Multilingual and Multi-Accent Processing

Corpora, Annotation, and Assessment Metrics I, II

Speech Coding

Speech Enhancement I, II

ASR Other I, II

Modeling Prosodic Features

Spoken Information Retrieval

Front-End Methods for ASR

Language and Dialect Recognition

Spoken Dialog Systems I, II

Speaker Characterization and Recognition I-IV

System Combination

Interpreting Prosodic Variation

Articulatory Modeling

Acoustic Modeling I - Training and Topologies

Acoustic Signal Segmentation and Classification

Linguistics, Phonology, and Phonetics I, II

Speech Translation

Acoustic Modeling II - Adaptation

Emotional Speech and Speaker State

Speech and Language in Education

Speech Perception I, II

Speech Production, Physiology, and Pathology I, II

Formant Estimation

Language Processing Beyond and Below the Word-Level

Robustness and Adaptation for ASR

Multimodal, Translation and Information Retrieval

Advances in Acoustic Segmentation

Acoustic Modeling III - LVCSR

Speech and Visual Processing

Text-to-Speech I, II

Special Populations - Learners, Aged, Challenged

Robust ASR

Speech Summarization

Acoustic Modeling IV

Large Vocabulary Speech Recognition

Speech/Noise/Music Segmentation

Pitch Estimation

Acoustic Modeling V - Novel Approaches

Corpus-Based Synthesis

Spoken Dialog Technology R&D

Modeling Speaker Emotional State

Language Modeling and ASR Applications

Spoken Language Understanding

Segmentation and VAD

Technologies for Specific Populations: Learners and Challenged

The Prosody of Turn-Taking and Dialog Acts

Multichannel Speech Enhancement/Speech Perception

Diarization in ASR

Language Model Adaptation, Refinement, and Evaluation

Voice Morphing

Prosody

Discriminative Training

Speech Synthesis

Multimodal Processing

Speech Analysis

Advances in Noisy ASR

Source Separation and Localization