Speech Recognition in Noise - I

Evaluation of a noise-robust DSR front-end on Aurora databases
Dusan Macho, Laurent Mauuary, Bernhard Noé, Yan Ming Cheng, Doug Ealey, Denis Jouvet, Holly Kelleher, David Pearce, Fabien Saadoun

Qualcomm-ICSI-OGI features for ASR
Andre Adami, Lukás Burget, Stephane Dupont, Hari Garudadri, Frantisek Grezl, Hynek Hermansky, Pratibha Jain, Sachin Kajarekar, Nelson Morgan, Sunil Sivadas

Improving word accuracy with Gabor feature extraction
Michael Kleinschmidt, David Gelbart

Evaluation of SPLICE on the Aurora 2 and 3 tasks
Jasha Droppo, Li Deng, Alex Acero

Performance of discriminatively trained auditory features on Aurora2 and Aurora3
Brian Mak, Yik-Cheung Tam

Feature extraction combining spectral noise reduction and cepstral histogram equalization for robust ASR
José C. Segura, M.C. Benítez, Ángel de la Torre, Antonio J. Rubio

Bell Labs approach to Aurora evaluation on connected digit recognition
Jingdong Chen, Dimitris Dimitriadis, Hui Jiang, Qi Li, Tor André Myrvoll, Olivier Siohan, Frank K. Soong

Algorithms for distributed speech recognition in a noisy automobile environment
Hong Kook Kim, Richard C. Rose

Quantile based histogram equalization for online applications
Florian Hilger, Sirko Molau, Hermann Ney

Frontend post-processing and backend model enhancement on the Aurora 2.0/3.0 databases
Chia-Ping Chen, Karim Filali, Jeff A. Bilmes

HMM composition-based rapid model adaptation using a priori noise GMM adaptation evaluation on Aurora2 corpus
Masaki Ida, Satoshi Nakamura

Data-driven temporal filters obtained via different optimization criteria evaluated on Aurora2 database
Jeih-weih Hung, Lin-shan Lee

Efficient additive and convolutional noise reduction procedures
Bojan Kotnik, Damjan Vlaj, Zdravko Kacic, Bogomir Horvat

Progress with the Philips continuous ASR system on the Aurora 2 noisy digits database
Markus Lieb, Alexander Fischer

An environment compensated minimum classification error training approach and its evaluation on Aurora2 database
Jian Wu, Qiang Huo

Evaluation of a noise adaptive speech recognition system on the Aurora 3 database
Kaisheng Yao, Dong-Lai Zhu, Satoshi Nakamura

Distributed speech recognition over IP networks on the Aurora 3 database
Laura Docío-Fernández, Carmen García-Mateo

Evaluation of noisy speech recognition based on noise reduction and acoustic model adaptation on the Aurora2 tasks
M. Fujimoto, Yasuo Ariki

Improvements to the IBM Aurora 2 multi-condition system
George Saon, Juan M. Huerta

Distributed speech recognition using noise-robust MFCC and TRAPS-estimated manner features
Pratibha Jain, Hynek Hermansky, Brian Kingsbury

Evaluation of spectral subtraction with smoothing of time direction on the Aurora 2 task
Norihide Kitaoka, Seiichi Nakagawa

Evaluation of noise robust features on the Aurora databases
Xiaodong Cui, Markus Iseli, Qifeng Zhu, Abeer Alwan

Computationally efficient noise compensation for robust automatic speech recognition assessed under the Aurora 2/3 framework
Nicholas W. D. Evans, John S. Mason

Mel-scaled wavelet filter based features for noisy unvoiced phoneme recognition
O. Farooq, S. Datta

Filter bank subtraction for robust speech recognition
Kazuo Onoe, Hiroyuki Segi, Takeshi Kobayakawa, Shoei Sato, Toru Imai, Akio Ando

Low cost duration modelling for noise robust speech recognition
Andrew C. Morris, Simon Payne, Hervé Bourlard

A comparative study of approximations for parallel model combination of static and dynamic parameters
Yifan Gong

Noise estimation for efficient speech enhancement and robust speech recognition
Petr Motícek, Lukás Burget

The 2001 GMTK-based SPINE ASR system
Özgür Çetin, Harriet J. Nock, Katrin Kirchhoff, Jeff A. Bilmes, Mari Ostendorf

Using adaptive signal limiter together with weighting techniques for noisy speech recognition
Wei-Wen Hung

Spectral subtraction in noisy environments applied to speaker adaptation based on HMM sufficient statistics
Shingo Yamade, Kanako Matsunami, Akira Baba, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano

Robust speech recognition against short-time noise
Manhung Siu, Yu-Chung Chan

Word endpoints detection in the presence of non-stationary noise
M. Toma, A. Lodi, R. Guerrieri

Comparison and combination of RASTA-PLP and FF features in a hybrid HMM/MLP speech recognition system
Pere Pujol Marsal, Susagna Pol Font, Astrid Hagen, Hervé Bourlard, Climent Nadeu

Robust MMSE-FW-LAASR scheme at low SNRs
Tao Xu, Zhigang Cao

Robust speech recognition using a voiced-unvoiced feature
András Zolnay, Ralf Schlüter, Hermann Ney

Accumulated Kullback divergence for analysis of ASR performance in the presence of noise
Febe de Wet, Johan de Veth, Bert Cranen, Lou Boves

A hybrid HMM/TRAPS model for robust voice activity detection
Brian Kingsbury, Pratibha Jain, Andre Adami

Run time information fusion in speech recognition
Chengyi Zheng, Yonghong Yan

Using observation uncertainty in HMM decoding
Jon A. Arrowood, Mark A. Clements

Combining a Gaussian mixture model front end with MFCC parameters
M. N. Stuttle, M. J. F. Gales

Noise from corrupted speech log mel-spectral energies
Jasha Droppo, Alex Acero, Li Deng

Improving the role of unvoiced speech segments by spectral normalisation in robust speech recognition
Carlos Lima, Luís B. Almeida, João L. Monteiro

Building an ASR system for noisy environments: SRI’s 2001 SPINE evaluation system
Venkata Ramana Rao Gadde, Andreas Stolcke, Dimitra Vergyri, Jing Zheng, Kemal Sönmez, Anand Venkataraman

Speech Recognition: Adaptation

Maximum likelihood estimation of eigenvoices and residual variances for large vocabulary speech recognition tasks
P. Kenny, G. Boulianne, Pierre Dumouchel

Rapid speaker adaptation using speaker clustering
Ernest J. Pusateri, Timothy J. Hazen

Adaptive model combination for dynamic speaker selection training
Chao Huang, Tao Chen, Eric Chang

Unsupervised n-best based model adaptation using model-level confidence measures
Ka-Yan Kwan, Tan Lee, Chen Yang

LU factorization for feature transformation
Patrick Nguyen, Luca Rigazio, Christian Wellekens, Jean-Claude Junqua

Implementing vocal tract length normalization in the MLLR framework
Guo-Hong Ding, Yi-Fei Zhu, Chengrong Li, Bo Xu

Markov models based on speaker space model evolution
Dong Kook Kim, Nam Soo Kim

Robust speech recognition using inter-speaker and intra-speaker adaptation
Baojie Li, Keikichi Hirose, Nobuaki Minematsu

Continuous environmental adaptation of a speech recogniser in telephone line conditions
Carlos Lima, Luís B. Almeida, João L. Monteiro

Tree-structured maximum a posteriori adaptation for a segment-based speech recognition system
Irina Illina

Robust time-synchronous environmental adaptation for continuous speech recognition systems
Thomas Plötz, Gernot A. Fink

Unsupervised language model adaptation for lecture speech transcription
Thomas Niesler, Daniel Willett

Incremental on-line feature space MLLR adaptation for telephony speech recognition
Yongxin Li, Hakan Erdogan, Yuqing Gao, Etienne Marcheret

Enhanced histogram normalization in the acoustic feature space
Sirko Molau, Florian Hilger, Daniel Keysers, Hermann Ney

Blind normalization of speech from different channels and speakers
David N. Levin

Unsupervised acoustic model adaptation based on phoneme error minimization
Jun Ogata, Yasuo Ariki

Improved structural maximum likelihood eigenspace mapping for rapid speaker adaptation
Bowen Zhou, John H. L. Hansen

Statistical adaptation of acoustic models to noise conditions for robust speech recognition
Ángel de la Torre, Dominique Fohr, Jean-Paul Haton

Issues in automatic transcription of historical audio data
F. Brugnara, M. Cettolo, M. Federico, D. Giuliani

Speech Synthesis

Part-of-speech tagging in French text-to-speech synthesis: experiments in tagset selection
Hongyan Jing, Evelyne Tzoukermann

Grapheme-to-phoneme conversion using pseudo-morphological units
Ulla Uebler

Investigations on joint-multigram models for grapheme-to-phoneme conversion
M. Bisani, Hermann Ney

Pronunciation of proper names with a joint n-gram model for bi-directional grapheme-to-phoneme conversion
Lucian Galescu, James F. Allen

The AT&T German text-to-speech system: realistic linguistic description
Matthias Jilka, Ann K. Syrdal

Generating script using statistical information of the context variation unit vector
Haiping Li, Fangxin Chen, Liqin Shen

Efficient and scalable methods for text script generation in corpus-based TTS design
Chih-Chung Kuo, Jing-Yi Huang

A statistically motivated database pruning technique for unit selection synthesis
Peter Rutten, Matthew P. Aylett, Justin Fackrell, Paul Taylor

A new method of building decision tree based on target information
Yi-Jian Wu, Yu Hu, Xiaoru Wu, Ren-Hua Wang

A context clustering technique for average voice model in HMM-based speech synthesis
Junichi Yamagishi, Masatsune Tamura, Takashi Masuko, Keiichi Tokuda, Takao Kobayashi

Feature extraction for unit selection in concatenative speech synthesis: comparison between AIM, LPC, and MFCC
Minoru Tsuzaki, Hisashi Kawai

Combined prosody and candidate unit selections for corpus-based text-to-speech systems
Francisco Campillo-Díaz, Eduardo R. Banga

Automatic segmentation combining an HMM-based approach and spectral boundary correction
Yeon-Jun Kim, Alistair Conkie

Refined speech segmentation for concatenative speech synthesis
Abhinav Sethy, Shrikanth S. Narayanan

Refocussing on the text normalisation process in text-to-speech systems
Andrew Breen, Barry Eggleton, Peter Dion, Steve Minnis

A text-to-speech synthesis system for Telugu
Jithendra Vepa, Jahnavi Ayachitam, K. V. K. Kalpana Reddy

Towards an intonation module for a Portuguese TTS system
Diamantino Freitas, Daniela Braga

Applying a hybrid intonation model to a seamless speech synthesizer
Takashi Saito, Masaharu Sakamoto

Using start/end timings of spectral transitions between phonemes in concatenative speech synthesis
Toshio Hirai, Seiichi Tenpaku, Kiyohiro Shikano

Design of a Mandarin sentence set for corpus-based speech synthesis by use of a multi-tier algorithm taking account of the varied prosodic and spectral characteristics
Jinfu Ni, Hisashi Kawai

A data-driven approach to source-formant type text-to-speech system
Hiroki Mori, Takahiro Ohtsuka, Hideki Kasuya

Power spectral density based channel equalization of large speech database for concatenative TTS system
Yu Shi, Eric Chang, Hu Peng, Min Chu

CU VOCAL: corpus-based syllable concatenation for Chinese speech synthesis across domains and dialects
Helen M. Meng, Chi Kin Keung, Kai Chung Siu, Tien Ying Fung, P. C. Ching

Perceptual evaluation of naturalness due to substitution of Chinese syllable for concatenative speech synthesis
Jinlin Lu, Hisashi Kawai

Reducing the footprint of the IBM trainable speech synthesis system
Dan Chazan, Ron Hoory, Zvi Kons, Dorel Silberstein, Alexander Sorin

Computationally efficient time-scale modification of speech using 3 level clipping
Sung-Joo Lee, Hyung Soon Kim

A miniature Chinese TTS system based on tailored corpus
Zhi-Wei Shuang, Yu Hu, Zhen-Hua Ling, Ren-Hua Wang

Phonetic normalization using z-score in segmental prosody estimation for corpus-based TTS system
Hoeun Song, Jaein Kim, Kyongrok Lee, Jinyoung Kim

On F0 trajectory optimization for very high-quality speech manipulation
Hideki Kawahara, Parham Zolfaghari, Alain de Cheveigné

Modeling tones in continuous Cantonese speech
Tan Lee, Greg Kochanski, Chilin Shih, Yujia Li

Pitch contour model for Chinese text-to-speech using CART and statistical model
Minghui Dong, Kim-Teng Lua

Basque intonation modelling for text to speech conversion
Eva Navas, Inmaculada Hernáez, Juan María Sánchez

Application of microprosody models in text to speech synthesis
Phuay Hui Low, Saeed Vaseghi

Prosodic phrasing with inductive learning
Sheng Zhao, Jianhua Tao, Lianhong Cai

Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model
Ben Milner, Xu Shao

Designing Japanese speech database covering wide range in prosody for hybrid speech synthesizer
Hiromichi Kawanami, Tsuyoshi Masuda, Tomoki Toda, Kiyohiro Shikano

Multimodal Spoken Language Processing

Flexible multimodal human-machine interaction in mobile environments
Dirk Bühler, Wolfgang Minker, Jochen Häußler, Sven Krüger

Implementation testing of a hybrid symbolic/statistical multimodal architecture
Edward C. Kaiser, Philip R. Cohen

Belief network based disambiguation of object reference in spoken dialogue system for robot
Yoko Yamakata, Tatsuya Kawahara, Hiroshi G. Okuno

Specification and realisation of multimodal output in dialogue systems
Jonas Beskow, Jens Edlund, Magnus Nordstrand

Gestural trajectory symmetries and discourse segmentation
Francis Quek, Yingen Xiong, David McNeill

Gestural spatialization in natural discourse segmentation
Francis Quek, David McNeill, Robert Bryll, Mary Harper

Real-time sound source localization and separation for robot audition
Kazuhiro Nakadai, Hiroshi G. Okuno, Hiroaki Kitano

CU animate tools for enabling conversations with animated characters
Jiyong Ma, Jie Yan, Ronald Cole

Multiparty multimodal interaction: a preliminary analysis
Philip R. Cohen, Rachel Coulston, Kelly Krout

Distributed audio-visual speech synchronization
Peter Poller, Jochen Müller

Lip-reading based on a fully automatic statistical model
Philippe Daubias, Paul Deléglise

Audio-visual continuous speech recognition using a coupled hidden Markov model
Xiaoxing Liu, Yibao Zhao, Xiaobo Pi, Luhong Liang, Ara V. Nefian

Data, annotation schemes and coding tools for natural interactivity
Laila Dybkjær, Niels Ole Bernsen

VisSTA: a tool for analyzing multimodal discourse data
Francis Quek, Yang Shi, Cemil Kirbas, Shunguang Wu

Spoken Language Resources

The ISL meeting corpus: the impact of meeting type on speech style
Susanne Burger, Victoria MacLaren, Hua Yu

A new method for testing dialogue systems based on simulations of real-world conditions
R. López-Cózar, Ángel de la Torre, José C. Segura, Antonio J. Rubio, J. M. López-Soler

Comfort noise detection and GSM-FR-codec detection for speech-quality evaluations in telephone networks
Thorsten Ludwig

Validation and improvement of automatic phonetic transcriptions
Catia Cucchiarini, Diana Binnenpoorte

Development of Japanese infant speech database and speaking rate analysis
Shigeaki Amano, Kazumi Kato, Tadahisa Kondo

Automatic prosodic break labeling for Mandarin Chinese speech data
Minghui Dong, Kim-Teng Lua

Orientel: speech-based interactive communication applications for the Mediterranean and the Middle East
Imed Zitouni, Joseph Olive, Dorota Iskra, Khalid Choukri, Ossama Emam, Oren Gedge, Emmanuel Maragoudakis, Herbert Tropf, Asunción Moreno, Albino Nogueiras Rodriguez, Barbara Heuft, Rainer Siemund

The reliability of the ITU-T P.85 standard for the evaluation of text-to-speech systems
Yolanda Vazquez Alvarez, Mark Huckvale

Automatic generation of phonetic transcriptions for large speech corpora
Kris Demuynck, Tom Laureys, Steven Gillis

Overview on recent activities in speech understanding and dialogue systems evaluation
Wolfgang Minker

The Carnegie Mellon Communicator corpus
Christina Bennett, Alexander I. Rudnicky

GlobalPhone: a multilingual speech and text database developed at Karlsruhe University
Tanja Schultz

On developing new text and audio corpora and speech recognition tools for the Turkish language
Özgül Salor, Bryan Pellom, Tolga Çiloglu, Kadri Hacioglu, Mübeccel Demirekler

FORM: an extensible, kinematically-based gesture annotation scheme
Craig Martell

Automatic phoneme alignment based on acoustic-phonetic modeling
John-Paul Hosom

Extracting clauses for spoken language understanding in conversational systems
Narendra K. Gupta, Srinivas Bangalore, Mazin Rahim

Issues in the development of a stochastic speech understanding system
F. Lefèvre, H. Bonneau-Maynard

10 years of PhonDat-II: a reassessment
Hartmut R. Pfitzinger


Speech watermarking through parametric modeling
A. Gurijala, J. R. Deller Jr., M. S. Seadle, John H. L. Hansen

An education software in teaching automatic speech recognition (ASR)
Kai Sze Hong, Sh-Hussain Salleh

Multimodal integration patterns in children
Benfang Xiao, Cynthia Girand, Sharon Oviatt

ASR in a human word recognition model: generating phonemic input for Shortlist
Odette Scharenborg, Lou Boves, Johan de Veth

Sign language translation using an error tolerant retrieval algorithm
Chung-Hsien Wu, Yu-Hsien Chiu, Kung-Wei Cheng

A sound source classification system based on subband processing
Oytun Turk, Omer Sayli, Helin Dutagaci, Levent M. Arslan

Automatic sign translation
Ying Zhang, Bing Zhao, Jie Yang, Alex Waibel

A study on the classification of whispered and normally phonated speech
Stanley J. Wenndt, Edward J. Cupples, Richard M. Floyd

Experiments on recognition of lavalier microphone speech and whispered speech in real world environments
Kiyoshi Tatara, Taisuke Ito, Parham Zolfaghari, Kazuya Takeda, Fumitada Itakura

An effect of amplitude modulation on perceptual segregation of tone sequences
Mamoru Iwaki, Hiromi Seki

Automatic recognition of Dutch dysarthric speech: a pilot study
Eric Sanders, Marina Ruiter, Lilian Beijer, Helmer Strik

Evaluation of a system for concatenative articulatory visual speech synthesis
Olov Engwall

Intrasyllabic articulatory control constraints in verbal working memory
Marc Sato, Jean-Luc Schwartz, Marie-Agnès Cathiard, Christian Abry, Hélène Loevenbruck

Towards a grammar of spoken language: incorporating paralinguistic information
Nick Campbell

An analysis of the causes of increased error rates in children's speech recognition
Qun Li, Martin J. Russell

A new computer-based analytical speech perception test for prelingually deaf children and children with speech disorders
Anne-Marie Öster

Vocalization age as a clinical tool
Harriet J. Fell, Joel MacAuslan, Linda J. Ferrier, Susan G. Worst, Karen Chenausky

Baldini: Baldi speaks Italian!
Piero Cosi, Michael M. Cohen, Dominic W. Massaro

Eyebrow movements and voice variations in dialogue situations: an experimental investigation
Christian Cavé, Isabelle Guaïtella, Serge Santi

Large Vocabulary Speech Recognition

State clustering improvements for continuous HMMs in a Spanish large vocabulary recognition system
R. Córdoba, J. Macías-Guarasa, J. Ferreiros, J. M. Montero, José M. Pardo

A comparison of HTK, ISIP and Julius in Slovenian large vocabulary continuous speech recognition
Tomaz Rotovnik, Mirjam Sepesy Maucec, Bogomir Horvat, Zdravko Kacic

Parametric trajectory segment model for LVCSR
Lei Jia, Bo Xu

Efficient precalculation of LM contexts for large vocabulary continuous speech recognition
F. Javier Diéguez-Tirado, Antonio Cardenal-López

Integrating multiple pronunciations during MCE-based acoustic model training for large vocabulary speech recognition
Rathi Chengalvarayan

A hybrid approach to compounds in LVCSR
Tom Laureys, Vincent Vandeghinste, Jacques Duchateau

A confidence measure based on agreement among multiple LVCSR models - correlation between pair of acoustic models and confidence
Takehito Utsuro, Tetsuji Harada, Hiromitsu Nishizaki, Seiichi Nakagawa

Combining lexical and morphological knowledge in language model for inflectional (Czech) language
Jan Nouza, Jindra Drabkova

Modeling frequent allophones in Japanese speech recognition
Long Nguyen, Xuefeng Guo, John Makhoul

The structure and its implementation of hidden dynamic HMM for Mandarin speech recognition
Feili Chen, Jie Zhu, Wentao Song

A new lexicon optimization method for LVCSR based on linguistic and acoustic characteristics of words
Takahiro Shinozaki, Sadaoki Furui

Retrieving phrases by selecting the history: application to automatic speech recognition
David Langlois, Kamel Smaïli, Jean-Paul Haton

Compact subnetwork-based large vocabulary continuous speech recognition
Dong-Hoon Ahn, Minhwa Chung

A comparison of four language models for large vocabulary Turkish speech recognition
Helin Dutagaci, Levent M. Arslan

Integration of Speech Technology in Language Learning

Speech recognition for language teaching and evaluating: a study of existing commercial products
Rebecca Hincks

Automatic intelligibility assessment and diagnosis of critical pronunciation errors for computer-assisted pronunciation learning
Antoine Raux, Tatsuya Kawahara

Effects of production training with visual feedback on the acquisition of Japanese pitch and durational contrasts
Yukari Hirata

Acoustic modeling of sentence stress using differential features between syllables for English rhythm learning system development
Nobuaki Minematsu, Satoshi Kobashikawa, Keikichi Hirose, Donna Erickson

Modeling and automatic detection of English sentence stress for computer-assisted English prosody learning system
Kazunori Imoto, Yasushi Tsubota, Antoine Raux, Tatsuya Kawahara, Masatake Dantsuji

Recognition and verification of English by Japanese students for computer-assisted language learning system
Yasushi Tsubota, Tatsuya Kawahara, Masatake Dantsuji

Feedback in computer assisted pronunciation training: technology push or demand pull?
Ambra Neri, Catia Cucchiarini, Helmer Strik

Corpus-based analysis of English spoken by Japanese students in view of the entire phonemic system of English
Nobuaki Minematsu, Gakuto Kurata, Keikichi Hirose

Computer-assisted second-language speech learning: generalization of prosody-focused training
Debra M. Hardison

Predicting oral reading miscues
Jack Mostow, Joseph Beck, S. Vanessa Winter, Shaojun Wang, Brian Tobin

Implementation of an intonational quality assessment system
Chanwoo Kim, Wonyong Sung

English CALL system with functions of speech segmentation and pronunciation evaluation using speech recognition technology
Yasuo Ariki, Jun Ogata

Speech Enhancement I

A real-time acoustic human-machine front-end for multimedia applications integrating robust adaptive beamforming and stereophonic acoustic echo cancellation
W. Herbordt, J. Ying, H. Buchner, W. Kellermann

Enhancement of single channel speech using perception-based wavelet transform
Ching-Ta Lu, Hsiao-Chuan Wang

Speech enhancement based on a perceptual modification of Wiener filtering
L. Lin, W. H. Holmes, E. Ambikairajah

A new approach to speech enhancement by a microphone array using EM and mixture models
Hagai Attias, Li Deng

Acoustic echo cancellation based on m-channel IIR cosine-modulated filter bank
Sang G. Kim, Chang D. Yoo

Speech enhancement in car environment using blind source separation
Hiroshi Saruwatari, Katsuyuki Sawai, Akinobu Lee, Kiyohiro Shikano, Atsunobu Kaminuma, Masao Sakata

Speech enhancement based on combining perceptual enhancement and short-time spectral attenuation
I. Potamitis, Nikos Fakotakis, George Kokkinakis

Suitable design of adaptive beamformer based on average speech spectrum for noisy speech recognition
Takanobu Nishiura, Satoshi Nakamura, Yuka Okada, Takeshi Yamada, Kiyohiro Shikano

Highly oversampled subband adaptive filters for noise cancellation on a low-resource DSP system
King Tam, Hamid Sheikhzadeh, Todd Schneider

A perceptually motivated subspace approach for speech enhancement
Yi Hu, Philipos C. Loizou

Speech enhancement based on generalized singular value decomposition approach
Gwo-hwa Ju, Lin-shan Lee

Subspace speech enhancement using subband whitening filter
Jong Uk Kim, Chang D. Yoo

Speech enhancement using wavelet packet transform
Sungwook Chang, Sungil Jung, Y. Kwon, Sung-il Yang

Sequential MAP noise estimation and a phase-sensitive model of the acoustic environment
Li Deng, Jasha Droppo, Alex Acero

Auditory fovea based speech enhancement and its application to human-robot dialog system
Kazuhiro Nakadai, Hiroshi G. Okuno, Hiroaki Kitano

A spatio-temporal speech enhancement scheme for robust speech recognition
Erik Visser, Manabu Otsuka, Te-Won Lee

Comparative evaluation of CASA and BSS models for subband cocktail-party speech separation
Frédéric Berthommier, Seungjin Choi

Speech enhancement in non-stationary noise environments
Hyoung-Gook Kim, Dietmar Ruwisch

The 2ch hybrid subtractive beamformer applied to line sound sources
Mitsunori Mizumachi, Satoshi Nakamura

Mechanisms for Dialogue Processing

An efficient dialogue control method using decision tree-based estimation of out-of-vocabulary word attributes
Yasuhiro Takahashi, Kohji Dohsaka, Kiyoaki Aikawa

Semantic inference: a data-driven solution for NL interaction
Jerome R. Bellegarda

Unified task knowledge for spoken language understanding and dialog management
Jerry Wright, Alicia Abella, Allen Gorin

Distributed Chinese keyword spotting and verification for spoken dialogues under wireless environment
Yun-Tien Lee, Cheng-Huang Wu, Yumin Lee, Lin-shan Lee

A method for evaluating incremental utterance understanding in spoken dialogue systems
Ryuichiro Higashinaka, Noboru Miyazaki, Mikio Nakano, Kiyoaki Aikawa

Detection and recognition of repaired speech on misrecognized utterances for speech input of car navigation system
Naoko Kakutani, Norihide Kitaoka, Seiichi Nakagawa

Ingressive speech as an indication that humans are talking to humans (and not to machines)
Robert Eklund

Compensating for hyperarticulation by modeling articulatory properties
Hagen Soltau, Florian Metze, Alex Waibel

Forms of introduction in map task dialogues: case of L2 Russian speakers
Olga V. Goubanova

Bridges: regions between discourse segments
Nanette M. Veilleux

Robust semantic confidence scoring
Didier Guillevic, Simona Gandrabur, Yves Normandin

Statistically based approach to rejection of incorrectly recognized words
Ludek Müller, Tomás Bartos

Learning decision trees to determine turn-taking by spoken dialogue systems
Ryo Sato, Ryuichiro Higashinaka, Masafumi Tamoto, Mikio Nakano, Kiyoaki Aikawa

Integration of phonetic length properties in the acoustic models of false starts and out-of-vocabulary words
H. Hamimed, G. Damnati

N-word-sequence frequency noise mitigation for SLM based on binomial distribution
Yibao Zhao, Guojun Zhou

Combining acoustic and language information for emotion recognition
Chul Min Lee, Shrikanth S. Narayanan, Roberto Pieraccini

A figure of merit for the analysis of spoken dialog systems
Kadri Hacioglu, Wayne Ward

Language Modeling

Selective back-off smoothing for incorporating grammatical constraints into the n-gram language model
Tomoyosi Akiba, Katunobu Itou, Atsushi Fujii, Tetsuya Ishikawa

Backoff hierarchical class n-gram language modelling for automatic speech recognition systems
Imed Zitouni, Olivier Siohan, Hong-Kwang Jeff Kuo, Chin-Hui Lee

Constructing small language models from grammars
Francis Picard, Dominique Boucher, Guy Lapalme

Improve latent semantic analysis based language model by integrating multiple level knowledge
Rong Zhang, Alexander I. Rudnicky

Individual word language models and the frequency approach
Elvira I. Sicilia-Garcia, Ji Ming, F. Jack Smith

SRILM - an extensible language modeling toolkit
Andreas Stolcke

Efficient construction of long-range language models using log-linear interpolation
E. W. D. Whittaker, D. Klakow

Integration of two stochastic context-free grammars
Anna Corazza

Grammar specialisation meets language modelling
Manny Rayner, Beth Ann Hockey, John Dowding

Maximum entropy model for punctuation annotation from speech
Jing Huang, Geoffrey Zweig

An automatic sentence boundary detector based on a structured language model
Shinsuke Mori

Improved Katz smoothing for language modeling in speech recognition
Genqing Wu, Fang Zheng, Wenhu Wu, Mingxing Xu, Ling Jin

On the use of structures in language models for dialogue
Renato De Mori, Yannick Estève, Christian Raymond

Semantic structured language models
Hakan Erdogan, Ruhi Sarikaya, Yuqing Gao, Michael Picheny

Prosody and Speech Recognition - I

Statistical language modeling with prosodic boundaries and its use for continuous speech recognition
Keikichi Hirose, Nobuaki Minematsu, Makoto Terao

Noise robust speech recognition using F0 contour extracted by Hough transform
Koji Iwano, Takahiro Seki, Sadaoki Furui

Sharing relative stress of cross-word syllables and lexical stress to spontaneous speech recognition
Farshad Almasganj, Farhad D. Dehnavi, Mahmood Bijankhan

Automatic punctuation and disfluency detection in multi-party meetings using prosodic and lexical cues
Don Baron, Elizabeth Shriberg, Andreas Stolcke

Pitch accent prediction using ensemble machine learning
Xuejing Sun

Quantitative evaluation of relevant prosodic factors for text-to-speech synthesis in Spanish
D. Escudero-Mancebo, C. González-Ferreras, V. Cardeñoso-Payo

Tone recognition in Thai continuous speech based on coarticulation, intonation and stress effects
Nuttakorn Thubthong, Boonserm Kijsirikul, Sudaporn Luksaneeyanawin

Combination of pause and F0 information in dependency analysis of Japanese sentences
Kazuyuki Takagi, Hajime Kubota, Kazuhiko Ozeki

Estimating syntactic structure from F0 contour and pause duration in Japanese speech
Yasuo Horiuchi, Tomoko Ohsuga, Akira Ichikawa

Extraction of important sentences using F0 information for speech summarization
Yoichi Yamashita, Akira Inoue

Influence of prosody, context, and word order in the identification of focus in Japanese dialogue
Tatsuya Kitamura, Kayo Itoh, Toshihiko Itoh, Shigeyoshi Kitazawa

Influence of different dialogue situations on user's behavior in spoken corrections
Atsuhiko Kai, Yukari Nonomura, Toshihiko Itoh, Tatsuhiro Konishi, Yukihiro Itoh

Interpreting meaning from context: modeling the prosody of discourse markers in speech
Li-chiung Yang

Prosodic parameter for speaker identification
Katarina Bartkova, David Le Gac, Delphine Charlet, Denis Jouvet

Juncture segmentation of Japanese prosodic unit based on the spectrographic features
Shigeyoshi Kitazawa, Toshihiko Itoh, Tatsuya Kitamura

Acoustic Modeling

Maximum mutual information training of hidden Markov models with vector linear predictors
K. K. Chin, P. C. Woodland

A sparse modeling approach to speech recognition based on relevance vector machines
J. E. Hamaker, J. Picone, A. Ganapathiraju

Mutual information phone clustering for decision tree induction
Ciprian Chelba, Rachel Morton

Rethinking derived acoustic features in speech recognition
Kevin S. Van Horn

Modeling HMM state distributions with Bayesian networks
Konstantin Markov, Satoshi Nakamura

Discriminative linear transforms for feature normalization and speaker adaptation in HMM estimation
Stavros Tsakalidis, Vlasios Doumpiotis, William Byrne

Speaking rate compensation based on likelihood criterion in acoustic model training and decoding
Kozo Okuda, Tatsuya Kawahara, Satoshi Nakamura

Combining maximum likelihood and maximum a posteriori estimation for detailed acoustic modeling of context dependency
Michiel Bacchiani

Large vocabulary conversational speech recognition with the extended maximum likelihood linear transformation (EMLLT) model
Jing Huang, Vaibhava Goel, Ramesh Gopinath, Brian Kingsbury, Peder Olsen, Karthik Visweswariah

Modeling varying pauses to develop robust acoustic models for recognizing noisy conversational speech
Jin-Song Zhang, Satoshi Nakamura

Improving phone-level discrimination in LDA with subphone-level classes
Hwa Jeon Song, Hyung Soon Kim

A combined model of statics-dynamics of speech optimized using maximum mutual information
Zhijian Ou, Zuoying Wang

Syllable recognition using syllable-segment statistics and syllable-based HMM
Nobutoshi Takahashi, Seiichi Nakagawa

Recurrent neural network-enhanced HMM speech recognition systems
J. W. F. Thirion, Elizabeth C. Botha

Sharing trend information of trajectory in segmental-feature HMM
Young-Sun Yun

Framewise phone classification using support vector machines
Jesper Salomon, Simon King

A state-tying approach to building syllable HMMs
Darryl Stewart, Ming Ji, Philip Hanna, F. Jack Smith

Recognition of continuous speech segments of monophone units using support vector machines
Weifeng Lee, C. Chandra Sekhar, Kazuya Takeda, Fumitada Itakura

Construction of decision tree from data driven clustering
Junho Park, Hanseok Ko

Selective multi-path acoustic model based on database likelihoods
Akinobu Lee, Yuuichiro Mera, Hiroshi Saruwatari, Kiyohiro Shikano

Auxiliary variables in conditional Gaussian mixtures for automatic speech recognition
Todd A. Stephenson, Mathew Magimai-Doss, Hervé Bourlard

Constructing shared-state hidden Markov models based on a Bayesian approach
Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, Naonori Ueda

Generalization of state-observation-dependency in partly hidden Markov models
Tetsuji Ogawa, Tetsunori Kobayashi

Speaker Modeling and Scoring

Structural Gaussian mixture models for efficient text-independent speaker verification
Bing Xiang, Toby Berger

Text-dependent speaker verification using Lyapunov exponents
A. Petry, Dante A. C. Barone

User-customized password speaker verification based on HMM/ANN and GMM models
Mohamed F. BenZeghiba, Hervé Bourlard

Exploiting support vector machines in hidden Markov models for speaker verification
Dong Xin, Zhaohui Wu, Yingchun Yang

Speaker identification by location in an optimal space of anchor models
Yassine Mami, Delphine Charlet

ASR dependent techniques for speaker identification
Alex Park, Timothy J. Hazen

Factor analyzed Gaussian mixture models for speaker identification
Peng Ding, Yang Liu, Bo Xu

Phonetic speaker identification
Qin Jin, Tanja Schultz, Alex Waibel

DETAC: a discriminative criterion for speaker verification
Jirí Navrátil, Ganesh N. Ramaswamy

Hierarchical Gaussian mixture model for speaker verification
Ming Liu, Eric Chang, Bei-qian Dai

A reverse Turing test using speech
Greg Kochanski, Daniel Lopresti, Chilin Shih

On effective speaker verification based on subword model
Sungjoo Ahn, Sunmee Kang, Hanseok Ko

Speaker verification using Gaussian component strings in dynamic trajectory space
Bing Xiang

Combining speaker and speech recognition systems
Larry P. Heck, Dominique Genoud

Automatic enrollment for speaker authentication
Qi Li, Hui Jiang, Qiru Zhou, Jinsong Zheng

Experiments in confidence scoring for word and sentence verification
M. Andorno, P. Laface, Roberto Gemello

Confidence metrics for speaker identification
Mark C. Huggins, John J. Grieco

Characteristics of a low reject mode speaker verification system
Daniel Elenius, Mats Blomberg

Issues in Audio-Visual Spoken Language Processing

Special session: issues in audiovisual spoken language processing (when, where, and how?)
Lynne E. Bernstein, Denis K. Burnham, Jean-Luc Schwartz

Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization)
Sabine Deligne, Gerasimos Potamianos, Chalapathy Neti

Audiovisual speech synthesis: from ground truth to models
Gérard Bailly

The stimulus as basis for audiovisual integration
Eric Vatikiotis-Bateson, Harold Hill, Miyuki Kamachi, Karen Lander, Kevin G. Munhall

The perceptual basis for audiovisual speech integration
Lawrence D. Rosenblum

Sources of variability in the perceptual training of /r/ and /l/: interaction of adjacent vowel, word position, talkers' visual and acoustic cues
Debra M. Hardison

Audiovisual perception in L2 learners
Valerie Hazan, Anke Sennema, Andrew Faulkner

Audiovisual integration of speech by children and adults with cochlear implants
Karen Iler Kirk, David B. Pisoni, Lorin Lachs

Auditory-visual speech perception examined by brain imaging and reaction time
Kaoru Sekiyama, Yoichi Sugita

Neurocognitive basis for audiovisual speech perception: evidence from event-related potentials
Curtis W. Ponton, Edward T. Auer, Lynne E. Bernstein

Perception and integration of audiovisual speech in human infants
David J. Lewkowicz

Seeing tongue movements from outside
Gérard Bailly, Pierre Badin

An audio-visual corpus for multimodal speech recognition in Dutch language
Jacek C. Wojdel, Pascal Wiggers, Leon J.M. Rothkrantz

Medium vocabulary continuous audio-visual speech recognition
Pascal Wiggers, Jacek C. Wojdel, Leon J.M. Rothkrantz

DCT-based video features for audio-visual speech recognition
Martin Heckmann, Kristian Kroschel, Christophe Savariaux, Frédéric Berthommier

The effect of auditory-visual information and orthographic background in L2 acquisition
V. Dogu Erdener, Denis K. Burnham

Perceptual evaluation of audiovisual cues for prominence
Emiel Krahmer, Zsófia Ruttkay, Marc Swerts, Wieger Wesselink

Audio-visual scene analysis: evidence for a "very-early" integration process in audio-visual speech perception
Jean-Luc Schwartz, Frédéric Berthommier, Christophe Savariaux

Design of an audio-visual speech corpus for the Czech audio-visual speech synthesis
Milos Zelezný, Petr Císar, Zdenek Krnoul, Jan Novák

Coordination of hand and orofacial movements for CV sequences in French cued speech
Virginie Attina, Denis Beautemps, Marie-Agnès Cathiard

Controlling anticipatory behavior for rounding in French cued speech
Virginie Attina, Marie-Agnès Cathiard, Denis Beautemps

Audio-visual speech sources separation: a new approach exploiting the audio-visual coherence of speech stimuli
David Sodoyer, Laurent Girin, Christian Jutten, Jean-Luc Schwartz

Intonational and visual cues in the perception of interrogative mode in Swedish
David House

A link between cepstral shrinking and the weighted product rule in audio-visual speech recognition
Simon Lucey, Sridha Sridharan, Vinod Chandran

Speech Technology Applications

Can confidence scores help users post-editing speech recognizer output?
Taku Endo, Nigel Ward, Minoru Terada

Information retrieval based on speech recognition results
Masatoshi Watanabe, Masahide Sugiyama

Efficient combination of type-in and wizard-of-oz tests in speech interface development process
Saija-Maaria Lemmelä, Péter Pál Boda

Probabilistic retrieval based on document representations
Wolfgang Macherey, Jörg Viechtbauer, Hermann Ney

Radiodoc: a voice-accessible document system
Takuya Nishimoto, Masahiro Araki, Yasuhisa Niimi

Speech completion: on-demand completion assistance using filled pauses for speech input interfaces
Masataka Goto, Katunobu Itou, Satoru Hayamizu

Design of system-initiated digressive proposals for automated banking dialogues
Jenny Wilkie, Mervyn A. Jack, Peter Littlewood

Towards every-citizen's speech interface: an application generator for speech interfaces to databases
Arthur R. Toth, Thomas K. Harris, James Sanders, Stefanie Shriver, Roni Rosenfeld

Training topic classifiers for conversational speech with limited data
Rukmini Iyer, Jeffrey Ma, Herbert Gish, Owen Kimball

Comparing isolately spoken keywords with spontaneously spoken queries for Japanese spoken document retrieval
Hiromitsu Nishizaki, Seiichi Nakagawa

Choosing speech or touchtone modality for navigation within a telephony natural language system
Jennifer C. Lai, Kwan Min Lee

Multi-scale and multi-model integration for improved performance in Chinese spoken document retrieval
Wai-Kit Lo, Helen M. Meng, P. C. Ching

Development of a GUI-based articulatory speech synthesis system
Kohichi Ogata, Yorinobu Sonoda


Comparing intelligibility of several non-native accent classes in noise
Shawn A. Weil

Effect of F0 fluctuation and amplitude modulation of natural vowels on vowel identification in noisy environments
Kentaro Ishizuka, Kiyoaki Aikawa

Similarities of words in noise in Japanese
Kiyoko Yoneyama

The effects of F0 manipulation on the perceived distance of speech
Douglas S. Brungart, Alexander J. Kordik, Koel Das, Arnab K. Shaw

Time-compressing natural and synthetic speech
Esther Janse

Accounting for perceptual identification of consonants and vowels through acoustic dissimilarity
Jianxia Xue, Sumiko Takayanagi, Lynne E. Bernstein

Modeling recognition of speech sounds with minerva2
Travis Wade, Deborah K. Eakin, Russell Webb, Arvin Agah, Frank Brown, Allard Jongman, John Gauch, Thomas A. Schreiber, Joan Sereno

Syllable processing in English
Ruth Kearns, Dennis Norris, Anne Cutler

Perceptual effects of assimilation-induced violation of final devoicing in Dutch
Cecile Kuijpers, Wilma van Donselaar, Anne Cutler

Access to homophonic meanings during spoken language comprehension: effects of context and neighborhood density
Michael C.W. Yip

Intelligibility of reverse speech in French: a perceptual study
Ivan Magrin-Chagnolleau, Melissa Barkat, Fanny Meunier

Contextual effects in the perception of fricative place of articulation: a rotational hypothesis
Willy Serniclaes, René Carré

What relationship between protrusion anticipation and auditory perception?
Rudolph Sock, Béatrice Vaxelaire, Véronique Hecker, Fabrice Hirsch

On the role of the "schwa" in the perception of plosive consonants
René Carré, Jean Sylvain Liénard, Egidio Marsico, Willy Serniclaes

The perception of stop consonant sequences in dyslexic and normal children
Noël Nguyen, Ludovic Jankowski, Michel Habib

Submoraic awareness by Japanese school children: evidence from a novel game
Takashi Otake, Akemi Iijima

Speaker intelligibility of adults and children
D. Markham, Valerie Hazan

Acoustical correlates to SD ratings of speaker characteristics in two speaking styles
Yasuki Yamashita, Hiroshi Matsumoto

Subjective assessment of frequency bands for perception of speaker identity
Eda Ormanci, U. Hakan Nikbay, Oytun Turk, Levent M. Arslan

Spoken Document Retrieval

Contribution to topic identification by using word similarity
Armelle Brun, Kamel Smaïli, Jean-Paul Haton

SpeechFind: an experimental on-line spoken document retrieval system for historical audio archives
Bowen Zhou, John H. L. Hansen

Topic tracking using subject templates
Yoshimi Suzuki, Fumiyo Fukumoto, Yoshihiro Sekiguchi

Topic detection of an utterance for speech dialogue processing
Katsushi Asami, Toshiyuki Takezawa, Genichiro Kikui

Real-time rich-content transcription of Chinese broadcast news
Daben Liu, Jeffrey Ma, Dongxin Xu, Amit Srivastava, Francis Kubala

Improved Chinese spoken document retrieval with hybrid modeling and data-driven indexing features
Chun-Jen Wang, Berlin Chen, Lin-shan Lee

Exploring sub-word features and linear support vector machines for German spoken document classification
Martha Larson, Stefan Eickeler, Gerhard Paaß, Edda Leopold, Jörg Kindermann

Goal-directed ASR in a multimedia indexing and searching environment (MUMIS)
Mirjam Wester, Judith M. Kessens, Helmer Strik

Confusion-based query expansion for OOV words in spoken document retrieval
Beth Logan, J. M. Van Thong

Cluster identification for speaker-environment tracking
J. T. Wickramaratna, P. C. Woodland

Robust speech / music classification in audio documents
Julien Pinquier, Jean-Luc Rouas, Régine André-Obrecht

Expanded examinations of a low frequency modulation feature for speech/music discrimination
Stefan Karnebäck

Speech, music and songs discrimination in the context of handsets variability
Hassan Ezzaidi, Jean Rouat

Speech Features

Evaluation of formant-like features for ASR
Katrin Weber, Febe de Wet, Bert Cranen, Lou Boves, Samy Bengio, Hervé Bourlard

Entropy of energy operator as feature for large vocabulary Mandarin speaker independent speech recognition
Fadhil H. T. Al-Dulaimy, Zuoying Wang

Improving parametric trajectory modeling by integration of pitch and tone information
Yiyan Zhang, Wenju Liu, Bo Xu, Huayun Zhang

Comparative experiments to evaluate the use of auditory-based acoustic distinctive features and formant cues for automatic speech recognition using a multi-stream paradigm
Hesham Tolba, Sid-Ahmed Selouani, Douglas O’Shaughnessy

Speech recognition using combined acoustic and articulatory information with retraining of acoustic model parameters
Ka-Yee Leung, Manhung Siu

Improved phone recognition on TIMIT using formant frequency data and confidence measures
N. J. Wilkinson, Martin J. Russell

Speaker independent speech recognition using features based on glottal sound source
Norihide Kitaoka, Daisuke Yamada, Seiichi Nakagawa

An evaluation of using mutual information for selection of acoustic-features representation of phonemes for speech recognition
Mohamed Kamal Omar, Ken Chen, Mark Hasegawa-Johnson, Yigal Brandman

A flexible stream architecture for ASR using articulatory features
Florian Metze, Alex Waibel

Speech recognition using fundamental frequency and voicing in acoustic modeling
Andrej Ljolje

A comparison of front-end analyses for Thai speech recognition
Montri Karnjanadecha, Patimakorn Kimsawad

New model for speech residual signal shaping with static nonlinearity
Jari Turunen, Juha T. Tanttu, Pekka Loula

Formant model estimation and transformation for voice morphing
Ching-Hsiang Ho, Dimitrios Rentzos, Saeed Vaseghi

Production and perception of pauses and their linguistic context in read and spontaneous speech in Swedish
Beáta Megyesi, Sofia Gustafson-Capková

Non-linear techniques for dysphonic voice analysis and correction
Claudia Manfredi, Lorenzo Matassini

Adaptive estimation of time-varying features from high-pitched speech based on an excitation source HMM
Akira Sasou, Kazuyo Tanaka

Lip gestures in English sibilants: articulatory - acoustic relationship
Martine Toda, Shinji Maeda, Andreas J. Carlen, Lyes Meftahi

Bark resolution from speech data
Naren Malayath, Hynek Hermansky

Special Topics in Robust Speech Recognition

Noise-robust speech recognition in car environments using genetic algorithms and a mel-cepstral subspace approach
Sid-Ahmed Selouani, Douglas O’Shaughnessy

Modeling with a subspace constraint on inverse covariance matrices
Scott Axelrod, Ramesh Gopinath, Peder Olsen

Improving speech recognition performance of small microphone arrays using missing data techniques
Iain A. McCowan, Andrew C. Morris, Hervé Bourlard

Double the trouble: handling noise and reverberation in far-field automatic speech recognition
David Gelbart, Nelson Morgan

Model-based independent component analysis for robust multi-microphone automatic speech recognition
Laurent Couvreur, Christophe Ris

Compensation of channel effect on line spectrum frequencies
An-Tze Yu, Hsiao-Chuan Wang

Codebook dependent dynamic channel estimation for Mandarin speech recognition over telephone
Huayun Zhang, Zhaobing Han, Bo Xu

Robust multiple resolution analysis for automatic speech recognition
Roberto Gemello, Franco Mana, Paolo Pegoraro, Renato De Mori

HMM-based methods for channel error mitigation in distributed speech recognition
Antonio M. Peinado, Victoria Sánchez, José L. Pérez-Córdoba, José C. Segura, Antonio J. Rubio

Network-based vs. distributed speech recognition in adaptive multi-rate wireless systems
Tim Fingscheidt, Stefanie Aalburg, Sorel Stan, Christophe Beaugeant

Channel noise robustness for low-bitrate remote speech recognition
Alexis Bernard, Abeer Alwan

Influence of transmission errors on ASR systems
C. Peláez-Moreno, A. Gallardo-Antolín, J. Vicente-Peña, F. Díaz-de-María

Robust feature extraction in a variety of input devices on the basis of ETSI standard DSR front-end
Satoru Tsuge, Shingo Kuroiwa, Masami Shishibori, Fuji Ren, Kenji Kita

Channel error protection scheme for distributed speech recognition
Zheng-Hua Tan, Paul Dalsgaard

The effects of speech compression on speech recognition and text-to-speech synthesis
Yeshwant Muthusamy, Yifan Gong, Roshan Gupta

Transform-based feature vector compression for distributed speech recognition
Ben Milner, Xu Shao

Issues in Speech Recognition

Towards the question: why has speaking rate such an impact on speech recognition performance?
Robert Faltlhauser, Günther Ruske, M. Thomae

Robust voiced-unvoiced decision associated to continuous pitch tracking in noisy telephone speech
Mijail Arcienega, Andrzej Drygajlo

Noise adaptive speech recognition with acoustic models trained from noisy speech evaluated on Aurora-2 database
Kaisheng Yao, Kuldip K. Paliwal, Satoshi Nakamura

Recognition of noisy speech using normalized moments
Jingdong Chen, Yiteng (Arden) Huang, Qi Li, Frank K. Soong

Low-resource noise-robust feature post-processing on Aurora 2.0
Chia-Ping Chen, Jeff A. Bilmes, Katrin Kirchhoff

Exploiting variances in robust feature extraction based on a parametric model of speech distortion
Li Deng, Jasha Droppo, Alex Acero

Improving performance of an HMM-based ASR system by using monophone-level normalized confidence measure
Muhammad Ghulam, Takashi Fukuda, Takaharu Sato, Tsuneo Nitta

Model partial pronunciation variations for spontaneous Mandarin speech recognition
Yi Liu, Pascale Fung

Reducing pronunciation lexicon confusion and using more data without phonetic transcription for pronunciation modeling
Fang Zheng, Zhanjiang Song, Pascale Fung, William Byrne

Classification error from the theoretical Bayes classification risk
Erik McDermott, Shigeru Katagiri

Combined binary classifiers with applications to speech recognition
Aldebaro Klautau, Nikola Jevtic, Alon Orlitsky

Optimal selection of speech data for automatic speech recognition systems
Arkadiusz Nagórski, Lou Boves, Herman Steeneken



