ISCA Archive Eurospeech 2001 Sessions Booklet

7th European Conference on Speech Communication and Technology

Aalborg, Denmark
3-7 September 2001

General Chair: Paul Dalsgaard


Speech Recognition and Understanding: Pronunciation and Subword Units


Modeling pronunciation variation using context-dependent weighting and b/s refined acoustic modeling
Fang Zheng, Zhanjiang Song, Pascale Fung, William Byrne

Learning units for domain-independent out-of-vocabulary word modelling
Issam Bazzi, James Glass

Pronunciation variant analysis using speaking style parallel corpus
Hideharu Nakajima, Izumi Hirano, Yoshinori Sagisaka, Katsuhiko Shirai

Speech recognition for huge vocabularies by using optimized sub-word units
Jan Kneissler, Dietrich Klakow

Dynamic lexicon using phonetic features
Kyung-Tak Lee, Christian J. Wellekens

Triphone tying techniques combining a-priori rules and data driven methods
Ute Ziegenhain, Josef G. Bauer

Pronunciation modeling and lexical adaptation in midsize vocabulary ASR
Louis F. M. ten Bosch, Nick Cremelie

Estimating pronunciation variations from acoustic likelihood score for HMM reconstruction
Liu Yi, Pascale Fung

Breadth-first search for finding the optimal phonetic transcription from multiple utterances
M. Bisani, Hermann Ney

Improved data-driven generation of pronunciation dictionaries using an adapted word list
Matthias Wolff, Matthias Eichner, Rüdiger Hoffmann

Segment-based recognition on the phonebook task: initial results and observations on duration modeling
Karen Livescu, James Glass

Multilingual text-to-phoneme mapping
Søren Kamaric Riis, Morten With Pedersen, Kare Jean Jensen

Pronunciation variation analysis with respect to various linguistic levels and contextual conditions for Mandarin Chinese
Ming-yi Tsai, Fu-chiang Chou, Lin-shan Lee

Hypothesis-driven accent discrimination
Laura Mayfield Tomokiyo

An approach to automatic phonetic baseform generation based on Bayesian networks
Changxue Ma, Mark A. Randolph

Towards discriminative lexicon optimization
Hauke Schramm, Peter Beyerlein

Model complexity optimization for nonnative English speakers
Xiaodong He, Yunxin Zhao

Pronunciation modeling in Hungarian number recognition
Tibor Fegyó, Péter Mihajlik, Péter Tatai, Géza Gordos


Speech Perception: Miscellaneous


Coarticulatory effects in perception
Santiago Fernández, Sergio Feijóo

A case for multi-resolution auditory scene analysis
Sue Harding, Georg Meyer

Perceptual identification and normalization of synthesized French vowels from birth to adulthood
Lucie Ménard, Jean-Luc Schwartz, Louis-Jean Boë, Sonia Kandel, Nathalie Vallée

Perceptual categorization of maximal vowel spaces from birth to adulthood simulated by an articulatory model
Lucie Ménard, Louis-Jean Boë

A study on speech over the telephone and aging
Maxine Eskenazi, Alan W. Black

On the perception of voicing for plosives in noise
Marcia Chen, Abeer Alwan

Predicting visual consonant perception from physical measures
Jintao Jiang, Abeer Alwan, Edward T. Auer, Lynne E. Bernstein

Effects of noise adaptation on the perception of voiced plosives in isolated syllables
William A. Ainsworth, T. Cervera

On differential limen of word-based local speechrate variation in Japanese expressed by duration ratio
Makoto Hiroshige, Kenji Araki, Koji Tochinai

A multidimensional scaling study of fricatives; a comparison of perceptual and physical dimensions
Won Tokuma

Reconstructing dialogue history
Marc Swerts, Emiel Krahmer

Timing and interaction of visual cues for prominence in audiovisual speech perception
David House, Jonas Beskow, Björn Granström

Modelling the perceptual identification of Japanese consonants from LPC cepstral distances
Masahiko Komatsu, Shinichi Tokuma, Won Tokuma, Takayuki Arai

Auditory-visual perception of lexical tone
Denis Burnham, Valter Ciocca, Stephanie Stokes

Syllable prominence: a matter of vocal effort, phonetic distinctness and top-down processing
Anders Eriksson, Gunilla C. Thunberg, Hartmut Traunmüller

Perceived prominence in terms of a linguistically motivated quantitative intonation model
Hansjörg Mixdorff, Christina Widera

Perception of coda voicing from properties of the onset and nucleus of 'led' and 'let'
Sarah Hawkins, Noël Nguyen

Auditory filter bank design using masking curves
L. Lin, E. Ambikairajah, W. H. Holmes

A new feature driven cochlear implant speech processing strategy
Dashtseren Erdenebat, Kitazawa Shigeyoshi, Kitamura Tatsuya


Noise Robust Recognition: Frontend and Compensation Algorithms (Special Session)


Noise robust feature extraction for ASR using the Aurora 2 database
Qifeng Zhu, Markus Iseli, Xiaodong Cui, Abeer Alwan

Investigations into tandem acoustic modeling for the Aurora task
Daniel P.W. Ellis, Manuel J. Reyes Gomez

Recognition performance of the Siemens front-end with and without frame dropping on the Aurora 2 database
Bernt Andrassy, Damjan Vlaj, Christophe Beaugeant

A multiconditional robust front-end feature extraction with a noise reduction procedure based on improved spectral subtraction algorithm
Bojan Kotnik, Zdravko Kacic, Bogomir Horvat

Feature vector selection to improve ASR robustness in noisy conditions
Johan de Veth, Laurent Mauuary, Bernhard Noe, Febe de Wet, Jürgen Sienel, Louis Boves, Denis Jouvet

Comparison of spectral derivative parameters for robust speech recognition
Dusan Macho, Climent Nadeu

Robust digit recognition in noise: an evaluation using the AURORA corpus
Umit Yapanel, John H. L. Hansen, Ruhi Sarikaya, Bryan Pellom

Robust ASR based on clean speech models: an evaluation of missing data techniques for connected digit recognition in noise
Jon Barker, Martin Cooke, Phil Green

Evaluation of the SPLICE algorithm on the Aurora2 database
Jasha Droppo, Li Deng, Alex Acero

Model-based compensation of the additive noise for continuous speech recognition. Experiments using the Aurora II database and tasks
José C. Segura, Angel de la Torre, M. Carmen Benitez, Antonio M. Peinado

MAP combination of multi-stream HMM or HMM/ANN experts
Andrew Morris, Astrid Hagen, Hervé Bourlard

Second order statistics spectrum estimation method for robust speech recognition
Bojan Jarc, Rudolf Babic

Feature extraction and model-based noise compensation for noisy speech recognition evaluated on AURORA 2 task
Kaisheng Yao, Jingdong Chen, Kuldip K. Paliwal, Satoshi Nakamura


Phonetics and Phonology: Segmentals and Synthesis


Native vs non-native production of English vowels in spontaneous speech: an acoustic phonetic study
Kimiko Tsukada

Is non-native pronunciation modelling necessary?
Silke Goronzy, Marina Sahakyan, Wolfgang Wokurek

Burst segmentation and evaluation of acoustic cues
Yves Laprie, Anne Bonneau

The schwa in Albanian
Theodor Granser, Sylvia Moosmüller

A testbed for developing multilingual phonotactic descriptions
Simone Ashby, Julie Carson-Berndsen, Gina Joue

A physiological analysis of nasals and nasalization in Chinese
Wing-Nga Fung, Sze-Lok Lau

A component by component listening test analysis of the IBM trainable speech synthesis system
Robert E. Donovan

Semantic abnormality and its realization in spoken language
Shimei Pan, Kathleen McKeown, Julia Hirschberg

TALKING FOREIGN - concatenative speech synthesis and the language barrier
Nick Campbell

Schwa-assimilation in Danish synthetic speech
Christian Jensen

Text-to-speech synthesis with arbitrary speaker's voice from average voice
Masatsune Tamura, Takashi Masuko, Keiichi Tokuda, Takao Kobayashi

High quality voice conversion based on Gaussian mixture model with dynamic frequency warping
Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

Voice transformations: from speech synthesis to mammalian vocalizations
Min Tang, Chao Wang, Stephanie Seneff

A new multi-speaker formant synthesizer that applies voice conversion techniques
J. M. Gutiérrez-Arriola, J. M. Montero, J. A. Vallejo, R. Córdoba, R. San-Segundo, Juan M. Pardo

Evaluation of cross-language voice conversion based on GMM and STRAIGHT
Mikiko Mashimo, Tomoki Toda, Kiyohiro Shikano, Nick Campbell

Ejective reduction in Chaha is conditioned by more than prosodic position
Rachel Coulston


Speech Synthesis: Systems and Prosody


Festival speaks Italian!
Piero Cosi, Fabio Tesser, Roberto Gretter, Cinzia Avesani, Mike Macon

Multilingual TTS for computer telephony: the Aculab approach
Alex Monaghan, Mahmoud Kassaei, Mark Luckin, Mariscela Amador-Hernandez, Andrew Lowry, Dan Faulkner, Fred Sannier

A flexible multilingual TTS development and speech research tool
Géza Kiss, Géza Németh, Gábor Olaszy, Géza Gordos

Speech synthesis development made easy: the Bonn Open Synthesis System
Esther Klabbers, Karlheinz Stöber, Raymond Veldhuis, Petra Wagner, Stefan Breuer

Automatic prosody generation - a model for Hungarian
Gábor Olaszy, Géza Németh, Péter Olaszi

Evaluation of PROS-3 for the assignment of prosodic structure, compared to assignment by human experts
Olga van Herwijnen, Jacques Terken

Stochastic F0 contour model based on the clustering of F0 shapes of a syntactic unit
Yoichi Yamashita, Tomoyoshi Ishida

Intonational phrase break prediction using decision tree and n-gram model
Xuejing Sun, Ted H. Applebaum

Synthesizing intonation of Standard Arabic language
A. Zaki, A. Rajouani, M. Najim

Invariance of relative F0 change field of Chinese disyllabic words
Dawei Xu, Hiroki Mori, Hideki Kasuya

Accent label prediction by time delay neural networks using gating clusters
Achim F. Müller, Rüdiger Hoffmann

Transformation-based learning of Danish stress assignment
Peter Juel Henrichsen

On the prosody of German telephone numbers
Stefan Baumann, Jürgen Trouvain

Emotional speech synthesis: a review
Marc Schröder

Fun or boring? A web-based evaluation of expressive synthesis for children
Kjell Gustafson, David House


Speech Recognition and Understanding: Articulatory and Perceptual Approaches to ASR


Sub-band based additive noise removal for robust speech recognition
Jingdong Chen, Kuldip K. Paliwal, Satoshi Nakamura

Development of an asynchronous multi-band system for continuous speech recognition
Yik-Cheung Tam, Brian Mak

A multi-band approach based on the probabilistic union model and frequency-filtering features for robust speech recognition
Peter Jancovic, Ji Ming

Split-band perceptual harmonic cepstral coefficients as acoustic features for speech recognition
Liang Gu, Kenneth Rose

Error correcting posterior combination for robust multi-band speech recognition
Astrid Hagen, Hervé Bourlard

Robust parameters for speech recognition based on subband spectral centroid histograms
Bojana Gajic, Kuldip K. Paliwal

Pseudo-articulatory representations and the recognition of syllable patterns in speech
William H. Edmondson, Li Zhang

ASR - articulatory speech recognition
Joe Frankel, Simon King

Efficient decoding strategy for conversational speech recognition using state-space models for vocal-tract-resonance dynamics
Jeff Z. Ma, Li Deng

HMM2 - extraction of formant structures and their use for robust ASR
Katrin Weber, Samy Bengio, Hervé Bourlard

Auditory model based speech recognition in noisy environment
Xiaoqing Yu, Wanggen Wan, Daniel P. K. Lun

Forward masking for increased robustness in automatic speech recognition
Sascha Wendt, Gernot A. Fink, Franz Kummert

An auditory system-based feature for robust speech recognition
Qi Li, Frank K. Soong, Olivier Siohan


Speech Production: Prosody


Prominence correlates. A study of Swedish
Gunnar Fant, Anita Kruckenberg, Johan Liljencrants, Antonis Botinis

Quantitative analysis of the effects of emphasis upon prosodic features of speech
Sumio Ohno, Hiroya Fujisaki

Towards a model of target oriented production of prosody
Grzegorz Dogil, Bernd Möbius

Prosody control for speaking and singing styles
Chilin Shih, Greg Kochanski

Automated modeling of Chinese intonation in continuous speech
Greg Kochanski, Chilin Shih

Prediction of intonation patterns of accented words in a corpus of read Swedish news through pitch contour stylization
Johan Frid

The use of fundamental frequency raising as a strategy for increasing vocal intensity in soft, normal, and loud phonation
Paavo Alku, Juha Vintturi, Erkki Vilkman

Prosodic interactions on segmental durations in Greek
Antonis Botinis, Marios Fourakis, Robert Bannert

Study on factors influencing durations of syllables in Mandarin
Min Chu, Yongqiang Feng

A comparative study of pauses in dialogues and read speech
Sofia Gustafson-Capkova, Beata Megyesi

Detecting Japanese local speech rate deceleration in spontaneous conversational speech using a variable threshold
Keiichi Takamaru, Makoto Hiroshige, Kenji Araki, Koji Tochinai

Modelling fundamental frequency in first post-tonic syllables in Danish sentences
Niels Reinholt Petersen

Non-finality and pre-finality in Bari Italian intonation: a preliminary account
Michelina Savino

Building an integrated prosodic model of German
Hansjörg Mixdorff, Oliver Jokisch

A model of F0 contour for Arabic affirmative and interrogative sentences
Omar A. G. Ibrahim, S.H. El-Ramly, N.S. Abdel-Kader

Variation in final lengthening as a function of topic structure
Caroline L. Smith, Lisa A. Hogan

Do speakers realize the prosodic structure they say they do?
Olga van Herwijnen, Jacques Terken

Coarticulatory effects at prosodic boundaries: some acoustic results
Marija Tabain, Guillaume Rolland, Christophe Savariaux

Generating duration from a cognitively plausible model of rhythm production
Plínio A. Barbosa


Speech Recognition and Understanding: Acoustic Modelling - I


A mixture of Gaussians front end for speech recognition
M. N. Stuttle, M. J. F. Gales

Improved maximum mutual information estimation training of continuous density HMMs
Jing Zheng, John Butzberger, Horacio Franco, Andreas Stolcke

Maximum-likelihood training of a bipartite acoustic model for speech recognition
Florent Perronnin, Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua

Analysis of the root-cepstrum for acoustic modeling and fast decoding in speech recognition
Ruhi Sarikaya, John H. L. Hansen

Distinctive features for use in an automatic speech recognition system
Ellen Eide

Improved context-dependent acoustic modeling for continuous Chinese speech recognition
Jiyong Zhang, Fang Zheng, Jing Li, Chunhua Luo, Guoliang Zhang

Class definition in discriminant feature analysis
Jacques Duchateau, Kris Demuynck, Dirk Van Compernolle, Patrick Wambacq

Feature extraction from time-frequency matrices for robust speech recognition
Jose C. Segura, M. Carmen Benitez, Angel de la Torre, Antonio J. Rubio

Using spatial correlation information in speech recognition
Yu Peng, Wang Zuoying

On the choice of classes in MCE based discriminative HMM-training for speech recognizers used in the telephone environment
Josef G. Bauer

Plosive spotting with margin classifiers
Joseph Keshet, Dan Chazan, Ben-Zion Bobrovsky

Model agglomeration for context-dependent acoustic modeling
Fabio Brugnara

Multipass algorithm for acquisition of salient acoustic morphemes
M. Levit, A. L. Gorin, J. H. Wright

Rapid vocal tract length normalization using maximum likelihood estimation
Tadashi Emori, Koichi Shinoda

Towards the creation of acoustic models for stressed Japanese speech
Kozo Okuda, Tomoko Matsui, Satoshi Nakamura

Elderly acoustic model for large vocabulary continuous speech recognition
Akira Baba, Shinichi Yoshizawa, Miichi Yamada, Akinobu Lee, Kiyohiro Shikano

A hybrid approach to enhance task portability of acoustic models in Chinese speech recognition
Jin-Song Zhang, Shu-Wu Zhang, Yoshinori Sagisaka, Satoshi Nakamura

Evaluation of sublexical and lexical models of acoustic disfluencies for spontaneous speech recognition in Spanish
L. J. Rodriguez, I. Torres, A. Varona

Structural learning of dynamic Bayesian networks in speech recognition
Murat Deviren, Khalid Daoudi


Linguistic Modelling: Language Models


Structured language model for class identification of out-of-vocabulary words arising from multiple wordclasses
Shigehiko Onishi, Hirofumi Yamamoto, Yoshinori Sagisaka

New language models using phrase structures extracted from parse trees
Takatoshi Jitsuhiro, Hirofumi Yamamoto, Setsuo Yamada, Yoshinori Sagisaka

Triggering individual word domains in n-gram language models
E. I. Sicilia-Garcia, Ji Ming, F. J. Smith

A structured statistical language model conditioned by arbitrarily abstracted grammatical categories based on GLR parsing
Tomoyosi Akiba, Katunobu Itou

Speech recognition of broadcast sports news
Atsushi Matsui, Hiroyuki Segi, Akio Kobayashi, Toru Imai, Akio Ando

Improvement of a structured language model: arbori-context tree
Shinsuke Mori, Masafumi Nishimura, Nobuyasu Itoh

Smoothing issues in the structured language model
Woosung Kim, Sanjeev Khudanpur, Jun Wu

The study of the effect of training set on statistical language modeling
Xipeng Shen, Bo Xu

Stochastic finite state automata language model triggered by dialogue states
Yannick Esteve, Frédéric Bechet, Alexis Nasr, Renato De Mori

A baseline method for compiling typed unification grammars into context free language models
Manny Rayner, John Dowding, Beth Ann Hockey

Comparison of width-wise and length-wise language model compression
E. W. D. Whittaker, Bhiksha Raj

Large vocabulary statistical language modeling for continuous speech recognition in Finnish
Vesa Siivola, Mikko Kurimo, Krista Lagus

A new technique based on augmented language models to improve the performance of spoken dialogue systems
R. López-Cózar, D. H. Milone

Pause information for dependency analysis of read Japanese sentences
Kazuyuki Takagi, Kazuhiko Ozeki

An HMM/n-gram-based linguistic processing approach for Mandarin spoken document retrieval
Berlin Chen, Hsin-min Wang, Lin-shan Lee

Probabilistic concept verification for language understanding in spoken dialogue systems
Yi-Chung Lin, Huei-Ming Wang

Turkish word segmentation using morphological analyzer
M. Oguzhan Külekci, Mehmed Özkan

Thai grapheme-to-phoneme using probabilistic GLR parser
Pongthai Tarsaku, Virach Sornlertlamvanich, Rachod Thongprasirt

Aligning prosody and syntax in property grammars
Philippe Blache, Daniel Hirst

From perceptual designs to linguistic typology and automatic language identification: overview and perspectives
Melissa Barkat, Ioana Vasilescu

Morphological approaches for an English pronunciation lexicon
Susan Fitt

An embodiment paradigm for speech recognition systems
Gina Joue, Julie Carson-Berndsen

Multi-parser architecture for query processing
Kui Xu, Fuliang Weng, Helen M. Meng, Po Chui Luk

Two-stage probabilistic approach to text segmentation
Yi-Chia Chen, Yi-Chung Lin

Lexicon optimization for Dutch speech recognition in spoken document retrieval
Roeland Ordelman, Arjan van Hessen, Franciska de Jong

Evaluation of recent speech grammar standardization efforts
Tom Brøndsted


Speaker Recognition: Identification, Verification and Tracking. Speech Recognition and Understanding: Language Identification


The influence of vocal effort on human speaker identification
Douglas S. Brungart, Kimberly R. Scott, Brian D. Simpson

Improving speaker recognition using phonetically structured Gaussian mixture models
Robert Faltlhauser, Günther Ruske

Information fusion for robust speaker verification
Conrad Sanderson, Kuldip K. Paliwal

A robust speaker verification system against imposture using an HMM-based speech synthesis system
Takayuki Satoh, Takashi Masuko, Takao Kobayashi, Keiichi Tokuda

Sequential decisions for faster and more flexible verification
Arun C. Surendran

Background learning of speaker voices for text-independent speaker identification
Wei-Ho Tsai, Y. C. Chu, Chao-Shih Huang, Wen-Whei Chang

Explicit exploitation of stochastic characteristics of test utterance for text-independent speaker identification
Wei-Ho Tsai, Wen-Whei Chang, Chao-Shih Huang

Improvement of speaker verification for Thai language
Chai Wutiwiwatchai, Varin Achariyakulporn, Sawit Kasuriya

Speaker identification for car infotainment applications
Javier Rodríguez-Saeta, Christian Koechling, Javier Hernando

A system for text dependent speaker verification - field trial evaluation and simulation results
H. Schalk, Herbert Reininger, Stephan Euler

Speaker recognition in a multi-speaker environment
Alvin F. Martin, Mark A. Przybocki

A new DP-like speaker clustering algorithm
Zhijian Ou, Zuoying Wang

On the use of the Bayesian information criterion in multiple speaker detection
P. Sivakumaran, J. Fortuna, A. M. Ariyaeeinia

Preliminary experiments on language identification using broadcast news recordings
Laurent Benarousse, Edouard Geoffrois

Multi-stream statistical n-gram modeling with application to automatic language identification
Katrin Kirchhoff, Sonia Parandekar


Speech Recognition and Understanding: Noise Robustness


A comparison of LPC and FFT-based acoustic features for noise robust ASR
Febe de Wet, Bert Cranen, Johan de Veth, Loe Boves

Unsupervised noisy environment adaptation algorithm using MLLR and speaker selection
Miichi Yamada, Akira Baba, Shinichi Yoshizawa, Yuichiro Mera, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano

Applying parallel model compensation with mel-frequency discrete wavelet coefficients for noise-robust speech recognition
Zekeriya Tufekci, John N. Gowdy, Sabri Gurbuz, E. Patterson

Linear interpolation of cepstral variance for noisy speech recognition
Tai-Hwei Hwang, Kuo-Hwei Yuo, Hsiao-Chuan Wang

Evaluation of a generalized dynamic cepstrum in distant speech recognition
Hiroshi Matsumoto, Akihiko Shimizu, Kazumasa Yamamoto

Robust speech/non-speech detection using LDA applied to MFCC for continuous speech recognition
Arnaud Martin, Géraldine Damnati, Laurent Mauuary

Toward noise-tolerant acoustic models
Edmondo Trentin, Marco Gori

Noise estimation without explicit speech, non-speech detection: a comparison of mean, modal and median based approaches
Nicholas W. D. Evans, John S. Mason

Evaluation of front-end features and noise compensation methods for robust Mandarin speech recognition
Rathi Chengalvarayan

ALGONQUIN: iterating Laplace's method to remove multiple types of acoustic distortion for robust speech recognition
Brendan J. Frey, Li Deng, Alex Acero, Trausti Kristjansson

Robust speech recognition in noise: an evaluation using the SPINE corpus
John H. L. Hansen, Ruhi Sarikaya, Umit Yapanel, Bryan Pellom

Robust speech recognition against packet loss
Manhung Siu, Yu-Chung Chan

Rapid CODEC adaptation for cellular phone speech recognition
Masaki Naito, Shingo Kuroiwa, Tsuneo Kato, Tohru Shimizu, Norio Higuchi

A robust front-end for ASR over IP and GSM networks: an integrated scenario
Ascension Gallardo-Antolin, Carmen Pelaez-Moreno, Fernando Diaz-de-Maria

Robust speech recognition using missing feature theory and vector quantization
Philippe Renevey, Rolf Vetter, Jens Krauss

Modeling the mixtures of known noise and unknown unexpected noise for robust speech recognition
Ji Ming, Peter Jancovic, Philip Hanna, Darryl Stewart

Robust speech recognition based on selective use of missing frequency band HMMs
Takayoshi Kawamura, Kazuya Takeda, Fumitada Itakura

A new method for speech recognition in the presence of non-stationary, unpredictable and high-level noise
Ikuyo Masuda-Katsuse

A computationally efficient real-time noise robust speech recognition based on improved spectral subtraction method
Bojan Kotnik, Zdravko Kacic, Bogomir Horvat

The use of noisy frame elimination and frequency spectrum magnitude reduction in noise robust speech recognition
Damjan Vlaj, Zdravko Kacic, Bogomir Horvat

Combined linear regression adaptation and Bayesian predictive classification for robust speech recognition
Jen-Tzung Chien

Quantile based histogram equalization for noise robust speech recognition
Florian Hilger, Hermann Ney

Sequential noise compensation by a sequential Kullback proximal algorithm
Kaisheng Yao, Kuldip K. Paliwal, Satoshi Nakamura


Speech Recognition and Understanding: Adaptation


A novel target-driven MLLR adaptation algorithm with multi-layer structure
Jia Lei, Xu Bo

Scaled likelihood linear regression for hidden Markov model adaptation
Frank Wallhoff, Daniel Willett, Gerhard Rigoll

Fast adaptation using constrained affine transformations with hierarchical priors
Tor Andre Myrvoll, Kuldip K. Paliwal, Torbjørn Svendsen

A context adaptation approach for building context dependent models in LVCSR
Xiaoxing Liu, Baosheng Yuan, Yonghong Yan

Improving genericity for task-independent speech recognition
Fabrice Lefevre, Jean-Luc Gauvain, Lori Lamel

A posteriori and a priori transformations for speaker adaptation in large vocabulary speech recognition systems
Driss Matrouf, Olivier Bellot, Pascal Nocera, Georges Linares, Jean-Francois Bonastre

Bayesian methods for HMM speech recognition with limited training data
Darryl W. Purnell, Elizabeth C. Botha

Rapid speaker adaptation using MLLR and subspace regression classes
Kwok-Man Wong, Brian Mak

Speaker adaptation of output probabilities and state duration distributions for speech recognition
Nestor Becerra Yoma, Jorge Silva

Cohorts based custom models for rapid speaker and dialect adaptation
Jian Wu, Eric Chang

Speaker adaptation of quantized parameter HMMs
Marcel Vasilache, Olli Viikki

Segmental eigenvoice for rapid speaker adaptation
Yu Tsao, Shang-Ming Lee, Fu-Chiang Chou, Lin-Shan Lee

Speaker adaptation in an ASR system based on nonlinear dynamical systems
Narada D. Warakagoda, Magne H. Johnsen


Dialogue Systems: Project Descriptions


An interactive directory assistance service for Spanish with large-vocabulary recognition
R. Córdoba, R. San-Segundo, J. M. Montero, J. Colás, J. Ferreiros, J. Macías-Guarasa, Juan M. Pardo

A multilingual-supporting dialog system using a common dialog controller
Yunbiao Xu, Masahiro Araki, Yasuhisa Niimi

Graphic platform for designing and developing practical voice interaction systems
Tomas Nouza, Jan Nouza

Speech translation for French in the NESPOLE! European project
Laurent Besacier, H. Blanchon, Y. Fouquet, J. P. Guilbaud, S. Helme, S. Mazenot, D. Moraru, D. Vaufreydaz

Lessons from the development of a conversational interface
Marianne Hickey, Paul St John Brittan

SCANMail: browsing and searching speech data by content
Julia Hirschberg, Michiel Bacchiani, Don Hindle, Phil Isenhour, Aaron Rosenberg, Litza Stark, Larry Stead, Steve Whittaker, Gary Zamchick

Multi-scale retrieval in MEI: an English-Chinese translingual speech retrieval system
Wai-Kit Lo, Patrick Schone, Helen M. Meng

Compact word graph in spoken dialogue system
Shih-Chieh Chien, Sen-Chia Chang

MINOS-II: a prototype car navigation system with mixed initiative turn taking dialogue
Munehiko Sasajima, Takehide Yano, Taishi Shimomori, Tatsuya Uehara

Use of topic knowledge in spoken dialogue information retrieval system for academic documents
Shinya Kiriyama, Keikichi Hirose, Nobuaki Minematsu

Domain-independent spoken dialogue platform using key-phrase spotting based on combined language model
Kazunori Komatani, Katsuaki Tanaka, Hiroaki Kashima, Tatsuya Kawahara

OASIS natural language call steering trial
Peter J. Durston, Mark Farrell, David Attwater, James Allen, Hong-Kwang Jeff Kuo, Mohamed Afify, Eric Fosler-Lussier, Chin-Hui Lee

First steps toward an adaptive spoken dialogue system in medical domain
Ivano Azzini, Daniele Falavigna, Roberto Gretter, Giordano Lanzola, Marco Orlandi

Mokusei: a telephone-based Japanese conversational system in the weather domain
Mikio Nakano, Yasuhiro Minami, Stephanie Seneff, Timothy J. Hazen, D. Scott Cyphers, James Glass, Joseph Polifroni, Victor Zue

SpeechBuilder: facilitating spoken dialogue system development
James Glass, Eugene Weinstein

Voice-IF: a mixed-initiative spoken dialogue system for AT&T conference services
M. Rahim, Giuseppe Di Fabbrizio, C. Kamm, Marilyn Walker, A. Pokrovsky, P. Ruscitti, E. Levin, S. Lee, Ann K. Syrdal, K. Schlosser

SmartKom: multimodal communication with a life-like character
Wolfgang Wahlster, Norbert Reithinger, Anselm Blocher

ISIS: a learning system with combined interaction and delegation dialogs
Helen M. Meng, Shuk Fong Chan, Yee Fong Wong, Cheong Chat Chan, Yiu Wing Wong, Tien Ying Fung, Wai Ching Tsui, Ke Chen, Lan Wang, Ting Yao Wu, Xiaolong Li, Tan Lee, Wing Nin Choi, P. C. Ching, Huisheng Chi

Robust language understanding in MiPad
Ye-Yi Wang

The WITAS multi-modal dialogue system I
Oliver Lemon, Anne Bracy, Alexander Gruenstein, Stanley Peters

Universalizing speech: notes from the USI project
Stefanie Shriver, Roni Rosenfeld, Xiaojin Zhu, Arthur Toth, Alexander I. Rudnicky, Markus Flueckiger


Speech Production: Miscellaneous


AMSTIVOC (AMsterdam system for transcription of infant VOCalizations) applied to utterances of deaf and normally hearing infants
Florien J. Koopmans-van Beinum, Chris J. Clement, Ineke Van den Dikkenberg-Pot

Using linguopalatal contact patterns to tune a 3D tongue model
Olov Engwall

Electromagnetic articulograph (EMA) based on a nonparametric representation of the magnetic field
Tokihiko Kaburagi, Masaaki Honda

European Portuguese nasal vowels: an EMMA study
A. Teixeira, F. Vaz

The role of the palate in tongue kinematics: an experimental assessment in v sequences from EPG and EMMA data
Susanne Fuchs, Pascal Perrier, Christine Mooshammer

Modelling care of articulation with HMMs is dangerous
Matthew P. Aylett

Spectral tilt as a perturbation-free measurement of noise levels in voice signals
Peter J. Murphy

Estimation of the modulation frequency and modulation depth of the fundamental frequency owing to vocal micro-tremor of the voice source signal
Jean Schoentgen

The perceptual relevance of glottal-pulse parameter variations
Ralph van Dinther, Raymond N.J. Veldhuis, Armin Kohlrausch

Speaker normalization based on test to reference speaker mapping
Marcel Ogner, Zdravko Kacic

A face-to-muscle inversion of a biomechanical face model for audiovisual and motor control research
Michel Pitermann, Kevin G. Munhall

A model of vowel production under positive pressure breathing
Allan J South

Helium speech normalisation by codebook mapping
Adam Podhorski, Marek Czepulonis


Resources, Assessment and Standards: Assessment Tools & Methodology


A new method for testing communication efficiency and user acceptability of speech communication channels
Sander J. van Wijngaarden, Paula M.T. Smeele, Herman J.M. Steeneken

Phonetic transcriptions in the Spoken Dutch Corpus: how to combine efficiency and good transcription quality
Catia Cucchiarini, Diana Binnenpoorte, Simo Goddijn

A functional approach to speech recognition evaluation
Ben Hutchinson

Instrumental derivation of equipment impairment factors for describing telephone speech codec degradations
Sebastian Möller, Jens Berger

Julius - an open source real-time large vocabulary recognition engine
Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

Local refinement of phonetic boundaries: a general framework and its application using different transition models
Doroteo Torre Toledano, Luis A. Hernández Gómez

Detection of digital transmission systems for voice quality measurements
Thorsten Ludwig, Ulrich Heute

Automatic segmentation of recorded speech into syllables for speech synthesis
Eric Lewis, Mark Tatham

Phonetic events from the labeling of the European Portuguese database for speech synthesis, FEUP/IPBDB
João Paulo Teixeira, Diamantino Freitas, Daniela Braga, Maria João Barros, Vagner Latsch

Acoustical and topological experiments for an HMM-based speech segmentation system
Samir Nefti, Olivier Boeffard

TclBLASR: an automatic speech recognition extension for Tcl
Qiru Zhou, Jinsong Zheng, Chin-Hui Lee


Speech Recognition and Understanding: Algorithms and Architectures


Classification of transition sounds with application to automatic speech recognition
Zeev Litichever, Dan Chazan

Gaussian subtraction (GS) algorithms for word spotting in continuous speech
Avi Faizakov, Arnon Cohen, Tzur Vaich

Relating frame accuracy with word error in hybrid ANN-HMM ASR
Michael L. Shire

A two-layer lexical tree based beam search in continuous Chinese speech recognition
Guoliang Zhang, Fang Zheng, Wenhu Wu

Automatic labeling and digesting for lecture speech utilizing repeated speech by shift CDP
Yoshiaki Itoh, Kazuyo Tanaka

Improved phoneme-history-dependent search for large-vocabulary continuous-speech recognition
Takaaki Hori, Yoshiaki Noda, Shoichi Matsunaga

Comparison of MFCC and PLP parameterizations in the speaker independent continuous speech recognition task
Josef Psutka, Ludek Müller, Josef V. Psutka

N-best list generation using word and phoneme recognition fusion
Ernest Pusateri, J.M. Van Thong

A one pass semi-dynamic network decoder based on language model network
Dong-Hoon Ahn, Minhwa Chung

Improving automatic speech recognition using tangent distance
W. Macherey, D. Keysers, J. Dahmen, Hermann Ney

N-best speech hypotheses reordering using linear regression
Ananlada Chotimongkol, Alexander I. Rudnicky

Low-resource hidden Markov model speech recognition
Sabine Deligne, Ellen Eide, Ramesh Gopinath, Dimitri Kanevsky, Benoit Maison, Peder Olsen, Harry Printz, Jan Sedivy

Speech recognition at multiple sampling rates
H. G. Hirsch, K. Hellwig, S. Dobler

Support vector machine with dynamic time-alignment kernel for speech recognition
Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, Shigeki Sagayama

Efficient scalable speech compression for scalable speech recognition
Naveen Srinivasamurthy, Antonio Ortega, Shrikanth Narayanan


Signal Analysis: Speech Enhancement and Noise Processing


Voice activity detection in noisy environments
J. Stadermann, V. Stahl, G. Rose

An improved wavelet-based speech enhancement system
Hamid Sheikhzadeh, Hamid Reza Abutalebi

Enhancing distributed speech recognition with back-end speech reconstruction
Tenkasi Ramabadran, Jeff Meunier, Mark Jasiuk, Bill Kushner

Implementation effective one-channel noise reduction system
Jiri Tihelka, Pavel Sovka

Efficient speech enhancement by diffusive gain factors (DGF)
Hyoung-Gook Kim, Klaus Obermayer, Mathias Bode, Dietmar Ruwisch

Correction of the voice timbre distortions on telephone network
Gaël Mahé, André Gilloire

Speech enhancement based on IMM with NPHMM
Yunjung Lee, Joohun Lee, Ki Yong Lee, Katsuhiko Shirai

Speech recognition under musical environments using Kalman filter and iterative MLLR adaptation
M. Fujimoto, Y. Ariki

Dual channel speech enhancement using coherence function and MDL-based subspace approach in bark domain
Rolf Vetter, Philippe Renevey, Jens Krauss

Entropy based voice activity detection in very noisy conditions
Philippe Renevey, Andrzej Drygajlo

Discrimination between speech and music based on a low frequency modulation feature
Stefan Karnebäck

Credibility proof for speech content and speaker verification by fragile watermarking with consecutive frame-based processing
Yiou-Wen Cheng, Lin-Shan Lee

MAP estimation for on-line noise compensation of time trajectories of spectral coefficients
I. Potamitis, Nikos Fakotakis, George Kokkinakis

A new method for speech denoising and robust speech recognition using probabilistic models for clean speech and for noise
Hagai Attias, Li Deng, Alex Acero, John C. Platt





Speech Coding: Advances in Speech Coding


Coding method for successive pitch periods
Ari Heikkinen, Vesa T. Ruoppila, Samuli Pietilä

Objective evaluation of methods for quantization of variable-dimension spectral vectors in WI speech coding
Jani Nurminen, Ari Heikkinen, Jukka Saarinen

Squared error as a measure of phase distortion
Harald Pobloth, W. Bastiaan Kleijn

Non-linear predictive vector quantization of speech
Marcos Faundez-Zanuy

A variable rate hybrid coder based on a synchronized harmonic excitation
Nilantha Katugampala, Ahmet M. Kondoz

A hybrid sub-band sinusoidal coding scheme
M. S. Ho, D. J. Molyneux, B. M. G. Cheetham

Low rate speech coding incorporating simultaneously masked spectrally weighted linear prediction
J. Lukasiak, I. S. Burnett, C. H. Ritz

Narrowband perceptual audio coding: enhancements for speech
Hossein Najaf-Zadeh, Peter Kabal

Techniques for high-quality ACELP coding of wideband speech
B. Bessette, Roch Lefebvre, R. Salami, M. Jelinek, J. Vainio, J. Rotola-Pukkila, H. Mikkola, K. Jarvinen

Wideband ACELP at 16 kb/s with multi-band excitation
Sílvia Pujalte, Asunción Moreno

Wideband speech coding algorithm with application of discrete wavelet transform to upper band
Seung Won Lee, Keun Sung Bae

A switched DPCM/subband coder for pre-echo reduction
S. Satheesh, T. V. Sreenivas

A generalized multistage VQ approach for spectral magnitude quantization
Cagri Özgenc Etemoglu, Vladimir Cuperman

Efficient implementation of ITU-T G.723.1 speech coder for multichannel voice transmission and storage
Sung-Kyo Jung, Young-Cheol Park, Sung-Wan Youn, Kyoung-Tae Kim, Dae-Hee Youn


Resources, Assessment and Standards: Corpora


CU-Move: analysis & corpus development for interactive in-vehicle speech systems
John H. L. Hansen, Pongtep Angkititrakul, Jay Plucienkowski, Stephen Gallant, Umit Yapanel, Bryan Pellom, Wayne Ward, Ron Cole

Multimedia data collection of in-car speech communication
Nobuo Kawaguchi, Shigeki Matsubara, Kazuya Takeda, Fumitada Itakura

The U.S. SpeechDat-Car data collection
Peter A. Heeman, David Cole, Andrew Cronk

Word unit based multilingual comparative analysis of text corpora
Géza Németh, Csaba Zainkó

Creating a European English broadcast news transcription corpus and system
Gerhard Backfried, Robert Hecht, Sabine Loots, Norbert Pfannerer, Jürgen Riedler, Christian Schiefer

The NESPOLE! VoIP dialogue database
Susanne Burger, Laurent Besacier, Paolo Coletti, Florian Metze, Céline Morel

Design of speech corpus for text-to-speech synthesis
Jindrich Matousek, Josef Psutka, Jiri Kruta

The IFA corpus: a phonemically segmented Dutch "open source" speech database
Rob J. J. H. van Son, Diana Binnenpoorte, Henk van den Heuvel, Louis C. W. Pols

African speech technology (AST) telephone speech databases: corpus design and contents
Philippa H. Louw, Justus C. Roux, Elizabeth C. Botha

SpeechDat-E: five Eastern European speech databases for voice-operated teleservices completed
Henk van den Heuvel, Jerome Boudy, Zsolt Bakcsi, Jan Cernocky, Valery Galunov, Julia Kochanina, Wojciech Majewski, Petr Pollak, Milan Rusko, Jerzy Sadowski, Piotr Staroniewicz, Herbert S. Tropf

Concordancing for parallel spoken language corpora
Dafydd Gibbon, Thorsten Trippel, Serge Sharoff

Large broadcast news and read speech corpora of spoken Czech
Josef Psutka, Vlasta Radova, Ludek Müller, Jindrich Matousek, Pavel Ircing, David Graff

Development of Russian lexical databases, corpora and supporting tools for speech products
Serge A. Yablonsky

Constructing a segment database for Greek time domain speech synthesis
Stavroula-Evita F. Fotinea, George D. Tambouratzis, George V. Carayannis





Dialogue Systems: Techniques and Strategies


Robust parsing in spoken dialogue systems
Pengju Yan, Fang Zheng, Mingxing Xu

A theme structure method for the ellipsis resolution
Yinfei Huang, Fang Zheng, Yi Su, Fang Li, Wenhu Wu

Deriving document structure from prosodic cues
Martin Haase, Werner Kriechbaum, Gregor Möhler, Gerhard Stenzel

Design of a semantic parser with support to ellipsis resolution in a Chinese spoken language dialogue system
Yi Su, Fang Zheng, Yinfei Huang

Methodology for dialogue design in telephone-based spoken dialogue systems: a Spanish train information system
R. San-Segundo, J. M. Montero, J. Colás, J. Gutiérrez, J. M. Ramos, Juan M. Pardo

Spoken dialogue management as planning and acting under uncertainty
Bo Zhang, Qingsheng Cai, Jianfeng Mao, Eric Chang, Baining Guo

Modeling of conversational strategy for the robot participating in the group conversation
Yosuke Matsusaka, Shinya Fujie, Tetsunori Kobayashi

Supporting the construction of a user model in speech-only interfaces by adding multi-modality
Jacques Terken, Saskia te Riele

A word- and turn-oriented approach to exploring the structure of Mandarin dialogues
Shu-Chuan Tseng

A rule based approach to extraction of topics and dialog acts in a spoken dialog system
Yasuhisa Niimi, Tomoki Oku, Takuya Nishimoto, Masahiro Araki

Agent-based error handling in spoken dialogue systems
Markku Turunen, Jaakko Hakulinen

Iterative implementation of dialogue system modules
Lars Degerstedt, Arne Jönsson

Off-talk - a problem for human-machine-interaction?
Daniela Oppermann, Florian Schiel, Silke Steininger, Nicole Beringer

Automatic analysis of real dialogues and generating of training corpora
Jana Schwarz, Vaclav Matousek

Natural language understanding using statistical machine translation
Klaus Macherey, Franz Josef Och, Hermann Ney

Improvements in audio processing and language modeling in the CU communicator
Jianping Zhang, Wayne Ward, Bryan Pellom, Xiuyang Yu, Kadri Hacioglu

Dialogue session management using VoiceXML
Augustine Tsai, Andrew N. Pargellis, Chin-Hui Lee, Joseph P. Olive

Ambiguity representation and resolution in spoken dialogue systems
Egbert Ammicht, Alexandros Potamianos, Eric Fosler-Lussier

Learning of user formulations for business listings in automatic directory assistance
C. Popovici, M. Andorno, P. Laface, L. Fissore, M. Nigra, C. Vair

Mathematical modeling of spoken human-machine dialogues including erroneous confirmations
D. Louloudis, A. Tsopanoglou, Nikos Fakotakis, George Kokkinakis

Limited enquiry negotiation dialogues
Ian Lewin

A comparison of some different techniques for vector based call-routing
Stephen Cox, Ben Shahshahani

Architecture for adaptive multimodal dialog systems based on VoiceXML
Georg Niklfeld, Robert Finan, Michael Pucher


Speech Synthesis: Miscellaneous


Feature extraction by auditory modeling for unit selection in concatenative speech synthesis
Minoru Tsuzaki

Perceptual cost functions for unit searching in large corpus-based text-to-speech
Minkyu Lee

Pruning of redundant synthesis instances based on weighted vector quantization
Sanghun Kim, Youngjik Lee, Keikichi Hirose

Using real words for recording diphones
Susan Fitt

Application of the trended hidden Markov model to speech synthesis
John Dines, Sridha Sridharan, Miles Moody

Two features to check phonetic transcriptions in text to speech systems
Stefano Sandri, Enrico Zovato

Text-to-speech scripting interface for appropriate vocalisation of e-texts
Gerasimos Xydas, Georgios Kouroupetroglou

Representation of large lexica using finite-state transducers for the multilingual text-to-speech synthesis systems
Matej Rojc, Zdravko Kacic

Corpus-based synthesis of fundamental frequency contours based on a generation process model
Keikichi Hirose, Masaya Eto, Nobuaki Minematsu, Atsuhiro Sakurai

Corpus-based database of residual excitations used for speech reconstruction from MFCCs
Zbyněk Tychtl, Josef Psutka

Mixed excitation for HMM-based speech synthesis
Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura

Aperiodicity control in ARX-based speech analysis-synthesis method
Takahiro Ohtsuka, Hideki Kasuya

Generalized source-filter structures for speech synthesis
Matti Karjalainen, Tuomas Paatero

The speech synthesis environment and parametric modeling of coarticulation
Mikolaj Wypych





Applications: Miscellaneous Applications


Some practical considerations in the deployment of a wireless-communication interactive voice response system
Carmen Garcia-Mateo, Laura Docio-Fernandez, Antonio Cardenal-Lopez

Caller identification for the SCANMail voicemail browser
Aaron Rosenberg, Julia Hirschberg, Michiel Bacchiani, S. Parthasarathy, Philip Isenhour, Larry Stead

Extractive summarization of voicemail using lexical and prosodic feature subset selection
Konstantinos Koumpis, Steve Renals, Mahesan Niranjan

Business listings in automatic directory assistance
Odette Scharenborg, Janienke Sturm, Lou Boves

Eutrans: a speech-to-speech translator prototype
M. Pastor-i-Gadea, A. Sanchis, F. Casacuberta, E. Vidal

Speech recognition over NetMeeting connections
Florian Metze, John McDonough, Hagen Soltau

DIARCA: a component approach to voice recognition
Juan C. Díaz Martín, Juan L. García Zapata, José M. Rodríguez García, José F. Álvarez Salgado, Pablo Espada Bueno, Pedro Gómez Vilda

The mvprotek : m-commerce voice verification system
Y. J. Kyung, J. O. Jung, S. M. Sohn, H. J. Chun, S. Y. Moon, M. H. Kim, W. H. Sull

Real-time multilingual communication by means of prestored conversational units
Norman Alm, Mamoru Iwabuchi, Peter N. Andreasen, Kenryu Nakamura, Iain R. Murray

Writing script-based dialogues for AAC
Iain R. Murray, John L. Arnott, Norman Alm, Richard Dye, Gillian Harper

Communication aid for non-vocal people using corpus-based concatenative speech synthesis
Akemi Iida, Yosuke Sakurada, Nick Campbell, Michiaki Yasumura

Social effects on vocal rate with echoic mimicry using prosody-only voice
Noriko Suzuki, Kazuhiko Kakehi, Yugo Takeuchi, Michio Okada

Everyday life sounds and speech analysis for a medical telemonitoring system
Eric Castelli, Dan Istrate

Speaking while driving - preliminary results on spellings in the German SpeechDat-Car database
Christoph Draxler, Klaus Bengler, Christina Olaverri-Monreal


Signal Analysis: Pitch and Speech Analysis


Efficient periodicity extraction based on sine-wave representation and its application to pitch determination of speech signals
Dan Chazan, Meir (Zibulski) Tzur, Ron Hoory, Gilad Cohen

Viseme recognition using multiple feature matching
I. Shdaifat, R. Grigat, Stefan Lütgert

The fundamental frequency of cough by autocorrelation analysis
A. Van Hirtum, D. Berckmans

A fundamental frequency estimation method for noisy speech based on instantaneous amplitude and frequency
Yuichi Ishimoto, Masashi Unoki, Masato Akagi

Robust LP analysis using glottal source HMM with application to high-pitched and noise corrupted speech
Akira Sasou, Kazuyo Tanaka

Fast harmonic estimation using a low resolution pitch for low bit rate harmonic coding
Yong-Soo Choi, Dae-Hee Youn

Comparative evaluation of F0 estimation algorithms
Alain de Cheveigné, Hideki Kawahara

Identification of accent and intonation in sentences for CALL systems
Carlos Toshinori Ishi, Nobuaki Minematsu, Ryuji Nishide, Keikichi Hirose

Systematic F0 glitches around nasal-vowel transitions
Hideki Kawahara, Parham Zolfaghari

Using aerial and geometric features in automatic lip-reading
Jacek C. Wojdel, Leon J. M. Rothkrantz

Inverse filtering of tube models with frequency dependent tube terminations
Karl Schnell, Arild Lacroix

Formant estimation using gammachirp filterbank
Kaïs Ouni, Zied Lachiri, Noureddine Ellouze

Autoregressive time-frequency interpolation in the context of missing data theory for impulsive noise compensation
I. Potamitis, Nikos Fakotakis

Analysis of the voiced speech using the generalized Fourier transform with quadratic phase
D. Petrinovic, Vladimir Cuperman


Integration of Phonetic Knowledge in Speech Technology: Is Phonetic Knowledge any use? Panel discussion (Special Session)


From here to utility - melding phonetic insight with speech technology
Steven Greenberg





Signal Analysis: Source Localisation and Beam Forming


A new auditory based microphone array and objective evaluation using e-RASTI
J. L. Sánchez-Bote, J. González-Rodríguez, D. Simón-Zorita

Equivalence between frequency domain blind source separation and frequency domain adaptive null beamformers
Shoko Araki, Shoji Makino, Ryo Mukai, Hiroshi Saruwatari

Separation and dereverberation performance of frequency domain blind source separation for speech in a reverberant environment
Ryo Mukai, Shoko Araki, Shoji Makino

Blind source separation for speech based on fast-convergence algorithm with ICA and beamforming
Hiroshi Saruwatari, Toshiya Kawamura, Kiyohiro Shikano

Noise reduction using paired-microphones for both far-field and near-field sound sources
Mitsunori Mizumachi, Satoshi Nakamura

Statistical sound source identification in a real acoustic environment for robust speech recognition using a microphone array
Takanobu Nishiura, Satoshi Nakamura, Kiyohiro Shikano

Speech enhancement and source separation based on binaural negative beamforming
A. Álvarez-Marquina, P. Gómez-Vilda, R. Martínez-Olalla, V. Nieto-Lluís, V. Rodellar-Biarge

Multiple source separation in the frequency domain using negative beamforming
P. Gómez-Vilda, A. Álvarez-Marquina, V. Nieto-Lluís, V. Rodellar-Biarge, R. Martínez-Olalla

Planar superdirective microphone arrays for speech acquisition in the car
Rainer Martin, Alexey Petrovsky, Thomas Lotter

Is speech data clustered? - statistical analysis of cepstral features
Tomi Kinnunen, Ismo Kärkkäinen, Pasi Fränti

Maximum likelihood adaptation for distant speech recognition of stationary and moving speakers in reverberant environments
George Nokas, Evangelos Dermatas, George Kokkinakis

Model-based blind estimation of reverberation time: application to robust ASR in reverberant environments
Laurent Couvreur, Christophe Ris, Christophe Couvreur

Using the modulation complex wavelet transform for feature extraction in automatic speech recognition
Yasunori Momomura, Kenji Okada, Takayuki Arai, Noboru Kanedera, Yuji Murahara

Separating three simultaneous speeches with two microphones by integrating auditory and visual processing
Hiroshi G. Okuno, Kazuhiro Nakadai, Tino Lourens, Hiroaki Kitano






Speech Recognition and Understanding: Prosody and Cross-Language in ASR


Experiments on cross-language acoustic modeling
Tanja Schultz, Alex Waibel

Crosslingual speech recognition with multilingual acoustic models based on agglomerative and tree-based triphone clustering
Andrej Zgank, Bojan Imperl, Finn Tore Johansen, Zdravko Kacic, Bogomir Horvat

Comparing parameter tying methods for multilingual acoustic modelling
Mikko Harju, Petri Salmela, Jussi Leppänen, Olli Viikki, Jukka Saarinen

Accent-independent universal HMM-based speech recognizer for American, Australian and British English
Rathi Chengalvarayan

The effect of time stress on automatic speech recognition accuracy when using second language
Fang Chen, Jonas Sääv

The effect of pitch and lexical tone on different Mandarin speech recognition tasks
Yiu Wing Wong, Eric Chang

Acoustic modeling of foreign words in a German speech recognition system
Georg Stemmer, Elmar Nöth, Heinrich Niemann

Semi-automatic grammar induction for bi-directional English-Chinese machine translation
K. C. Siu, Helen M. Meng

F0 feature extraction by polynomial regression function for monosyllabic Thai tone recognition
Patavee Charnvivit, Somchai Jitapunkul, Visarut Ahkuputra, Ekkarit Maneenoi, Umavasee Thathong, Boonchai Thampanitchawong

The use of prosody in a combined system for punctuation generation and speech recognition
Ji-Hwan Kim, P. C. Woodland

Lexical stress modeling for improved speech recognition of spontaneous telephone speech in the JUPITER domain
Chao Wang, Stephanie Seneff

Modeling auxiliary information in Bayesian network based ASR
Todd A. Stephenson, M. Mathew, Herve Bourlard

A new dynamic HMM model for speech recognition
Feili Chen, Eric Chang

Multi-keyword spotting of telephone speech using orthogonal transform-based SBR and RNN prosodic model
Wern-Jun Wang, Chun-Jen Lee, Eng-Fong Huang, Sin-Horng Chen

Recognition of Slovenian speech: within and cross-language experiments on monophones using the SpeechDat(II)
Andrej Iskra, Bojan Petek, Tom Brøndsted

Boiling down prosody for the classification of boundaries and accents in German and English
Anton Batliner, Jan Buckow, Richard Huber, Volker Warnke, Elmar Nöth, Heinrich Niemann





Keynotes

What do Industry and Universities Expect from Each Other? (Special Session)

Linguistic Modelling: Language Model Compression

Speech Production: Voice Source

Speech Recognition and Understanding: Pronunciation and Subword Units

Phonetics and Phonology: Prosody and Others

Speech Perception: First and Second Language Learning

Speech Perception: Miscellaneous

Noise Robust Recognition: Frontend and Compensation Algorithms (Special Session)

Linguistic Modelling: Language Model Adaptation

Speech Production: Articulation

Speech Recognition and Understanding: Topic Detection and Information Retrieval

Phonetics and Phonology: Segmentals and Synthesis

Noise Robust Recognition: Frontend (Special Session)

Linguistic Modelling: Semantic Modelling

Speech Perception: Recognition and Intelligibility

Speech Recognition and Understanding: LVCSR

Speech Synthesis: Systems and Prosody

Speech Recognition and Understanding: Articulatory and Perceptual Approaches to ASR

Noise Robust Recognition: Robust Systems - What Helps? (Special Session)

Phonetics and Phonology: Segmentals

Speech Production: Prosody

Speech Recognition and Understanding: Acoustic Modelling - I

Linguistic Modelling: Language Models

Speaker Recognition: Identification, Verification and Tracking. Speech Recognition and Understanding: Language Identification

Phonetics and Phonology: Prominence and Timing

Speech Synthesis: Concatenation

Speech Recognition and Understanding: Noise Robustness

Signal Analysis: Microphone Arrays & Source Localisation

Speech Recognition and Understanding: Audio-Visual Processing

SIGshow (Special Session)

Speech Synthesis: Prosody

Applications: Multimodal Applications

Speech Recognition and Understanding: Speaker Adaptation

Speech Recognition and Understanding: Adaptation

Dialogue Systems: Project Descriptions

Dialogue Systems: Resources

Speaker Recognition: Features and Transforms

Speech Perception: Prosody

Speech Production: Miscellaneous

Existing and Future Corpora: Next Generation Speech Resources (Special Session)

Signal Analysis: Speech Processing in Car Environments

Speech Recognition and Understanding: Finite State Transducers for ASR

Resources, Assessment and Standards: Assessment Tools & Methodology

Existing and Future Corpora: Automated Analysis of Speech Resources (Special Session)

Dialogue Systems: Dialogue Systems and Generation

Speaker Recognition: Alternative Trends in Verification

Speech Recognition and Understanding: Speech Understanding

Speech Recognition and Understanding: Algorithms and Architectures

Signal Analysis: Speech Enhancement and Noise Processing

Speech Synthesis: Grapheme-to-Phoneme Conversion

Signal Analysis: Speech Enhancement

Speech Recognition and Understanding: Discriminative Training

Speech Coding: Advances in Speech Coding

Resources, Assessment and Standards: Corpora

Resources, Assessment and Standards: Assessment Methodology

Speech Recognition and Understanding: Confidence Measures

Speech Recognition and Understanding: Language Modelling

Dialogue Systems: Techniques and Strategies

Speech Synthesis: Miscellaneous

Integration of Phonetic Knowledge in Speech Technology: Experiments and Experiences (Special Session)

Speech Coding: Wideband Speech Coding

Speech Recognition and Understanding: Robust ASR

Applications: Miscellaneous Applications

Signal Analysis: Pitch and Speech Analysis

Integration of Phonetic Knowledge in Speech Technology: Is Phonetic Knowledge any use? Panel discussion (Special Session)

Speech Coding: Speech Transmission Systems

Speech Recognition and Understanding: Rhythm and Timing in ASR

Speech Recognition and Understanding: Confidence Measures and OOV

Signal Analysis: Source Localisation and Beam Forming

Signal Analysis: Speech Features and Modelling

Speech Recognition and Understanding: Kids, Toys and Emotions

Applications: Media Applications

Speech Recognition and Understanding: Distributed Speech Recognition

Speech Recognition and Understanding: Prosody and Cross-Language in ASR

Education: Education and Training

Speaker Recognition: Features and Robustness