Odyssey 2018 The Speaker and Language Recognition Workshop

26-29 June 2018, Les Sables d'Olonne, France

Chairs: Anthony Larcher and Jean-François Bonastre

ISSN: 2312-2846
DOI: 10.21437/Odyssey.2018

Keynote: Els Kindt


Speaker identification and Data protection
Els Kindt


Speaker Recognition I


Impact of rhythm on forensic voice comparison reliability
Moez Ajili, Solange Rossato, Dan Zhang, Jean-François Bonastre

Segmental Content Effects on Text-dependent Automatic Accent Recognition
Georgina Brown

Homomorphic Encryption for Speaker Recognition: Protection of Biometric Templates and Vendor Model Parameters
Andreas Nautsch, Sergey Isadskiy, Jascha Kolberg, Marta Gomez-Barrero, Christoph Busch

Weakly Supervised Training of Speaker Identification Models
Martin Karu, Tanel Alumäe


Language Recognition


The LEAP Language Recognition System for LRE 2017 Challenge - Improvements and Error Analysis
Bharat Padi, Shreyas Ramoji, Vaishnavi Yeruva, Satish Kumar, Sriram Ganapathy

Analysis of DNN-based Embeddings for Language Recognition on the NIST LRE 2017
Alicia Lozano-Diez, Oldrich Plchot, Pavel Matejka, Ondrej Novotny, Joaquin Gonzalez-Rodriguez

Analysis of BUT-PT Submission for NIST LRE 2017
Oldřich Plchot, Pavel Matějka, Ondřej Novotný, Sandro Cumani, Alicia Lozano-Diez, Josef Slavíček, Mireia Diez, František Grézl, Ondřej Glembek, Mounika Kamsali, Anna Silnova, Lukáš Burget, Lucas Ondel, Santosh Kesiraju, Johan Rohdin

The MIT Lincoln Laboratory / JHU / EPITA-LSE LRE17 System
Fred Richardson, Pedro Torres-Carrasquillo, Jonas Borgstrom, Douglas Sturim, Youngjune Gwon, Jesus Villalba, Jan Trmal, Nanxin Chen, Reda Dehak, Najim Dehak

Staircase Network: structural language identification via hierarchical attentive units
Trung Ngo Trong, Ville Hautamaki, Kristiina Jokinen

Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17
Alan McCree, David Snyder, Greg Sell, Daniel Garcia-Romero

Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System
Weicheng Cai, Jinkun Chen, Ming Li

The 2017 NIST Language Recognition Evaluation
Seyed Omid Sadjadi, Timothee Kheyrkhah, Audrey Tong, Craig Greenberg, Douglas Reynolds, Elliot Singer, Lisa Mason, Jaime Hernandez-Cordero

Approaches to Multi-domain Language Recognition
Mitchell McLaren, Mahesh Kumar Nandwana, Diego Castán, Luciana Ferrer

Convolutional Neural Network and Language Embeddings for End-to-End Dialect Recognition
Suwon Shon, Ahmed Ali, James Glass

Spoken Language Recognition using X-vectors
David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Daniel Povey, Sanjeev Khudanpur

End-to-End versus Embedding Neural Networks for Language Recognition in Mismatched Conditions
Jesus Antonio Villalba Lopez, Niko Brummer, Najim Dehak


Speaker diarization


Incremental On-Line Clustering of Speakers' Short Segments
Ruth Aloni-Lavi, Irit Opher, Itshak Lapidot

Latent Class Model for Single Channel Speaker Diarization
Liang He, Xianhong Chen, Can Xu, Jia Liu

VB-HMM Speaker Diarization with Enhanced and Refined Segment Representation
Xianhong Chen, Liang He, Can Xu, Yi Liu, Tianyu Liang, Jia Liu

Low-latency speaker spotting with online diarization and detection
Jose Patino, Ruiqing Yin, Héctor Delgado, Hervé Bredin, Alain Komaty, Guillaume Wisniewski, Claude Barras, Nicholas Evans, Sébastien Marcel

Speaker Diarization based on Bayesian HMM with Eigenvoice Priors
Mireia Diez, Lukas Burget, Pavel Matejka


Noise Robustness


Domain-invariant I-vector Feature Extraction for PLDA Speaker Verification
Md Hafizur Rahman, Ivan Himawan, David Dean, Clinton Fookes, Sridha Sridharan

Reducing Domain Mismatch by Maximum Mean Discrepancy Based Autoencoders
Weiwei Lin, Man-Wai Mak, Longxin Li, Jen-Tzung Chien

On the use of X-vectors for Robust Speaker Recognition
Ondřej Novotný, Oldřich Plchot, Pavel Matějka, Ladislav Mošner, Ondřej Glembek

Speaker Verification in Mismatched Conditions with Frustratingly Easy Domain Adaptation
Md Jahangir Alam, Gautam Bhattacharya, Patrick Kenny

An Analysis of Transfer Learning for Domain Mismatched Text-independent Speaker Verification
Chunlei Zhang, Shivesh Ranjan, John Hansen


Keynote: Simon King


Speaking naturally? It depends who is listening.
Simon King


Voice conversion


A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment
Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhenhua Ling

The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods
Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhenhua Ling

sprocket: Open-Source Voice Conversion Software
Kazuhiro Kobayashi, Tomoki Toda


Voice conversion and spoofing


The NU Non-Parallel Voice Conversion System for the Voice Conversion Challenge 2018
Yichiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda

NU Voice Conversion System for the Voice Conversion Challenge 2018
Patrick Lumban Tobing, Yichiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda

Average Modeling Approach to Voice Conversion with Non-Parallel Data
Xiaohai Tian, Junchao Wang, Haihua Xu, Eng-Siong Chng, Haizhou Li

Voice liveness detection using phoneme-based pop-noise detector for speaker verification
Shihono Mochizuki, Sayaka Shiota, Hitoshi Kiya

Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama’s voice using GAN, WaveNet and low-quality found data
Jaime Lorenzo-Trueba, Fuming Fang, Xin Wang, Isao Echizen, Junichi Yamagishi, Tomi Kinnunen

The HCCL-CUHK System for the Voice Conversion Challenge 2018
Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng

Convolutional Neural Network Based Speaker De-Identification
Fahimeh Bahmaninezhad, Chunlei Zhang, John Hansen

Bidirectional Voice Conversion Based on Joint Training Using Gaussian-Gaussian Deep Relational Model
Kentaro Sone, Shinji Takaki, Toru Nakashika

Phonetically Aware Exemplar-Based Prosody Transformation
Berrak Sisman, Grandee Lee, Haizhou Li

A Regression Model of Recurrent Deep Neural Networks for Noise Robust Estimation of the Fundamental Frequency Contour of Speech
Akihiro Kato, Tomi Kinnunen

BUT/Phonexia Bottleneck Feature Extractor
Anna Silnova, Pavel Matejka, Ondrej Glembek, Oldrich Plchot, Ondrej Novotny, Frantisek Grezl, Petr Schwarz, Lukas Burget, Jan Cernocky


Spoofing


An end-to-end spoofing countermeasure for automatic speaker verification using evolving recurrent neural networks
Giacomo Valenti, Héctor Delgado, Massimiliano Todisco, Nicholas Evans, Laurent Pilati

ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements
Héctor Delgado, Massimiliano Todisco, Md Sahidullah, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Junichi Yamagishi

An Audio Fingerprinting Approach to Replay Attack Detection on ASVSPOOF 2017 Challenge Data
Joaquin Gonzalez-Rodriguez, Alvaro Escudero, Diego de Benito-Gorrón, Beltran Labrador, Javier Franco-Pedroso

t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification
Tomi Kinnunen, Kong Aik Lee, Héctor Delgado, Nicholas Evans, Massimiliano Todisco, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds

Perceptual Evaluation of the Effectiveness of Voice Disguise by Age Modification
Rosa Gonzalez Hautamäki, Anssi Kanervisto, Ville Hautamaki, Tomi Kinnunen


Keynote: Pascal Belin


A Vocal Brain: Cerebral Processing of Voice Information
Pascal Belin


Speaker recognition II


How to train your speaker embeddings extractor
Mitchell McLaren, Diego Castán, Mahesh Kumar Nandwana, Luciana Ferrer, Emre Yilmaz

End-to-end automatic speaker verification with evolving recurrent neural networks
Giacomo Valenti, Adrien Daniel, Nicholas Evans

Adversarial Learning and Augmentation for Speaker Recognition
Jen-Tzung Chien, Kang-Ting Peng

Gaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA model
Niko Brummer, Anna Silnova, Lukas Burget, Themos Stafylakis

Supervector Compression Strategies to Speed up I-Vector System Development
Ville Vestman, Tomi Kinnunen


Text-dependent speaker recognition


A Double Joint Bayesian Approach for J-Vector Based Text-dependent Speaker Verification
Ziqiang Shi, Mengjiao Wang, Liu Liu, Huibin Lin, Rujie Liu

Spoken Pass-Phrase Verification in the i-vector Space
Hossein Zeinali, Lukas Burget, Hossein Sameti, Honza Cernocky

On deep speaker embeddings for text-independent speaker recognition
Sergey Novoselov, Andrey Shulipa, Ivan Kremnev, Alexandr Kozlov, Vadim Shchemelinin

DeepMine Speech Processing Database: Text-Dependent and Independent Speaker Verification and Speech Recognition in Persian and English
Hossein Zeinali, Hossein Sameti, Themos Stafylakis

Boosting the Performance of Spoofing Detection Systems on Replay Attacks Using q-Logarithm Domain Feature Normalization
Md Jahangir Alam, Gautam Bhattacharya, Patrick Kenny