Hello everyone,
I am writing to let you know that the articulatory synthesis software VocalTractLab3D is now online and freely available to all:
https://vocaltractlab.de/index.php?page=vocaltractlab-download
VocalTractLab is an articulatory synthesis software developed primarily by Peter Birkholz at the Chair of Speech Technology and Cognitive Systems of TU Dresden.
During my postdoc I worked on developing a special version, VocalTractLab3D, which includes efficient 3D acoustic simulations driven by a graphical interface.
Unlike the acoustic simulations commonly used in speech research, which rely on a 1D approach based on the area function, 3D simulations describe the acoustic field in all spatial dimensions and take the precise 3D shape of the vocal tract into account.
They are therefore more accurate, particularly at high frequencies (above roughly 2-3 kHz).
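To make the contrast concrete (this is textbook acoustics, nothing specific to VocalTractLab3D): the 1D approach amounts to solving Webster's horn equation over the area function A(x), whereas a 3D simulation solves the full wave equation over the vocal-tract volume and therefore also captures transverse modes:

\frac{\partial^2 p}{\partial x^2} + \frac{1}{A(x)}\frac{dA}{dx}\frac{\partial p}{\partial x} = \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2} \quad \text{(1D, Webster's horn equation)}

\nabla^2 p = \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2} \quad \text{(3D wave equation)}

where p is the acoustic pressure and c the speed of sound. Transverse resonances are one reason the two approaches diverge at high frequencies.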
Their limitation, however, is computation time. In our project we worked on pushing back this limit, and our software can run this type of simulation in a reasonable time (about one hour for a static geometry with an accurate solution).
Another common limitation of 3D simulations is the need to master fairly technical simulation methods such as finite elements or finite differences, which often means using a programming language.
We also worked on lowering this barrier in order to make this type of simulation accessible to as many people as possible: in VocalTractLab3D the simulations are driven by a graphical interface, and it is not necessary to understand exactly how the method works in order to compute transfer functions or acoustic fields.
If enough people are interested, I can give an online presentation of the software to explain in more detail what it is, what it can be used for, and how to use it.
Write to me if this interests you.
Also feel free to contact me if you have any questions about this software.
Best regards,
Rémi Blandin
Dear colleagues,
We are pleased to announce the public release of the first Python library for converting numbers written out in French into their digit representation.
The parser is robust and can segment and substitute number expressions within a stream of words, such as a conversation. It recognizes the different variants of the language (quatre-vingt-dix / nonante, etc.) and handles ordinals as well as integers, decimal numbers, and formal sequences (phone numbers, credit card numbers, etc.).
We hope this tool will be useful to those who, like us, work on natural language processing for French.
The library is released under the MIT license, which allows very permissive use.
Pypi : https://pypi.org/project/text2num/
Sources : https://github.com/allo-media/text2num
Doc : http://text2num.readthedocs.io/
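A minimal usage sketch (the function names follow the project documentation; the explicit language argument exists in recent versions of the library, so check the documentation of the version you install):

from text_to_num import alpha2digit, text2num

# Parse a single number expression into an integer.
print(text2num("quatre-vingt-quinze", "fr"))  # 95

# Segment and substitute number expressions inside a stream of words.
print(alpha2digit("trente-deux personnes sur cinquante", "fr"))
# -> "32 personnes sur 50"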
--
Romuald Texier-Marcadé
http://www.allo-media.fr
Dear all,
LIG is pleased to inform you that the website for the app Lig-Aikuma is online: https://lig-aikuma.imag.fr/
At the same time, an update of Lig-Aikuma (V3) has been made available (see the website).
LIG-AIKUMA is a free Android app running on various mobile phones and tablets. The app offers a range of speech collection modes (recording, respeaking, translation and elicitation) and makes it possible to share recordings between users. LIG-AIKUMA is built upon the initial AIKUMA app developed by S. Bird & F. Hanke (see https://en.wikipedia.org/wiki/Aikuma for more information).
Improvements of the app:

Visual upgrade:
+ Waveform visualizer in the Respeaking and Translation modes (possibility to zoom in/out on the audio signal)
+ File explorer included in all modes, to facilitate navigation between files
+ New Share mode to share recordings between devices (by Bluetooth, Mail, NFC if available)
+ French and German languages available. In addition to English, the application now supports French and German; Lig-Aikuma uses the language of the phone/tablet by default.
+ New, more consistent icons to discriminate all types of files (audio, text, image, video)

Conceptual upgrade:
+ New name for the root project: "ligaikuma". Warning: henceforth, all data will be stored in this directory instead of "aikuma" (used in the previous versions of the app). This change does not raise compatibility issues. In the file explorer of each mode, the default position is this root directory; just go back once with the left grey arrow (on the lower left of the screen) and select the "aikuma" directory to access your old recordings.
+ Generation of a PDF consent form (from the information filled in the metadata form) that can be signed by linguist and speaker with a PDF annotation tool (such as the Adobe Fill & Sign mobile app)
+ Generation of a CSV file which can be imported into the Elan software: it automatically creates segmented tiers, as produced during a respeaking or translation session, and marks segments without speech with a "non-speech" label.
+ Geolocation of the recordings
+ Respeaking of elicited files: it is now possible to use, in Respeaking or Translation mode, an audio file initially recorded in Elicitation mode

Structural upgrade:
+ Undo button in Elicitation mode to erase/redo the current recording
+ Improved session backup in Elicitation mode
+ Non-speech button in Respeaking and Translation modes to indicate with a comment that a segment does not contain speech (but noise or silence, for instance)
+ Automatic speaker profile creation to quickly fill in the metadata when several sessions involve the same speaker

Best regards,
Elodie Gauthier & Laurent Besacier
It is our pleasure to introduce A||GO (https://allgo.inria.fr/ or http://allgo.irisa.fr/), a platform providing a collection of web-services for the automatic analysis of various data, including multimedia content across modalities. The platform builds on the back-end web service deployment infrastructure developed and maintained by Inria's Service for Experimentation and Development (SED). Originally dedicated to multimedia content, A||GO progressively broadened to other fields such as computational biology, networks and telecommunications, computational graphics or computational physics.
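As context for the command-line API mentioned below, here is a purely hypothetical sketch of driving such a web service from Python. Every URL, route and field name in this snippet is invented for illustration; the real A||GO routes, parameters and token scheme are documented on the platform after account creation.

import requests

API_TOKEN = "your-token-here"  # issued with your (free) A||GO account
# Hypothetical route and field names, for illustration only:
url = "https://allgo.inria.fr/api/v1/jobs"

with open("recording.wav", "rb") as audio:
    resp = requests.post(
        url,
        headers={"Authorization": "Token " + API_TOKEN},
        data={"webapp": "samusa"},  # e.g. the music/speech segmentation service
        files={"file": audio},
    )
print(resp.status_code, resp.text)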
As part of the CNRS PlaSciDo initiative [1], the Linkmedia team at IRISA / Inria Rennes is making available via A||GO a number of web services devoted to multimedia content analysis across modalities (language, audio, image, video). The web services currently provided include research results from the Linkmedia team as well as contributions from a number of partners. A list of the services available to date is given below, and the current state is available at https://www-linkmedia.irisa.fr/software along with demo videos. Most web services are interoperable, facilitating the implementation of a multimedia content analysis processing chain, and are free to use for trial, prototyping or lab work. A brief and free account creation step will allow you to execute the web services using either the graphical interface or a command line via a dedicated API. We expect the number of web services to grow over time and invite interested parties to contact us should they wish to contribute to the multimedia web service offer of A||GO.

List of multimedia content analysis tools currently available on A||GO:
- Audio Processing
  SaMuSa: music/speech segmentation
  SilAD: silence detection
  Radi.sh: repeated audio motif discovery
  LORIA STS v2: speech transcription for the French language, from LORIA
  Multi channel BSS locate: audio source localization toolbox, from IRISA-PANAMA
  A-spade: audio declipper, from IRISA-PANAMA
  Transvox: voice faker, from LORIA
- Natural Language Processing
  NERO: named entity recognition
  TermEx: keyword/indexing term detection
  Otis!: topic segmentation
  Hi-tost: hierarchical topic structuring
- Video Processing
  Vidseg: video shot segmentation
  HUFA: face detection and tracking

Shortcuts to Linkmedia services are also available here: https://www-linkmedia.irisa.fr/software/

For more information don't hesitate to contact us (contact-multimedia-allgo@irisa.fr).

Gabriel Sargent and Guillaume Gravier
--
Linkmedia
IRISA - CNRS
Rennes, France
We are pleased to announce the release of LIG_AIKUMA, an Android application for speech data collection, specially dedicated to language documentation. LIG_AIKUMA is an improved version of the Android application AIKUMA, initially developed by Steven Bird and colleagues. Features were added to the app in order to facilitate the collection of parallel speech data, in line with the requirements of a French-German project (ANR/DFG BULB - Breaking the Unwritten Language Barrier).
The resulting app, called LIG-AIKUMA, runs on various mobile phones and tablets and offers a range of speech collection modes (recording, respeaking, translation and elicitation). It was used for field data collection in Congo-Brazzaville, resulting in a total of over 80 hours of speech.
Users who just want to use the app without access to the code can download it directly from the forge: https://forge.imag.fr/frs/download.php/706/MainActivity.apk
Code is also available on demand (contact elodie.gauthier@imag.fr and laurent.besacier@imag.fr).
More details on LIG_AIKUMA can be found in the following paper: http://www.sciencedirect.com/science/article/pii/S1877050916300448
We are happy to announce the release of our new toolkit “MultiVec” for computing continuous representations of text at different granularity levels (word level or sequences of words). MultiVec includes Mikolov et al. [2013b]’s word2vec features, Le and Mikolov [2014]’s paragraph vector (batch and online) and Luong et al. [2015]’s model for bilingual distributed representations. MultiVec also includes different distance measures between words and sequences of words. The toolkit is written in C++ and is aimed at being fast (in the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: the analogical reasoning task, sentiment analysis, and cross-lingual document classification. The toolkit also includes C++ and Python libraries that you can use to query bilingual and monolingual models.
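As a conceptual illustration of the distance measures mentioned above (plain NumPy, not the MultiVec API; the toy vectors are made up):

import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 4-dimensional embeddings (real models use hundreds of dimensions).
cat = np.array([0.9, 0.1, 0.3, 0.0])
dog = np.array([0.8, 0.2, 0.4, 0.1])
car = np.array([0.0, 0.9, 0.1, 0.8])

print(cosine_similarity(cat, dog))  # high: related words
print(cosine_similarity(cat, car))  # lower: unrelated words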
The project is fully open to future contributions. The code is provided on the project webpage (https://github.com/eske/multivec) with installation instructions and command-line usage examples.
When you use this toolkit, please cite:
@InProceedings{MultiVecLREC2016,
Title = {{MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP}},
Author = {Alexandre Bérard and Christophe Servan and Olivier Pietquin and Laurent Besacier},
Booktitle = {The 10th edition of the Language Resources and Evaluation Conference (LREC 2016)},
Year = {2016},
Month = {May}
}
The paper is available here: https://github.com/eske/multivec/raw/master/docs/Berard_and_al-MultiVec_a_Multilingual_and_Multilevel_Representation_Learning_Toolkit_for_NLP-LREC2016.pdf
Best regards,
Alexandre Bérard, Christophe Servan, Olivier Pietquin and Laurent Besacier
We are glad to announce the public release of the Cantor Digitalis, an open-source real-time singing synthesizer controlled by hand gestures.
It can be used e.g. for making music or for singing voice pedagogy. A wide variety of voices are available, from the classic vocal quartet (soprano, alto, tenor, bass) to the extreme colors of childish, breathy, roaring, etc. voices. All the features of vocal sounds are entirely under control, as the synthesis method is based on a mathematical model of voice production, without prerecorded segments.

The instrument is controlled using chironomy, i.e. hand gestures, with the help of interfaces such as a stylus or fingers on a graphic tablet, or a computer mouse. Vocal dimensions such as melody, vocal effort, vowel, voice tension, vocal tract size, breathiness, etc. can easily and continuously be controlled during performance, and special voices can be prepared in advance or using presets.

Check out the capabilities of Cantor Digitalis through performance extracts from the ensemble Chorus Digitalis: http://youtu.be/_LTjM3Lihis?t=13s.

In practice, this release provides:
Regards,
The Cantor Digitalis team (who loves feedback: cantordigitalis@limsi.fr)
Christophe d'Alessandro, Lionel Feugère, Olivier Perrotin
http://cantordigitalis.limsi.fr/