ISCA - International Speech
Communication Association


  • 2025-02-09 12:47 | Anonymous member

    Large language models (LLMs) have demonstrated increasingly powerful capabilities for reasoning tasks, especially in text. The project aims to explore and advance these capabilities in reasoning across multiple data modalities, including but not limited to text, speech and audio. The integration of multiple modalities can lead to more robust and general systems capable of understanding and reasoning about the world in a more human-like manner. The project will involve fine-tuning pre-trained models and developing self-supervised learning techniques to adapt LLMs for multimodal tasks.

    Application deadline: 16 March 2025. 

    Apply here


  • 2025-02-07 10:32 | Anonymous member

    The Faculty of Information Technology and Communication Sciences (ITC) at Tampere University, together with ELLIS Institute Finland, is looking for candidates for several positions in Artificial Intelligence research and other fields applying AI:

     

    Assistant/Associate/Full Professor

     

    These PI positions are jointly offered by Tampere University (https://www.tuni.fi/en/about-us/tampere-university) and ELLIS Institute Finland, a newly established world-class research hub in AI and Machine Learning. The call will close on 9 March 2025.

     

    For more information about the requirements and how to apply, see the ELLIS Institute web page: https://www.ellisinstitute.fi/PI-recruit. The positions and their requirements are also described below.

     

    1) Assistant/Associate/Full Professor in Computing Science (Fundamental AI) 

     

    The Faculty of Information Technology and Communication Sciences at Tampere University invites applications for positions at all Professor levels in the field of theoretical Machine Learning. We are looking for outstanding machine learning scientists with a strong research track record in fields such as theoretical foundations of Machine Learning, computational learning theory, model training and optimization, efficient, interpretable, trustworthy and scalable Machine Learning, and foundation models in Machine Learning (uni- and multi-modal FMs, diffusion and generative models).

     

    2) Assistant/Associate/Full Professor in Computing Science (Applied AI)

     

    The Faculty of Information Technology and Communication Sciences at Tampere University invites applications for positions at all Professor levels across the field of Computer Science. We are looking for outstanding computing scientists with a strong research track record in, for example, Computer Engineering, Human–Computer Interaction, Network and Information Security, Signal Processing, or Software Engineering, including human-centered AI foci such as AI ethics.

     

    3) Assistant/Associate/Full Professor in Artificial Intelligence for Communications and Networking

     

    The Faculty of Information Technology and Communication Sciences at Tampere University invites applications for positions at all Professor levels in the field of Electrical Engineering, especially in Communications and Networking. We are looking for outstanding scientists with a strong research track record in, for example, AI native 6G networks, semantic communications and machine learning, or Artificial Intelligence / Machine Learning for physical layer technology, radio resource management and mobility management, or wireless sensing and positioning.

     

    4) Assistant/Associate/Full Professor in AI in Media

     

    The Faculty of Information Technology and Communication Sciences at Tampere University invites applications for positions at all Professor levels in Artificial Intelligence in media. The position will be situated in either the Computing Sciences Unit, the Communications Sciences Unit, or the Languages Unit. We are looking for outstanding scientists with a strong research track record in AI, for example, in Multimedia (e.g. audio, video), Multimodality, Journalism, Linguistics, Performing Arts, Extended Reality, Gamification, or Games.

     

    Requirements and Process

     
    Applications are welcome at all tenure-track levels (Assistant Professor, Associate Professor and Professor). You should have a doctoral degree in an applicable field and research experience in academia in the respective field or fields. You regularly publish at top-tier venues in AI-driven fields. Further requirements for each tenure-track level are as follows:

     

    Assistant Professor

    • applicable doctoral degree 
    • ability to undertake independent scholarly activity and potential to pursue scholarly activity at a high international level of excellence 
    • teaching skills required to successfully perform the duties and functions of the position 


    Associate Professor

    • applicable doctoral degree 
    • track record of independent scholarly activity 
    • teaching skills required to successfully perform the duties and functions of the position 
    • ability to lead a research group and acquire external funding 
    • track record of international scholarly activity

     

    Professor

    • applicable doctoral degree 
    • high-level international scholarly expertise 
    • experience of leading scientific research 
    • ability to provide high-quality research-based education and instruction 
    • track record of winning external research funding 
    • track record of international scholarly activity 

     

    The initial appointment for assistant and associate professors is for five years. Subject to successful performance, you will become a tenured member of the faculty staff at the end of the first five-year period. Full professors will hold a permanent appointment from the outset. A trial period of six months applies to all our new employees. Candidates may be invited for a video interview during the first stage of the recruitment process. The most qualified candidates will be invited to Tampere University for an interview and may undergo an aptitude assessment. They will also undergo a review by external experts and may be required to give a demonstration of their teaching skills.

     

    Further Information


    Further information on the position and the working environment may be obtained from Professor Juho Hamari, Vice-Dean for Research (research.itc@tuni.fi), and about the tenure track career path from HR Specialist Safija Chabbi (safija.chabbi@tuni.fi).

     

    About ITC Faculty: https://www.tuni.fi/en/about-us/faculty-information-technology-and-communication-sciences

    About Tampere University Tenure Track career path: https://www.tuni.fi/en/about-us/working-at-tampere-universities/tampere-university-as-an-employer/tenure-track-career-path

    Apply here: https://www.ellisinstitute.fi/PI-recruit

  • 2024-12-10 13:10 | Anonymous member (Administrator)

    “Privacy for Smart Speech Technology” (PSST) is a joint doctoral training programme and Horizon Europe Marie Skłodowska-Curie Action, the European Union’s flagship funding programme for doctoral training. We are a consortium of 7 European universities and 11 industrial partners searching for 12 PhD students to work on the protection and evaluation of privacy for smart speech technology. PSST is a unique opportunity, as it is the largest international project focusing on privacy in speech technology and because the importance of privacy has only recently gained wider appreciation.

    This is no ordinary PhD programme.

    The structured PSST doctoral training programme combines training in cutting-edge research, transferable skills and career-enhancing skills with exposure to multiple sectors and disciplines.

    Join us and put your expertise in deep learning / machine learning, speech processing, information privacy and security, and user studies into practice and gain your PhD degree from TWO leading European Universities (listed below)!

    See more information and PhD topics at https://psst-doctoralnetwork.eu/

    We are looking for 12 PhD candidates who hold a master's degree. We value diversity and plan to hire 12 fellows with a balanced background and skillset, and an excellent academic track record. We especially encourage applications from members of under-represented groups.

    Call opens 10.12.2024

    Application deadline 26.1.2025

    Shortlisted candidates informed 28.2.2025

    Recruitment event in Finland for shortlisted candidates 17.-18.3.2025

    Notification of acceptance May 2025

    Planned start of employment August 2025

    PSST follows a double-degree model whereby, during their 45-month employment, each PhD student will work in collaboration with two universities towards PhD degrees from both institutions! Each PhD student will also spend 6 months on secondment to one of our Associate Partners, all leading European SMEs, large industrials or regulatory bodies active in speech privacy: CNIL (France), ELDA (France), ki:elements (Germany), Loihde (Finland), Naver (France), Omilia (Greece), Orange (France), Vocapia (France), VoiceInteraction (Portugal), Voice INTER connect (Germany), and VoiceMod (Spain).

    Applications should include:

    • Curriculum Vitae (including countries of residence in the past 36 months).
    • Academic transcripts for completed courses and degrees.
    • Motivation letter explaining why you want to pursue a PhD degree and why you believe you are an outstanding candidate to pursue your PhD researching PSST topics.
    • Reference letter from Master’s thesis supervisor/advisor or similar.
    • (Optional) Preferences for 1-3 research topics (see webpage) and universities.

    Requirements

    • A master's degree in electrical engineering, computer science or related area (degree must be completed before employment can start).
    • Mobility: The fellow must not have resided or carried out their main activity (work, studies, etc.) in the country of the first recruiting organisation for more than 12 months in the 36 months immediately before their recruitment date.
    • Fluent written and verbal communication skills in English are required; knowledge of the local language is an advantage.
    • Candidates cannot hold a doctoral degree.

    Desirable skills

    • Knowledge and skills in deep learning, programming, speech processing, user studies, privacy.
    • Ability to work independently and a critical mindset.
    • Pro-activeness and eagerness to participate in network-wide training events, international mobility, and public dissemination activities.

    Submit your application at

    https://www.aalto.fi/en/open-positions/doctoral-researchers-12-positions-privacyfor-smart-speech-technology-psst

    PhD students receive a regular salary and social benefits according to national regulations, and if applicable, also family leave, long-term leave, and special needs allowances.

    The gross salaries we offer, including both a living allowance and a mobility allowance, are

    • Aalto University (Espoo, Finland) 3500 €/month
    • EURECOM (Sophia Antipolis, France) 3261 €/month [1]
    • INESC-ID (Lisbon, Portugal) 2680 €/month [2]
    • INRIA (Nancy or Saclay, France) 3261 €/month [1]
    • Ruhr University Bochum (Germany) Salary group TV-L E13 [3]
    • Radboud University Nijmegen (Netherlands) Salary scale P [4]
    • Technical University of Berlin (Germany) Salary group TV-L E13 [3]

    [1] https://www.horizon-europe.gouv.fr/sites/default/files/2022-02/horizon-europe---dn-pf---french-salary-explained-5762.pdf

    [2] includes: base salary + food allowance + holiday allowance

    [3] https://oeffentlicher-dienst.info/c/t/rechner/tv-l/allg?id=tv-l-2024&g=E_13&s=1

    [4] https://www.ru.nl/sites/default/files/2024-09/Overview%20salary%20scales%201%20sept%202024.pdf

    For queries, contact info@psst-doctoralnetwork.eu.

  • 2024-11-07 19:27 | Anonymous
    *** Tenure-Track and Research Faculty Positions at the Toyota Technological Institute at Chicago ***


    * The Toyota Technological Institute at Chicago (TTIC) invites applications for the following faculty positions in computer science:

      - Tenure-track Assistant Professor
      - Tenured Associate Professor or full Professor
      - Research Assistant Professor (non-tenure track, endowed position for up to 3 years; see https://ttic.edu/research-assistant-professor/ )
      - Visiting Professor


    * While we welcome applications from many areas of computer science, we will give preference to candidates working in machine learning, computer vision, natural language processing and speech, robotics, computational biology, and algorithms and complexity theory.


    * About TTIC

    TTIC (www.ttic.edu) is an independent, philanthropically endowed academic institute dedicated to fundamental research and graduate education in computer science. All TTIC faculty positions are supported by the endowment.  TTIC has an accredited PhD program in computer science.

    TTIC produces cutting-edge research and offers world-class graduate education. Our faculty (https://www.ttic.edu/faculty/) are recognized with distinctions such as the Sloan Research Fellowships, NSF CAREER Awards, Best Paper Awards, and the NAS Michael and Sheila Held Prize. TTIC research faculty alumni have an excellent employment track record (https://www.ttic.edu/faculty-alumni/).  

    TTIC faculty members enjoy a uniquely light teaching load, which helps them focus on their research. TTIC has only PhD students, so all courses and activities are focused on advanced learning and research.  

    TTIC’s students have been recognized with fellowships (such as NSF, Google, and Microsoft), and have an excellent career track record, including post-docs and faculty positions at top universities and research positions at major industry labs (https://www.ttic.edu/student-alumni/).
     
    Located on the University of Chicago campus, TTIC has strong ties to the University. In addition to TTIC's excellent computing infrastructure, faculty members benefit from many of U. Chicago's state-of-the-art facilities.  TTIC faculty also regularly collaborate with U. Chicago faculty and students, as well as with faculty and students at Northwestern and other nearby institutions.

    TTIC strongly supports travel and visitor hosting, and typically hosts several workshops each year.  

    TTIC faculty and students enjoy the close proximity of a vibrant urban environment with flourishing culture, business, and entertainment scenes.


    * Teaching Requirements

    Tenured/tenure-track faculty teach one quarter per year. Research faculty have no teaching duties, but have the opportunity to teach and co-advise students.


    * TTIC/Simons-Berkeley Joint Program

    Applicants for research assistant professor (RAP) positions in relevant areas are encouraged to simultaneously apply for the TTIC RAP program and the Simons-Berkeley Research Fellowship (https://simons.berkeley.edu/research-fellowship-call-applications).

    Applicants selected by both institutions will be able to participate in a program at the Simons Institute before joining TTIC. Please note that applicants interested in the joint program must submit separate applications to TTIC and the Simons Institute.


    * Timeline

    Applications received before December 1 are guaranteed full consideration. However, applications will continue to be considered at any time.

    If interested in the joint program with the Simons Institute, please note that the Simons Institute has a different deadline.


    * Where to Apply:  https://ttic.edu/facultyapplication

    Senior applicants may directly contact the Chief Academic Officer (avrim@ttic.edu) or faculty members in their areas.


    * Questions?  Contact recruiting@ttic.edu



  • 2024-10-03 15:04 | Anonymous member (Administrator)

    Vicomtech (https://www.vicomtech.org/en/), an international applied research centre specialised in Artificial Intelligence, Visual Computing and Interaction located in Spain, has several research positions in the field of speech and natural language processing.

    We are seeking talented and motivated individuals to join our dynamic Speech and Natural Language Technologies team in either our Donostia - San Sebastián or Bilbao premises. If you have experience in speech and/or natural language processing technologies and are passionate about applying cutting-edge research to solve real-world needs through advanced prototypes, this opportunity is for you! 

    Whether you are a junior researcher (BSc/MSc graduate) looking to kickstart your career or a senior researcher (PhD graduate) eager to take on research leadership roles, we are interested in your profile. We offer the perfect environment with outstanding equipment and the best human team for growth. You will participate in advanced research and development projects, with opportunities to manage high-profile projects and/or lead technical teams depending on your experience. 

    Key Responsibilities: 

    • Conduct cutting-edge research in Speech and Natural Language Processing (NLP) technologies such as automatic speech recognition and synthesis, audio deep fake detection, information extraction, machine translation, text simplification and dialogue systems, among others. 
    • Contribute to national and international research projects.
    • Develop advanced prototypes that transfer technology to businesses and institutions. 
    • Manage or lead research projects, depending on experience. 

    Requirements: 

    • Bachelor’s or Master’s degree in Computer Science, Telecommunications Engineering or related fields. 
    • For senior profiles, a PhD in Speech Processing, NLP, AI or related disciplines is preferred. A PhD is not required for junior candidates. 
    • Strong programming skills (Python, Bash). 
    • Fluency in both spoken and written Spanish and English. 

    Preferred Skills (Not Required but Valued): 

    • Experience with speech and natural language processing tools and libraries (e.g. Kaldi, Whisper, Marian NMT, HuggingFace Transformers, Rasa) and deep learning frameworks (PyTorch, TensorFlow, ONNX). 
    • Virtualization technologies (Docker, Kubernetes). 
    • Experience in industrial and/or European research projects. 

    What We Offer: 

    • A vibrant, innovative research environment with state-of-the-art AI, Visual Computing, and Interaction technologies. 
    • Exciting national and international research projects. 
    • A multidisciplinary and renowned team in Speech and Language Technologies. 
    • Creative freedom in research, aligned with the centre’s goals. 
    • Opportunities for personal development through continuous learning. 
    • Clear career progression paths and leadership opportunities. 
    • Work-life balance policies and a commitment to equal employment opportunities. 

    If you are passionate about research and eager to apply or develop your expertise to real-world challenges, we encourage you to send us your CV and join our forward-thinking team!

    To apply via LinkedIn: https://www.linkedin.com/jobs/view/4034768411


  • 2024-06-24 11:35 | Anonymous member

    KU Leuven's Faculty of Engineering Science has an open position for a junior professor (tenure track) in the area of Spoken Language Technologies. The successful candidate will conduct research on current challenges of speech technology and its applications, teach courses in the Master of Engineering Science and supervise students in the Master and PhD programs. The candidate will be embedded in the PSI research division of the Department of Electrical Engineering. More information is available at https://www.kuleuven.be/personeel/jobsite/jobs/60334358?lang=en. The deadline for applications is September 30, 2024. 

    KU Leuven is committed to creating a diverse environment. It explicitly encourages candidates from groups that are currently underrepresented at the university to submit their applications. 

  • 2024-05-08 12:02 | Anonymous member (Administrator)

    Saarland University is a campus university with an international focus and a strong research profile. With numerous internationally respected research institutes on campus and dedicated support for collaborative projects, Saarland University is an ideal environment for innovation and technology transfer. The German Research Center for Artificial Intelligence (DFKI) is Germany's leading application-driven research institute with a core technology transfer mission. DFKI is currently the world's largest research centre for artificial intelligence operated as a public-private partnership. DFKI maintains close collaborative ties with national and international companies and is firmly rooted in the worldwide scientific AI landscape.

    To further strengthen this excellence in research and teaching, the Department of Language Science and Technology (LST) in collaboration with the German Research Center for Artificial Intelligence (DFKI) is inviting applications for the following position:

    Professorship (W3) in Language Technology

    (m/f/x; Reference: W2464)

    This position is a permanent public sector appointment (equivalent to a 'full-tenured professorship') starting at the earliest possible opportunity. We are looking for an experienced researcher in the field of language technology who has extensive knowledge of natural language processing and machine learning/AI methodologies. Experience with dialogue systems and reinforcement learning, the development of foundation models and/or trustworthy Artificial Intelligence is also desirable. In addition to holding a professorship at the university, the successful candidate will also be appointed as a scientific director at the German Research Center for Artificial Intelligence (DFKI) where they will head a research department. DFKI is an application-driven research organization that is largely financed through external project funding. A demonstrated ability to attract significant external funding for research projects at the national and international level is therefore essential. We also expect candidates to have experience in interdisciplinary research and in collaborating with industrial partners. The Department of Language Science and Technology is internationally recognized for its collaborative and interdisciplinary research, and the successful candidate will be expected to contribute to relevant joint research initiatives. Language technologies are core elements of our study programmes at the M.Sc./M.A. and B.Sc./B.A. level and the person appointed will teach courses within these programmes.

    What we can offer you:

    The successful candidate will conduct world-class research, lead their own research group at the university and perform teaching and supervisory duties at the undergraduate, graduate and doctoral levels. At DFKI, the person appointed will lead a research department with access to an extensive worldwide network of industrial and other research partners, facilitating research and impact at a scale that is otherwise difficult to achieve. The position offers excellent working conditions in a lively and international scientific community. Saarland University is one of the leading centres for language science and computational linguistics in Europe and offers a dynamic and stimulating research environment. The Department of Language Science and Technology (LST) employs about 100 research staff across nine research groups in the fields of computational linguistics, natural language processing, psycholinguistics, phonetics and speech science, speech processing, and corpus linguistics (https://www.uni-saarland.de/en/department/lst.html). The department serves as the focal point of the Collaborative Research Centre 1102 'Information Density and Linguistic Encoding' (http://www.sfb1102.uni-saarland.de) and of the Research Training Group 'Neuroexplicit Models of Language, Vision, and Action' (https://www.neuroexplicit.org/), both of which involve close collaboration with DFKI. The LST department and the DFKI are both part of the Saarland Informatics Campus (SIC: https://saarland-informatics-campus.de/en), which brings together some 800 researchers and over 2000 students from 81 countries. SIC is a collaboration between Saarland University and world-class research institutions on campus, which in addition to DFKI include the Max Planck Institute for Informatics and the Max Planck Institute for Software Systems.

    Qualifications:

    The appointment will be made in accordance with the general provisions of German public sector employment law. Candidates must have experience in and an aptitude for academic teaching. They will have a PhD or doctorate in an appropriate subject and will have demonstrated a particular capacity for independent academic research, typically by having obtained an advanced, post-doctoral research degree (Habilitation) or by having published an equivalent volume of peer-reviewed research or by having been appointed to a junior professorship or similar position. They will have a proven track record of leading their own research group and of acquiring external research funding. The successful candidate will be expected to actively contribute to departmental research and teaching. The language of instruction is English (in the M.Sc. and M.A. programmes) and German (in the B.Sc./B.A. programmes). We expect the successful candidate either to have sufficient proficiency to teach in both languages or to be willing to acquire this level of proficiency within an appropriate period.

    Your Application:

    Applications should be submitted online at www.uni-saarland.de/berufungen. No additional paper copy is required. The application must contain:

    • a letter of application and CV/résumé (including your telephone number and email address)
    • a complete list of your academic publications
    • a complete list of external funding (stating own share if you were not the sole beneficiary)
    • your proposed research concept (2–5 pages)
    • your teaching concept (1 page)
    • copies of your degree certificates
    • complete copies of your five most significant publications
    • the names of three academic references (including email addresses), at least one of whom is not one of your previous academic supervisors.
    • If you hold a university degree from a foreign university, please provide proof of equivalence from Germany's Central Office for Foreign Education (ZAB) if available. If proof of equivalence has not been requested at the time of application, it must be submitted later upon request.

    Applications must be received no later than May 30, 2024.

    Please include the job reference number W2464 when you apply. Selected candidates will be interviewed. If you have any questions, please contact: crocker@lst.uni-saarland.de.

    At Saarland University, we view internationalization as a process spanning all aspects of university life. We therefore expect members of our professorial staff to engage in activities that promote and foster further internationalization. Special support will be provided for projects that maintain collaborative interactions within existing international cooperative networks, e.g. projects with partners in the European University Alliance Transform4Europe (www.transform4europe.eu) or the University of the Greater Region (www.uni-gr.eu).

    Saarland University is an equal opportunity employer. In accordance with its affirmative action policy, Saarland University is actively seeking to increase the proportion of women in this field. Qualified women candidates are therefore strongly encouraged to apply. Preferential consideration will be given to applications from disabled candidates of equal eligibility. We welcome applications regardless of nationality, ethnic and social origin, religion/belief, age, sexual orientation and identity.

    When you submit a job application to Saarland University you will be transmitting personal data. Please refer to our privacy notice (https://www.uni-saarland.de/verwaltung/datenschutz/) for information on how we collect and process personal data in accordance with Art. 13 of the General Data Protection Regulation (GDPR). By submitting your application, you confirm that you have taken note of the information in the Saarland University privacy notice.

    The full job advertisement can be found at:

    www.uni-saarland.de | www.youtube.com/watch?v=tzo6dxr1FWk


  • 2024-02-19 16:44 | Anonymous member

    The Laboratory of Language Technology (https://taltech.ee/en/laboratory-language-technology) at Tallinn University of Technology, Estonia, is looking to fill a postdoc position in the field of speech processing and/or NLP. The position is funded by EXAI -- the Estonian Centre of Excellence in Artificial Intelligence (2024−2030).

    The position is flexible with respect to topic, but it should connect thematically with current topics of interest to the research group (speech recognition, speaker and language recognition, speaker diarization, spoken language translation, summarization, low resource scenarios). Possible research directions include using and fine-tuning different speech and language foundation models (such as wav2vec2.0, Whisper, LLMs) for various speech and language processing tasks.

    The position does not include any teaching load, but supervision of Master and PhD students is expected.

    We are looking for candidates who have finished, or are about to complete, a PhD degree in speech processing, NLP or a related discipline. You must be proficient in English (spoken and written). Applicants should have demonstrated their research expertise through high-quality publications.

    The starting salary for this position is around 3500 euros per month (before taxes, around 2700 euros after taxes) and increases with experience. Additional benefits include roughly 6 weeks of paid annual leave, paid sick leave as well as maternity and parental leave. The initial appointment will be for two years; the position could be extended and migrated to a permanent researcher position, if suitable for both parties. The starting date is March 2024 or later; we would be willing to adapt to the time requirements of an ideal candidate.

    How to apply:

    Please send an e-mail to Tanel Alumäe (tanel.alumae@taltech.ee) with the following information:

    * a short statement (just a few sentences) of research interests that motivates why you are applying for this position;
    * a full CV including your list of publications.

    Alternatively, apply via LinkedIn: https://www.linkedin.com/hiring/jobs/3827285976/detail

    Unofficial inquiries about the position are also welcome!

  • 2024-01-04 15:22 | Anonymous

    Nous proposons un stage de recherche (Bac+5) au service recherche de l'Institut National de l'Audiovisuel (INA). Le stage porte sur la détection de l'activité vocale dans des corpus audiovisuels à l'aide de représentations auto-supervisées.

    Vous trouverez ci-joint l'offre de stage détaillée.

    D'autres stages sont également proposés au sein de l'INA, l'ensemble des sujets peuvent être retrouvés sur la page suivante : https://www.ina.fr/institut-national-audiovisuel/equipe-recherche/stages.

     

    Détection de l'activité vocale dans des corpus audiovisuels à l'aide de représentations auto-supervisées Stage de fin d’études d’Ingénieur ou de Master 2 – Année académique 2023-2024 

     

    Mots clés : deep learning, machine learning, self supervised models, voice activity detection, speech activity detection, wav2vec 2.0 Contexte L’Institut National de l’Audiovisuel (INA) est un établissement public à caractère industriel et commercial (EPIC), dont la mission principale consiste à sauvegarder et promouvoir le patrimoine audiovisuel français à travers la vente d’archives et la gestion du dépôt légal. À ce titre, l’Institut capte en continu 180 chaînes de télévision et radio et stocke plus de 25 millions d’heures de contenu audiovisuel. L’INA assure également des missions de formation, de production et de recherche scientifique. Le service de la recherche de l’INA mène depuis plus de 20 ans des travaux de recherche dans le domaine de l’indexation et de la description automatique de ces fonds selon l’ensemble des modalités : textes, sons et images. Le service participe à de nombreux projets collaboratifs de recherche que ce soit dans un cadre national et européen et accueille des stages de Master ainsi que des doctorants en co-encadrement avec des laboratoires nationaux d’excellence. Ce stage est proposé au sein de l’équipe de recherche (https://recherche.ina.fr) et se place dans le cadre d’un projet collaboratif financé par l’ANR : Gender Equality Monitor (GEM). D’autres sujets de stage sont également proposés dans l’équipe : https://www.ina.fr/institut-national-audiovisuel/equipe-recherche/stages

    Internship objectives

    Voice Activity Detection (VAD) is an audio analysis task that aims to identify the portions of a recording that contain human speech, as opposed to the parts of the signal that contain silence, background noise, or music. It is often treated as a preprocessing step, applied upstream of automatic speech recognition, speaker recognition, or emotion recognition. Existing VAD tools achieve excellent results on news programmes and studio shows [Dou18a, Bre23], but recent research at INA has shown that state-of-the-art systems perform worse on many types of material that are poorly represented in annotated speech corpora. This content, which has been the subject of an internal annotation campaign, includes music shows, cartoons, sports, fiction, game shows, and documentaries.

    The goal of the internship is to develop VAD models using the self-supervised learning paradigm and transformer architectures such as wav2vec 2.0 [Bae20]. Models based on these architectures achieve state-of-the-art results on many speech processing tasks with limited amounts of annotated examples: transcription, understanding, translation, emotion detection, speaker recognition, language identification, etc. [Li22, Huh23, Par23]. Several recent studies have demonstrated the effectiveness of self-supervised approaches for VAD [Gim21, Kun23], but so far these models have been trained and evaluated on data that does not reflect the diversity of audiovisual content. The proposed internship aims to exploit the millions of hours of audiovisual content held at INA to train and improve the models.

    The resulting models will be integrated into the open-source software inaSpeechSegmenter, which is used, among other things, to measure the speaking time of women and men in programmes for research purposes or for the regulation of the audiovisual landscape [Dou18b, Arc23].
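    As an illustration of the VAD task described above, here is a minimal Python sketch of the post-processing step that turns frame-level speech probabilities into time-stamped speech segments and a total speaking time. It is not taken from inaSpeechSegmenter or from the internship description: the function names, the 0.5 threshold, and the 20 ms frame shift (roughly the output stride of wav2vec 2.0) are assumptions chosen for the example, and the probabilities are made up.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def frames_to_segments(
    probs: Sequence[float],
    threshold: float = 0.5,    # assumed decision threshold
    frame_shift: float = 0.02  # assumed frame shift in seconds (about 20 ms)
) -> List[Segment]:
    """Merge consecutive frames classified as speech into (start, end) segments."""
    segments: List[Segment] = []
    start = None  # index of the first frame of the current speech run
    for i, p in enumerate(probs):
        is_speech = p >= threshold
        if is_speech and start is None:
            start = i  # a speech run begins
        elif not is_speech and start is not None:
            segments.append((start * frame_shift, i * frame_shift))
            start = None  # the speech run ends
    if start is not None:  # the recording ends while speech is still active
        segments.append((start * frame_shift, len(probs) * frame_shift))
    return segments


def total_duration(segments: Sequence[Segment]) -> float:
    """Total speaking time, in seconds, over a list of segments."""
    return sum(end - start for start, end in segments)


if __name__ == "__main__":
    # Hypothetical per-frame speech probabilities, e.g. from a classifier head.
    probs = [0.1, 0.9, 0.8, 0.2, 0.7]
    segments = frames_to_segments(probs)
    print(segments)
    print(total_duration(segments))
```

    In a real system the probabilities would come from a trained model, and the raw segments would usually be smoothed (for example by dropping very short segments or filling very short gaps) before speaking time is counted.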

    Dissemination of results

    Several ways of disseminating the work will be considered, depending on its maturity and on the directions chosen for follow-up work:

    ● Release of the resulting models under an open-source licence on HuggingFace and/or INA's GitHub repository: https://github.com/ina-foss

    ● Writing of scientific publications

    Internship conditions

    The internship will last 4 to 6 months and will take place in INA's research department, at the Bry 2 site, 28 Avenue des frères Lumière, 94360 Bry-sur-Marne. The intern will be supervised by Valentin Pelloin and David Doukhan. A computer equipped with a GPU will be provided, along with access to the Institute's computing cluster.

    Stipend: €760 gross per month, plus 50% of the Navigo pass

    Remote work: possible one day per week

    Contact

    To apply for this internship or to request further information, please send your CV and cover letter by e-mail to: vpelloin@ina.fr and ddoukhan@ina.fr.

    Desired profile

    ● Student in the final year of a Master's-level (Bac+5) programme in computer science and AI

    ● Strong interest in academic research

    ● Interest in automatic speech processing

    ● Proficiency in Python and experience with ML libraries

    ● Ability to carry out literature reviews

    ● Rigour, ability to synthesise, autonomy, ability to work in a team

    Bibliography

    [Arc23] ARCOM (2023). “La représentation des femmes à la télévision et à la radio - Rapport sur l'exercice 2022” [online].

    [Bae20] A. Baevski, H. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations,” Neural Information Processing Systems, Jun. 2020.

    [Bre23] Bredin, H. (2023). pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe, in INTERSPEECH 2023, ISCA, pp. 1983–1987.

    [Dou18a] Doukhan, D., Carrive, J., Vallet, F., Larcher, A., & Meignier, S. (2018, April). An open-source speaker gender detection framework for monitoring gender equality. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5214-5218). IEEE.

    [Dou18b] Doukhan, D., Poels, G., Rezgui, Z., & Carrive, J. (2018). Describing gender equality in French audiovisual streams with a deep learning approach. VIEW Journal of European Television History and Culture, 7(14), 103-122.

    [Gim21] P. Gimeno, A. Ortega, A. Miguel, and E. Lleida, “Unsupervised Representation Learning for Speech Activity Detection in the Fearless Steps Challenge 2021,” in Interspeech 2021, ISCA, Aug. 2021, pp. 4359–4363.

    [Huh23] Huh, J., Brown, A., Jung, J. W., Chung, J. S., Nagrani, A., Garcia-Romero, D., & Zisserman, A. (2023). Voxsrc 2022: The fourth voxceleb speaker recognition challenge. arXiv preprint arXiv:2302.10248.

    [Kun23] M. Kunešová and Z. Zajíc, “Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0,” in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun. 2023, pp. 1–5.

    [Li22] Li, M., Xia, Y., & Lin, F. (2022, December). Incorporating VAD into ASR System by Multi-task Learning. In 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) (pp. 160-164). IEEE.

    [Par23] Parcollet, T., Nguyen, H., Evain, S., Boito, M. Z., Pupier, A., Mdhaffar, S., ... & Besacier, L. (2023). LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech. arXiv preprint arXiv:2309.05472.

  • 2024-01-04 15:21 | Anonymous

    The SAMoVA team at IRIT in Toulouse is offering several internships (M1, M2, final-year engineering project) in 2024 on the following topics (non-exhaustive list):

     

    - Automatic Generation of Musical Scores in the Choro Style

    - Speech Understanding and AI for Sensory Analysis

    - Characterization of Eating Behaviour Through Video and Multimodal Analysis

    - Adaptation of Automatic Speech Recognition Systems to Pathological Speech

    - Signal Processing and AI to Reveal Articulatory Disorders in Atypical Speech Production

    - End-To-End Speech Recognition For Assessing Comprehension Skills Of Children Learning To Read

    - Active Learning For Speaker Diarization

    - Automatic Modelling of Speech Rhythm

    - Transcription of Verbalizations for Discourse Analysis in Virtual Reality Scenarios

    - Implementation of a Comparative Speech Recognition Prototype Applied to Spoken Language Learning

     

    Full details (topics, contacts) are available in the 'Jobs' section of the team's website:
    https://www.irit.fr/SAMOVA/site/jobs/
© Copyright 2024 - ISCA International Speech Communication Association - All right reserved.
