Acoustic-Visual Speech Synthesis by Bimodal Unit Concatenation: Toward a true acoustic-visual speech synthesis
This project is funded by the French National Research Agency (Agence Nationale de la Recherche, ANR) within the ANR programme Jeunes Chercheuses et Jeunes Chercheurs.
The funding covers a period of four years, from January 2009 to December 2012.


The goal of this project is to investigate a new approach to text-to-acoustic-visual speech synthesis that can animate a 3D talking head and produce the associated acoustic speech.

In this project, we treat the acoustic and visual channels simultaneously, as a single, truly bimodal signal. In text-to-(acoustic)-speech synthesis by non-uniform unit selection, units are selected from an acoustic corpus and concatenated; this selection and concatenation could be extended to units drawn from a bimodal corpus. The approach will contribute to two research fields: acoustic-only synthesis, by augmenting the acoustic features with visual information (mainly from the human face) in the selection step, and acoustic-visual synthesis, by preserving the strict relation between the acoustic and visual components of the bimodal signal throughout the synthesis process.
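
To make the idea of selecting units from a bimodal corpus more concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not the project's actual system: the names `BimodalUnit`, `join_cost`, and `select_units`, the one-dimensional feature vectors, and the weights `w_acoustic` and `w_visual` are all hypothetical. Each unit carries a single acoustic and a single visual feature vector, the target cost is reduced to an exact label match, and the join cost is a weighted sum of Euclidean distances in both modalities. A Viterbi-style search then picks the cheapest sequence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BimodalUnit:
    """One corpus unit holding both modalities (hypothetical structure)."""
    label: str                   # phonetic label, e.g. a diphone name
    acoustic: Tuple[float, ...]  # assumed acoustic features of the unit
    visual: Tuple[float, ...]    # assumed visual (face) features of the unit


def distance(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def join_cost(prev: BimodalUnit, nxt: BimodalUnit,
              w_acoustic: float = 1.0, w_visual: float = 1.0) -> float:
    """Weighted sum of the acoustic and visual mismatch at a join."""
    return (w_acoustic * distance(prev.acoustic, nxt.acoustic)
            + w_visual * distance(prev.visual, nxt.visual))


def select_units(targets: List[str], corpus: List[BimodalUnit],
                 w_acoustic: float = 1.0,
                 w_visual: float = 1.0) -> List[BimodalUnit]:
    """Pick one unit per target label, minimising the total join cost."""
    if not targets:
        return []
    # Candidate units for each target position (label match only).
    lattice = [[u for u in corpus if u.label == t] for t in targets]
    if any(not candidates for candidates in lattice):
        raise ValueError("no candidate unit for at least one target label")

    best = [0.0] * len(lattice[0])  # cheapest cost ending in each candidate
    back = []                       # back-pointers, one list per transition
    for i in range(1, len(lattice)):
        new_best, pointers = [], []
        for cand in lattice[i]:
            costs = [best[j] + join_cost(prev, cand, w_acoustic, w_visual)
                     for j, prev in enumerate(lattice[i - 1])]
            j_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[j_min])
            pointers.append(j_min)
        best = new_best
        back.append(pointers)

    # Backtrack from the cheapest final candidate.
    k = min(range(len(best)), key=best.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [lattice[i][k] for i, k in enumerate(path)]
```

Setting `w_visual=0.0` reduces the sketch to an acoustic-only selection, which makes it easy to compare the two cases: with a non-zero visual weight, a unit that joins well acoustically but implies a large jump in face shape can be rejected in favour of one that is consistent in both modalities. Because each selected unit keeps its own acoustic and visual parts together, the relation between the two components is never broken by the selection.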


Speech Group

Magrit Group
