Tuesday 19 June 2012

Why I don't want to teach visemes...


1. Hearing learners of English do NOT learn English by listening to English phonemes, or by learning to recognize phonemes in CVC or VCV words (C = consonant, V = vowel). Can you imagine? A beginner's English (or French, German, etc.) course where you begin by learning to recognize the 'a' sound, then practice discriminating 'a' and 'e'. Boring! And probably useless. So why does anyone think this is how we should teach lipreading?

2. Lipreaders can learn to discriminate between visemes, and they can learn to identify some visemes (not phonemes!) in a standardised context: this particular speaker, this particular pattern, and maybe even this particular video recording. However, this doesn't help them one bit when they try to lipread connected text, or even words. Visemes 'change shape' in the context of other visemes! The 'a' that you've learned to discriminate from the 'o' suddenly looks very different in another word or phrase, because speech is dynamic, and each viseme is influenced by the visemes before and after it, in the same way that phonemes are influenced by their phonemic context. This is called coarticulation; see Wikipedia: http://en.wikipedia.org/wiki/Coarticulation

3. Visemes may change shape and/or be more or less visible, depending on their location in a word or sentence. Stressed words are easier to recognize (they usually last longer!) than unstressed words. Stressed vowels are easier to recognize than unstressed vowels. Visemes in unstressed prefixes or suffixes may be totally invisible.

4. Visemes are not standardised. There are major differences between speakers, and even within a single speaker.

5. Mouthpatterns - and therefore visemes - change depending on the speaker's speaking style, the speaking rate, emotions, etc. The 'a' that you learned to discriminate suddenly looks very different when the speaker is smiling! Or in a hurry. Or tired. Or angry. Or shouting, because you still don't understand ;-(

6. Many visemes are ambiguous, and many phonemes are invisible. The lipreader needs to use context to predict what a certain mouthpattern can be. Context: the situation, the speaker, the language, the topic, the sentence, the word. This is what we have to teach lipreaders from the very start: how to use context to disambiguate the unfortunately very ambiguous visual speech signal.
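The many-to-one collapse described in point 6, and the way context can resolve it, can be sketched in a toy Python example. (The viseme groupings, the tiny lexicon, and the function names here are hypothetical illustrations for the sake of the argument, not a real viseme classification or lipreading algorithm.)

```python
# Toy phoneme-to-viseme mapping: several phonemes that sound different
# look identical on the lips (groupings simplified for illustration).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
    "k": "velar", "g": "velar",   # often nearly invisible on the lips
}

def viseme_sequence(phonemes):
    """Map a phoneme sequence to the viseme sequence a lipreader sees."""
    return [PHONEME_TO_VISEME.get(p, "other") for p in phonemes]

def disambiguate(lexicon, seen_visemes, topic_words):
    """Keep the words whose viseme sequence matches what was seen,
    then prefer the ones consistent with the topic context."""
    candidates = [word for word, phonemes in lexicon.items()
                  if viseme_sequence(phonemes) == seen_visemes]
    in_topic = [word for word in candidates if word in topic_words]
    return in_topic or candidates

# 'pat', 'bat' and 'mat' all collapse to the same viseme sequence:
lexicon = {"pat": ["p", "a", "t"], "bat": ["b", "a", "t"], "mat": ["m", "a", "t"]}
assert viseme_sequence(lexicon["pat"]) == viseme_sequence(lexicon["bat"]) == viseme_sequence(lexicon["mat"])

# Vision alone gives three candidates; topic context ("we're talking
# about baseball") narrows it to one:
seen = viseme_sequence(["b", "a", "t"])
print(disambiguate(lexicon, seen, topic_words={"bat", "ball"}))  # -> ['bat']
```

The point of the sketch is the last two lines: the visual signal alone cannot separate 'pat', 'bat' and 'mat', so only knowledge outside the lips (here, the topic) picks the right word.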






