Tuesday 19 June 2012

Why I don't want to teach visemes...


1. Hearing learners of English do NOT learn English by listening to English phonemes and/or by learning to recognize phonemes in CVC or VCV words (C = Consonant, V = Vowel). Can you imagine? A beginner's English (or French, German, etc.) course where you begin by learning to recognize the 'a' sound, then practice discriminating 'a' and 'e'. Boring! And probably: useless. So why does anyone think this is how we should teach lipreading?

2. Lipreaders can learn to discriminate between visemes, and they can learn to identify some visemes (not phonemes!) in a standardised context: this particular speaker, this particular pattern, and maybe even: this particular video recording. However, this doesn't help them one bit when they try to lipread connected speech, or even words. Visemes 'change shape' in the context of other visemes! The 'a' that you've learned to discriminate from the 'o' suddenly looks very different in another word or phrase, because speech is dynamic, and because each viseme is influenced by the visemes before and after it - in the same way that phonemes are influenced by their phonemic context. This is called co-articulation; see Wikipedia: http://en.wikipedia.org/wiki/Coarticulation

3. Visemes may change shape and/or be more or less visible, depending on their location in a word or sentence. Stressed words are easier to recognize (they usually last longer!) than unstressed words. Stressed vowels are easier to recognize than unstressed vowels. Visemes in unstressed prefixes or suffixes may be totally invisible.

4. Visemes are not standardised. There are major differences between speakers, and even within a single speaker.

5. Mouthpatterns - and therefore visemes - change depending on the speaker's speaking style, the speaking rate, emotions, etc. etc. The 'a' that you learned to discriminate suddenly looks very different when the speaker is smiling! Or in a hurry. Or tired. Or angry. Or shouting, because you still don't understand ;-(

6. Many visemes are ambiguous, and many phonemes are invisible. The lipreader needs to use context to predict what a certain mouthpattern might be. Context: the situation, the speaker, the language, the topic, the sentence, the word. This is what we have to teach lipreaders from the very start: how they can use context to disambiguate the unfortunately very ambiguous visual speech signal.
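
To make that many-to-one problem concrete, here is a toy sketch in Python. It is not a real, validated viseme inventory - the groupings and labels are my own simplification - but it shows the classic case: /p/, /b/ and /m/ are three different phonemes that share one and the same mouthpattern.

    # Toy example only: a hypothetical, much-simplified grouping of some English
    # consonants into viseme classes. Real groupings vary per study and per speaker.
    VISEME_CLASSES = {
        "lips pressed together": ["p", "b", "m"],
        "teeth on lower lip": ["f", "v"],
        "back of the mouth (hardly visible)": ["k", "g", "h"],
    }

    def viseme_of(phoneme):
        """Return the viseme class a phoneme belongs to, or None if not listed."""
        for viseme, phonemes in VISEME_CLASSES.items():
            if phoneme in phonemes:
                return viseme
        return None

    # 'pat', 'bat' and 'mat' start with three different phonemes but the same
    # viseme - on the lips they look identical, so only context can decide.
    for word, first_sound in [("pat", "p"), ("bat", "b"), ("mat", "m")]:
        print(word, "->", viseme_of(first_sound))

Three different words, one visible pattern: that is why context has to do the work.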







Monday 18 June 2012

Don't shout, whisper?

Most speech therapists and lipreading tutors will use voiceless speech (silent speech) in their lipreading training.

I've asked them why they do this. The answers so far:

1. In lipreading groups, some participants may have some hearing. If the tutor uses normal speech, the participants with some hearing have an advantage over the deaf participants. Using silent speech creates a more level playing field.
2. Many therapists / tutors include normally hearing partners in their groups. By using silent speech, the normally hearing partners experience what it is like to be deaf.

But many (all?) speech therapists / tutors who train lipreading on a 1-1 basis also use silent speech. Why?

I don't know.

Maybe because many clients (still) have some hearing, and exercises would be too easy if they could hear & see what the therapist / tutor says?

On the other hand, it makes the activities more difficult, more frustrating, and also: less realistic. In real life, people WILL use their residual hearing!

From my perspective, therapists should teach lipreading learners how they can combine the two channels (ears, eyes). Some things can be heard, some things can be seen.
If exercises are too easy for a particular learner when the therapist / tutor uses his/her voice, then the therapist can add background noise or white noise. Or s/he can ask the learner to turn down the volume of the hearing aids or CIs. But I would not recommend using voiceless speech!

A related question:

  • How do voiceless and whispered speech compare to normal speech, visually? Are the mouthpatterns the same? No, because in voiceless speech the vocal cords don't vibrate. Are there other differences?
    When whispering, does the speaker move lips, tongue and jaw more actively to compensate for the lack of volume?
    I've tried to find research on the 'lipreadability' of normal, whispered and voiceless speech. So far, no luck. I've also looked for the visual characteristics of normal, whispered and voiceless speech. Still searching. It does seem that whispered speech is spoken more slowly, and that consonants last a bit longer compared to normal speech.

All books, websites, flyers etc. about communication with hard-of-hearing and deaf people repeat the advice that the speaker shouldn't shout. Shouting changes mouthpatterns and makes lipreading more difficult.

Maybe whispering makes lipreading easier? Maybe we should have buttons, t-shirts and banners that say: 

"I'm hard-of-hearing / deaf. Don't shout, please whisper!"

True? Not true? 

Anyone who has data on this, or an opinion: please share it with me and the other readers of this blog!


PS: I tried to find a picture on the internet to show 'whispering'. In all pictures, people whisper directly into someone's ear. Sometimes, they cover their mouth with their hand to reduce visibility. That is NOT what I mean when I use the word 'whispering'! For me, whispering includes full visibility of the face of the speaker!