Wednesday 1 August 2012

Lipreading Olympics


Watching the gymnasts doing amazing things on bars and beams during the Olympic Games, I mainly wonder why people want to do these things.

But I also listen to the commentators, who tell me what is good or bad about certain performances. Those commentators see things that I don't. They recognize specific elements in a performance and can somehow see that a knee was bent or a hand was misplaced, while I just see one stream of movements.

Same as with expert and beginner lipreaders? Experts see things that beginners don't?

On TV, they help me by re-playing exercises in slow motion and telling me what I’m supposed to see. Ah, now I see it! Yes, that knee was bent, that hand was misplaced!

Does that mean that I will see it myself next time? Probably not at normal speed, but once I know what to look out for, I expect I will be able to see it in slow motion. And after a lot of practice, I will get as good as the judges or the commentators! That's how they learned too, didn't they? Hundreds (thousands?) of hours of watching, and being told what to look out for?

I wonder if it helps judges and commentators that they can 'do' (or could do, or have tried to do) these exercises themselves. Does first-hand experience with these jumps and flicks and twists make it easier to see the small differences between one performance and the next?

Beginner lipreaders see just one stream of mouth-movements. Experts see words and sentences. So for beginners, we play videos of speakers in slow motion and tell them what to look out for. Again and again. First in slow motion, then at normal speed.

And maybe we ask beginners to pronounce the words and phrases themselves, so that they know what to look out for when someone else uses these words or phrases. That's why most lipreading teachers tell lipreaders to practice in front of a mirror.

Instead of a mirror, we can use a webcam. We can ask lipreaders to record themselves and to play the video back in slow motion, to learn to see and feel the sometimes tiny differences between one pattern, one word or phrase, and the next.
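
For the technically minded: here is a minimal sketch of such a slow-motion playback tool, in Python with the OpenCV library. The file name and the slow-down factor are just examples; this is an illustration of the idea, not an official LipRead tool.

    # Minimal sketch: play a recorded practice clip in slow motion.
    # Requires Python with OpenCV (pip install opencv-python).
    import cv2

    VIDEO_FILE = "my_practice_clip.mp4"   # hypothetical webcam recording
    SLOWDOWN = 4                          # play at 1/4 of the normal speed

    cap = cv2.VideoCapture(VIDEO_FILE)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back to 25 fps if unknown
    delay_ms = int(1000 / fps * SLOWDOWN)     # longer delay per frame = slower playback

    while True:
        ok, frame = cap.read()
        if not ok:                        # end of the clip
            break
        cv2.imshow("Slow-motion practice", frame)
        if cv2.waitKey(delay_ms) & 0xFF == ord("q"):   # press 'q' to stop
            break

    cap.release()
    cv2.destroyAllWindows()

With SLOWDOWN set to 1, the same loop plays the clip at normal speed, so a learner can switch between the two, just like the TV replays.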

Will that help? Will that make expert lipreaders of us all?

Watch this space, we'll let you know! 

Tuesday 19 June 2012

Why I don't want to teach visemes...


1. Hearing learners of English do NOT learn English by listening to English phonemes and/or by learning to recognize phonemes in CVC or VCV words (C = consonant, V = vowel). Can you imagine? A beginner's English (or French, German, etc.) course where you begin by learning to recognize the 'a' sound, then practice discriminating 'a' and 'e'. Boring! And probably: useless. So why does anyone think this is how we should teach lipreading?

2. Lipreaders can learn to discriminate between visemes, and they can learn to identify some visemes (not phonemes!) in a standardised context: this particular speaker, this particular pattern, and maybe even this particular video recording. However, this doesn't help them one bit when they try to lipread connected text, or even words. Visemes 'change shape' in the context of other visemes! The 'a' that you've learned to discriminate from the 'o' suddenly looks very different in another word or phrase, because speech is dynamic, and each viseme is influenced by the visemes before and after, in the same way that phonemes are influenced by the phonemic context. This is called co-articulation; see Wikipedia: http://en.wikipedia.org/wiki/Coarticulation

3. Visemes may change shape and/or be more or less visible, depending on their location in a word or sentence. Stressed words are easier to recognize than unstressed words (they usually last longer!). Stressed vowels are easier to recognize than unstressed vowels. Visemes in unstressed prefixes or suffixes may be totally invisible.

4. Visemes are not standardised. There are major differences between speakers, and within speakers.

5. Mouthpatterns - and therefore visemes - change depending on the speaker's speaking style, the speaking rate, emotions, etc. The 'a' that you learned to discriminate suddenly looks very different when the speaker is smiling! Or in a hurry. Or tired. Or angry. Or shouting, because you still don't understand ;-(

6. Many visemes are ambiguous, and many phonemes are invisible. The lipreader needs to use context to predict what a certain mouthpattern might be. Context: the situation, the speaker, the language, the topic, the sentence, the word. This is what we have to teach lipreaders from the very start: how they can use context to disambiguate the unfortunately very ambiguous visual speech signal. The toy sketch below shows the idea.
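
To make that concrete: here is a toy sketch, in Python, of how context can pick between words that look identical on the lips. All the words and 'clue' lists are invented for illustration; what the brain actually does is of course far richer.

    # Toy sketch: 'bat', 'mat' and 'pat' look (nearly) the same on the
    # lips; the surrounding words decide which one was actually said.
    # All clue lists below are invented for illustration.
    homophenes = ["bat", "mat", "pat"]

    context_clues = {
        "bat": {"ball", "cricket", "cave"},
        "mat": {"door", "floor", "yoga"},
        "pat": {"back", "dog", "gently"},
    }

    def best_guess(context_words):
        """Pick the candidate whose clue words overlap most with the context."""
        scores = {w: len(context_clues[w] & set(context_words)) for w in homophenes}
        return max(scores, key=scores.get)

    print(best_guess(["the", "cricket", "ball"]))                      # -> bat
    print(best_guess(["wipe", "your", "feet", "on", "the", "floor"]))  # -> mat

The principle is the same for human lipreaders: the lips narrow the candidates down, and the context picks the winner.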

Monday 18 June 2012

Don't shout, whisper?

Most speech therapists and lipreading tutors use voiceless speech (silent speech) in their lipreading training.

I've asked them why they do this. The answers so far:

1. In lipreading groups, some participants may have some hearing. If the tutor uses normal speech, the participants with some hearing have an advantage over the deaf participants. Using silent speech makes for a more equal playing field.
2. Many therapists / tutors include normally hearing partners in their groups. By using silent speech, the normally hearing partners experience what it is like to be deaf.

But many (all?) speech therapists / tutors who train lipreading on a 1-1 basis also use silent speech. Why?

I don't know.

Maybe because many clients (still) have some hearing, and exercises would be too easy if they could hear & see what the therapist / tutor says?

On the other hand, silent speech makes the activities more difficult, more frustrating, and also less realistic. In real life, people WILL use their residual hearing!

From my perspective, therapists should teach lipreading learners how they can combine the two channels (ears, eyes). Some things can be heard, some things can be seen.
If exercises are too easy for a particular learner when the therapist / tutor uses his/her voice, then the therapist can add background noise or white noise. Or s/he can ask the learner to turn down the volume of the hearing aids or CIs. But I would not recommend using voiceless speech!

A related question:

  • How do voiceless and whispered speech compare to normal speech, visually? Are the mouthpatterns the same? No, because in voiceless speech, the vocal cords don't vibrate. Are there other differences?
    When whispering, does the speaker move lips, tongue and jaw more actively, to compensate for the lack of volume?
    I've tried to find research on the 'lipreadability' of normal, whispered and voiceless speech. So far, no luck. I've also looked for the visual characteristics of normal, whispered and voiceless speech. Still searching. It does seem that whispered speech is spoken more slowly, and that consonants last a bit longer compared to normal speech. 

All books, websites, flyers etc. about communication with hard-of-hearing and deaf people repeat the advice that the speaker shouldn't shout. Shouting changes mouthpatterns and makes lipreading more difficult.

Maybe whispering makes lipreading easier? Maybe we should have buttons, t-shirts and banners that say: 

"I'm hard-of-hearing / deaf. Don't shout, please whisper!"

True? Not true? 

Anyone who has data on this, or an opinion: please share it with me and the other readers of this blog!


PS: I tried to find a picture on the internet to show 'whispering'. In all pictures, people whisper directly into someone's ear. Sometimes they cover their mouth with their hand to reduce visibility. That is NOT what I mean when I use the word 'whispering'! For me, whispering includes full visibility of the speaker's face!

Wednesday 23 May 2012

Visemes or Morphemes?

Most lipreading tutors and books start by teaching visemes. Visemes are for lipreaders what phonemes are for listeners: the smallest 'standardized' building blocks of words. Phonemes are the basic 'sounds' of a language; visemes are the basic 'mouth-patterns'. I use 'mouth-patterns' instead of 'lip-patterns' because to recognize a viseme, you have to watch lips, tongue and cheeks, and, to see voicing, the throat of the speaker.
OK, so I should call them 'speech-patterns'... to include the throat? For the time being: mouth-patterns.

For LipRead, I tried to make comprehensive lists of the phonemes and visemes of the project languages: English, German, Dutch, Norwegian and Turkish. For phonemes, that was relatively easy, because I could use Jacques Koreman's L1L2 Map: http://calst.hf.ntnu.no/.
Of course, the L1L2 map doesn't show 'allophones', the different variations of each phoneme, but it's a start.

For visemes, it was much more difficult. Pretty much everyone seems to make his/her own set of visemes for a language. Some based on research, some on articulatory features, some on ... I don't know. There is no 'standardized' set of visemes, within or across languages.

Charlotta Engström looked at Swedish visemes for her Master's degree project in Speech Communication. One of her conclusions:
"Evidently the classification of phonemes into visemes can be done differently with respect to factors like language, speaker, listener, speech situation, lighting conditions and, as will be shown below, phonemic context. The better the viewing conditions are, the more contrastive categories can be discerned. In other words, visemes are not constant units."
See: http://www.speech.kth.se/prod/publications/files/1644.pdf

The main thing everyone seems to agree on is that there are far fewer visemes than phonemes. Many phonemes are invisible, and many others are ambiguous: several phonemes share the same mouthpattern. The usual example is 'b-m-p': three phonemes, one viseme.
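
To make the many-to-one idea concrete, here is a toy phoneme-to-viseme mapping in Python. The grouping below is invented for illustration (remember: there is no standardised viseme set!), and the 'phoneme' spellings are informal.

    # Toy mapping - viseme classes are NOT standardised, so this grouping
    # is illustrative only, and the phoneme spellings are informal.
    phoneme_to_viseme = {
        "b": "V1", "m": "V1", "p": "V1",   # the classic b-m-p group
        "f": "V2", "v": "V2",
        "t": "V3", "d": "V3", "n": "V3",
        "a": "V4",
    }

    def viseme_signature(phonemes):
        """Collapse a phoneme sequence into its viseme sequence."""
        return [phoneme_to_viseme[p] for p in phonemes]

    print(viseme_signature(["b", "a", "d"]))  # ['V1', 'V4', 'V3']
    print(viseme_signature(["m", "a", "d"]))  # ['V1', 'V4', 'V3'] - identical
    print(viseme_signature(["p", "a", "t"]))  # ['V1', 'V4', 'V3'] - identical again

Note what such a static mapping cannot express: duration, stress, co-articulation. Which brings me to the 'BUT!' below.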

BUT!
Even if we could agree on the number of visemes, teaching single visemes is only a very first step. Visemes change shape in the context of other letters! A viseme that is perfectly visible when pronounced on its own may be completely invisible when it's in the company of other letters / phonemes. Or it will look very different. Visible speech is not a sequence of still pictures; it's dynamic. The articulators have to go from one location to the next, and often what you see is not the end location, but the movement!

Two examples - although I'm not very good at finding examples for English, so please correct me if I'm wrong or if you have better examples:

  • 'food' versus 'foot': indistinguishable, because both words share the same visemes. But if you say the words side by side, people will see the difference. Because of the final 'd' in 'food', the 'oo' in 'food' lasts much longer than the 'oo' in 'foot'. So do we add another viseme: long 'oo' versus short 'oo'?
  • 'bend' - 'lend' - 'blend': in many speakers, the 'l' is quite visible. But when it's preceded by a pretty dominant 'b', it may completely disappear from sight!
This doesn't only happen within words; it happens in all connected speech: phrases, sentences, etc.

Step 2 in teaching lipreading would have to be teaching 'blends', or 'co-articulation'. Of course most tutors will do that, as soon as they move on from simple CVC (consonant - vowel - consonant) words. Then they will have to explain that yes, sorry, there are zillions of exceptions to the visemes that you've just learned.

OR
We teach morphemes instead of visemes? Morphemes are the next 'basic part' of words: not single letters / phonemes, but short words or syllables. The good thing about morphemes is that they are meaningful. The bad news is that there are many more morphemes in a language than phonemes.
As for the examples above: each of the words is a separate morpheme: food, foot, bend, lend, blend.
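
As a toy illustration of what 'teaching morphemes' could build on, here is a sketch in Python that segments words into morphemes using a hand-made list. The prefix and suffix lists are invented for the example; a real course would need a curated, language-specific inventory.

    # Toy sketch: segment words into morphemes with a hand-made list.
    # The morpheme inventory here is invented for illustration.
    prefixes = ["un", "re", "dis"]
    suffixes = ["ful", "ness", "ing", "ed", "s"]

    def segment(word):
        """Strip one known prefix and one known suffix, if present."""
        parts = []
        for p in prefixes:
            if word.startswith(p) and len(word) > len(p) + 2:
                parts.append(p)
                word = word[len(p):]
                break
        stem = word
        for s in suffixes:
            if word.endswith(s) and len(word) > len(s) + 2:
                stem = word[:-len(s)]
                parts.append(stem)
                parts.append(s)
                break
        else:
            parts.append(stem)
        return parts

    print(segment("unhelpful"))  # ['un', 'help', 'ful']
    print(segment("blending"))   # ['blend', 'ing']
    print(segment("food"))       # ['food'] - a single morpheme

Real morphological analysis is much richer than this, of course, but even a crude segmentation shows why morphemes are attractive teaching units: each part carries meaning.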

What I don't know, but I'm sure it has been researched: is there more co-articulation within morphemes than between them?

Morphemes show up in research about teaching children to read. E.g. Improving Literacy by Teaching Morphemes, by T. Nunes and P. Bryant, 2006.

The moral of the story: I really really think we can Improve Lipreading by Teaching Morphemes! 

Wikipedia on Morphemes: http://en.wikipedia.org/wiki/Morpheme

Wednesday 9 May 2012

Phonics vs. whole language

If lipreading is - a bit - like reading, maybe we can learn something from all the research that has been done to find the best way to teach young children to read print.

In a nutshell, there are two approaches to teaching reading: phonics and 'whole language'.

This is what Wikipedia says about them:
"Phonics" emphasizes the alphabetic principle – the idea that letters represent the sounds of speech, and that there are systematic and predictable relationships between written letters and spoken words, which is specific to the alphabetic writing system. Children learn letter sounds (b = the first sound in "bat" and "ball") first and then blend them (bl = the first two sounds in "blue") to form words. Children also learn how to segment and chunk letter sounds together in order to blend them to form words (trap = /t/, /r/, /a/, /p/ or /tr/, /ap/).
"Whole language" is a method of teaching reading that emphasizes literature and text comprehension. Students are taught to use critical thinking strategies and to use context to "guess" words that they do not recognize. In the younger grades, children use invented spelling to write their own stories.
Comparable to the analytic and the synthetic approach to teaching lipreading. Right?

[Oh... I wasn't sure if it was the 'analytic' approach or the 'analytical' approach, so I asked Google. And stumbled on an article that I hadn't seen before: "Teaching lip-reading: The efficacy of lessons on video", in the British Journal of Audiology, 1989, Vol. 23, No. 3, pages 229-238. I will download it, read it, and tell you about it later.
I do know now that it is the analytic approach vs. the synthetic approach.]

The confusing bit is that in reading, they also have 'analytical phonics' and 'synthetic phonics', and these are just the opposite of what you'd expect:
  • analytical phonics, also known as the Whole Word approach, is an approach to the teaching of reading in which the phonemes associated with particular graphemes are not pronounced in isolation.
  • synthetic phonics involves the development of phonemic awareness from the outset. As part of the decoding process, the reader learns up to 44 phonemes (the smallest units of sound) and their related graphemes (the written symbols for the phoneme).
In Dutch, we call this 'hakken en plakken', which Google translates into 'cut and paste'... which isn't too bad. Cutting = cutting a word into smaller bits. Pasting = pasting smaller bits together to make a word. Analytical = cutting. Synthetic = pasting.

For now, the main thing is that there are two approaches to teaching reading: from letters (graphemes) to words to sentences to stories. And: from stories to sentences to words.

Also called bottom-up versus top-down processing:
  • bottom-up: from the letters on the page, up through your eyes, to the visual, then language, then cognitive processing centers in your brain, and finally to awareness.
  • top-down: from your expectations of the text (awareness), through your eyes, to the words on the page: does it say what I expect it to say?
Although the reading experts go back and forth about the best way to teach reading, the solution seems obvious: in fluent reading, we do both; it's an interactive process. We predict, then we check if our predictions are correct. If they are not, we look more closely to see what the text really says, and feed our brain the new information. So in teaching reading, we have to teach both.

In teaching lipreading, we see the same two approaches. Some tutors / therapists start by teaching visemes, then short words, then words in sentences: the analytic approach.
Others start by teaching learners to use their world knowledge and their knowledge of the language, to predict what a speaker may say: the synthetic approach.

And most lipreading tutors and therapists of course teach both!

Many older books and websites, however, seem to favor the analytic approach: they start by teaching visemes. But I think that is mainly because all they had were photographs of lip-patterns. Static lip-patterns, of the kind that you rarely see in real speech.
We don't speak in single phonemes / visemes; we speak in sentences. In words and sentences, phonemes / visemes change their sound / shape, depending on their position in a word and on the letters / phonemes before and after!

But if all you have are photographs, all you can do is show single visemes. Those photographs may make lipreaders aware of the things people do with their mouths when they speak. But they don't teach lipreading.

Speech is dynamic; it is a temporal pattern. You cannot cut speech up into single phonemes (well, you can, but then you'll have a hard time understanding what is being said). In the same way, if you cut speech up into single visemes, you lose the dynamics and the temporal pattern: the changes over time. Then you'll have a very hard time understanding what is being said!

We read, and lipread, interactively - by predicting, then recognizing whole words and even phrases, and checking our expectations against what we see. 
Thanks to video, we can now use 'moving pictures' to show 'real-life', living and moving visemes in their natural context: connected speech!

Two other things that we can learn from reading:
  • When we read, we convert the words on the page into sound. First we read out loud; later, we hear the words that we read with our mind's 'inner ear'. This is called 'phonological coding': we convert visual patterns (written words) into sound patterns (spoken and/or heard words). In the same way, many lipreaders recode what they see into a 'phonological' code: they watch someone speak, but with their inner ear they 'hear' his/her voice. Is this a requirement for good lipreading? Should we teach it? I'm not sure.
  • Some reading teachers say that children become better readers if we teach them to recognize morphemes. I think that would work for lipreading, too: we should teach morphemes, instead of visemes.
... to be continued!


Lipreading is like reading...

Lipreading is like reading a handwritten text, on a moving, dynamic display that shows only two to three letters at a time.

Handwritten, so very personal. Some people have clear handwriting; other people's handwriting is sloppy. Some write large letters, others tiny. Even within a person there are differences. If you're in a hurry, your handwriting deteriorates. If you write a text while you focus on the content - on what you want to write - instead of on the act of handwriting, your handwriting deteriorates.

In handwritten text, many letters are ambiguous when seen in isolation. Even whole words can be hard to recognize when taken out of context. Yet when you can see the entire sentence, letters and words are easy to read.

In printed text, letters are standardised - once you've got used to a certain font, each 'a' looks the same, as does each 'b', 'c', and the rest of the alphabet. There's a recognizable space to indicate the end of one word and the beginning of the next, and punctuation marks to indicate breaks and endings. In handwritten text, the letters are variable in form; their shapes are not standard, but influenced by their location in a word (beginning, middle, or end) and by the letters before and after. And words may run into each other.


To read handwritten text, we prefer good lighting, good paper, a good pen, and good contrast between paper and pen.

Last but not least: the content must be interesting, or we don't even bother to try. As an example, I've included a picture of a handwritten note that I found on the internet.

All of this applies to a lipreader trying to lipread a speaker: speech is personal, ambiguous, variable, and we lipreaders need all the help we can get - or we give up before we've even started.

What makes lipreading even more difficult than reading handwritten text is that the display (the speaker's head) moves. Very few speakers will look the lipreader in the eye for more than a few sentences. People look away because they are embarrassed by the lipreader's attention, because something catches their eye, because they have to think, or because they're just used to moving their head, body and hands while they speak.

Worst of all, though, is that the lipreader sees only two to three letters at a time. I'll try and make an animation to show what that looks like for reading handwritten text. For now, try and imagine reading the handwritten note above through a moving window that lets you see just two or three letters at a time. Difficult? Yes!
Now increase the speed to 14-25 letters per second. Impossible? Yes!

Yet that's what lipreaders do.

Somehow.
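
Until that animation exists, here is a rough sketch of the moving-window idea in Python (run it in a terminal). The sentence and the rate of 15 letters per second are example values, picked from the 14-25 range above.

    # Rough sketch: a three-letter window sliding over a sentence at
    # roughly the rate of running speech.
    import time

    SENTENCE = "lipreading is like reading through a tiny moving window"
    WINDOW = 3                # letters visible at once
    LETTERS_PER_SECOND = 15   # within the 14-25 range mentioned above

    for i in range(len(SENTENCE) - WINDOW + 1):
        # '\r' rewrites the same line, so the window appears to move
        print("\r" + SENTENCE[i:i + WINDOW], end="", flush=True)
        time.sleep(1 / LETTERS_PER_SECOND)
    print()

At 2 or 3 letters per second the task is merely difficult; at speech rate, you will probably find you can only follow by guessing ahead. Which is exactly the point.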

Thursday 3 May 2012

About this blog

January 1, 2012 was the starting date of the LipRead project, an EU project about lipreading.

Interested readers can find the official project info on the website: www.lipread.eu. Some of the info on the website has been translated into Dutch and Turkish. German translations are being made as I write.

The website is for the official info. The partners in the consortium use e-mail for all internal discussions. We will send out a newsletter to interested outsiders every now and then (people can register for the newsletter on the website).

I am now starting this blog as yet another way to keep interested people informed of what we are doing. It will be a personal blog - what you read here are my personal thoughts. Other partners in the consortium may not agree with me. And it's very much thinking in progress.

I want to keep track somewhere of all the things that I read, hear, or think up myself. I will use a public blog to do this, because I hope to find as many people as possible who can help us try and understand what lipreading is all about, and how it can or should be taught.

I have been e-mailing with a number of people about these things since January: partners in the consortium, people in Europe, and even someone from 'down-under'. I want to use this blog to share their thoughts (with their permission!) with others, and vice versa.
Very much 'crowd-sourcing', even though we are a pretty small crowd, spread thinly across the globe ;-)

I will write this blog in English, even though Dutch is my first language, to try and connect with as many people as possible around the world. So please excuse my "Denglish" or "Eurish"!
I've asked the German partner in the project, Tanja Hubert, to start her own LipRead blog in German, for the colleagues in Germany. If she does, it will be her personal log with her thoughts about the project. But of course we will share and transfer and translate, whenever we want everyone's feedback.

I will not be able to work on this blog every day, and the blog will not be in a diary format. The plan is to start a new page for each question that we are trying to answer (or deal with) in the LipRead project.

For instance:
  • is lipreading like learning a new language, or is it different? 
  • what tools or methods or tests can we use, to evaluate the effectiveness of our materials?  
  • is lipreading English easier, or more difficult than lipreading Dutch? 
I'll add questions, as we go along. I'll add reports of my 'thinking in progress' to every page/question, whenever I've found new information, talked to people, or have another one of my 'eureka' experiences.

NB: I have these quite regularly, and most are duds, so they come with a warning: most have a very limited shelf-life ;-(


And yes: please, comment! In English if possible, in any other written language if necessary. I can use Google Translate to help me understand. 


Liesbeth

www.lipread.eu
www.liplezer.nl 
www.pragmaprojecten.nl

PS: for readers who are new to Blogger: you can respond to this blog (anonymously if you want) by clicking on the '0 comments' link at the bottom of each post. (The screenshot shown here is just a picture; you'll find the real link further down.)