Welcome!

Welcome! Thank you for visiting the Video Voice Speech Training System blog. Our goal here is to provide a forum for sharing ideas about using this exciting speech development tool, learning about new enhancements to the program, and stimulating interaction between people who are already using Video Voice or who are considering it for their speech therapy needs.  Please join us and share your experiences, ask questions, or make suggestions for new features or capabilities. We're here to listen as well as talk!

To learn more about this innovative speech therapy aid or download a Free Trial, visit www.videovoice.com.

Showing posts with label speech models. Show all posts

Thursday, June 6, 2013

Why No Built-In Models?

We recently had someone ask why there are no samples of correct production, or "default" speech models, built into Video Voice. Well, there are several reasons why we decided to let the therapist or another speaker provide the models of target words, sounds, and phrases in Video Voice, rather than supplying them ourselves.

First of all, do you know how many words there are in the English language alone? According to the Global Language Monitor, there are now more than 1,000,000 words in the English language, and a new one is coined approximately every 98 minutes! To define and produce models for even 10% of them would be a daunting task!

[Image: Video Voice Vowel Targets: F2/F1 patterns for "bit" (red) and "beet"]
[Image: Video Voice Cross-time Formant F2/F1 Display: Temporal display, "mechanical engineer"]
A second reason is that we don't know what speech issues any given child (or adult) may have, or what type of sounds or words that person needs to practice. Is it a small child who needs to learn basic production of sounds and simple words? Or is it a non-native speaker who needs to learn the difference between two similar vowel sounds, such as the two vowels in "sheet metal," or the appropriate rate and timing in a phrase such as "mechanical engineer"? The target models, and the way to illustrate them, are really up to the therapist to decide!
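Why do similar vowels separate so clearly on an F2/F1 display? Classic acoustic-phonetics measurements give each vowel a characteristic pair of formant frequencies. The sketch below uses rough published averages for adult male speakers (approximations for illustration only; these are not values taken from Video Voice itself) for the "beet" and "bit" vowels shown in the screen image:

```python
# Rough average formant frequencies (Hz) for adult male speakers,
# based on classic acoustic-phonetics measurements. Illustrative
# approximations only -- not Video Voice internals.
VOWELS = {
    "beet /i/": {"F1": 270, "F2": 2290},
    "bit  /I/": {"F1": 390, "F2": 1990},
}

for label, f in VOWELS.items():
    ratio = f["F2"] / f["F1"]
    print(f'{label}: F1={f["F1"]} Hz, F2={f["F2"]} Hz, F2/F1 = {ratio:.1f}')
```

Even though the two vowels sound close to an untrained ear, their F1 and F2 values differ enough that an F2/F1 plot puts them in visibly different regions of the screen.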

Third, there are significant dialectal differences across regions of this large country. "Normal" largely depends on where you happen to be located. A person from the northeastern part of the country and a person in the South may pronounce the same word quite differently. Here's a link to a fascinating set of 22 national maps showing dialects' impact on speech, which graphically illustrate some of these geographical differences.

Illustrating Dialectal Differences in "Yes"
Take, for example, a simple word like "Yes," and how differently it can sound coming from the mouths of celebrities Katie Couric and Paula Deen. Katie would say "yEHs," while Paula's vocalization would be more like "YAY-us." Which is right and which is wrong? It really depends on where you are, doesn't it? This screen image shows how these two productions would look in Video Voice (the shorter "yEHs" is in blue, "YAY-us" in red). Illustrating these kinds of differences makes the feedback effective for accent reduction (ESL) and accent training (e.g., for actors), as well as for learning basic production of sounds.

Then there's the fact that people have different voice qualities. A man with a deep, low-pitched voice and a woman with a higher-pitched voice would not tend to "score" well against each other's models, but their fundamental frequency differences will certainly be visible. If a woman needs to lower her speaking pitch, or a man to raise his, that can be accomplished with the visual feedback in Video Voice.

Therapists can define the models themselves, or have a speaker of the same age and voice quality do the voicing of target sounds and words for individuals in their caseloads. If you're working with children, for example, perhaps having the “cool kid” in the class be the one to create models could be a good strategy.  However, when the person receiving therapy does an accurate production of a desired sound/word, it's very easy, and desirable, to turn that production into the model. It's always easiest to match your own voice productions.

In conclusion, when looking at Video Voice for your speech needs, bear in mind that it is not a speech recognition program. It's a tool designed for training vocal production, one that illustrates sound and voice quality characteristics in various ways, offering an entertaining and motivating framework for learning, practicing, and improving speech skills.

Yours in good speech,

Video Voice Support Team
mv@videovoice.com
1-800-537-2182

Thursday, February 23, 2012

Where R You?

Got any kids with 'R' problems in your caseload? Yeah, I thought so. From what we hear from speech-language pathologists, almost everyone has at least one with that pesky and tough-to-correct speech issue. Video Voice has a number of games and displays that can help, and the one I'm currently excited about is the Formant Multi-Frequency Spectral Display included in the latest release (V3.0.127). This new display uses live feedback to help speakers learn about production of 'R' (and other sounds, too). With it, they can quickly learn how changes in articulator position make all the difference in what sound they're producing.

Offhand, I can't think of a single, commonly used word that more simply defines the problem of 'R' and its oft-confused 'OO' than "were." It's a pure combination of those two sounds, and producing it correctly requires a subtle and largely invisible change in tongue position, or "bunching," to move smoothly between them. The immediate feedback in the Spectral display can be powerful in illustrating when this is, or is not, happening, and it provides a facility for practicing and learning the differences in production of the two.

Let's take a closer look. I'm going to assume that you have either purchased Video Voice or downloaded the Free Trial (at www.videovoice.com) to explore what it offers for therapy, so you can follow along and get a feel for how this display works. (And if you haven't downloaded the trial, why not? There's absolutely no charge or ongoing obligation!)

Start by accessing the Spectral Display from the Formant Menu (Multi-Frequency-Spectral).

First, let's do some practice voicing. Click Start to activate the display, then vocalize. Say "were" slowly, focusing on the F2 area in particular. You'll see very little blue when you're saying the 'woo' part of the word, but a great deal more when you hit (and sustain) the 'er'.

Also notice that as you speak, you see movement in all three Formant frequency ranges, and also a “trace” line above them. This is Video Voice averaging and smoothing the speech data into a single line as you speak, and it will be important as we go through this exercise.
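The post doesn't say exactly how Video Voice computes that averaged trace line, so here's a minimal sketch of the general idea using a simple moving average: noisy per-frame frequency estimates are smoothed into a single steadier line. The window size and sample values below are made up for illustration.

```python
def moving_average(samples, window=5):
    """Smooth a noisy sequence of frequency estimates (Hz) with a
    simple trailing moving average. (Video Voice's actual smoothing
    algorithm is not published; this is just one common approach.)"""
    out = []
    for i in range(len(samples)):
        lo = max(0, i - window + 1)      # start of the trailing window
        chunk = samples[lo:i + 1]        # most recent `window` samples
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical jittery F2 estimates rising toward a sustained 'er'
raw = [900, 950, 1100, 1300, 1380, 1340, 1360, 1350]
print(moving_average(raw, window=3))
```

The smoothed output wiggles far less than the raw input, which is why a trace line is easier to compare against a frozen model than the raw frame-by-frame display.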

The feedback is instantaneous! And, like speech, it's also fleeting. When you stop  vocalizing, the visual disappears. So we need a way to freeze a target so the feedback becomes more concrete.

Say “were” slowly again, sustaining the 'r'. As you see the blue F2 region and associated trace line expand upward with that sound, click the Use button. Video Voice will draw and hold a light trace line showing what the frequencies in the F1, F2 and F3 regions were at the point where you captured the sound with Use.  Now you have a “model” of the desired sound.

Now, try saying “were” again with the model trace on the screen. Say it slowly so you can see how the 'woo' and 'r' look, as well as the transition between the sounds. When you reach the 'r' part of the word, the blue F2 area should move up and touch the trace line.

[Image: The 'oo' in "were"]
[Image: The 'r' in "were"]

Pretty darned cool, isn't it?  It's a great way to practice sounds that are similar, but differ in important ways.

By the way, there's no scoring in this display, but you can always click the Reward button to activate a graphic animation when the speaker has reached the goal of producing that 'R' sound consistently.

And, if you find having all three frequencies shown at once confusing for this or any other sound, you can restrict the display and show only the most relevant one(s) by clicking the “ON” label(s) below the F1, F2 or F3 ranges. It will change to OFF, and you’ll no longer see that area of the display.

The live nature of this display makes it most useful for sounds that can be sustained, of course. In addition to 'R' and 'OO,' you'll also see big differences between sounds like ‘S’ and ‘SH,’ particularly in the F2 and F3 ranges. Give it a try!

We hope you have fun experimenting with this and other Video Voice displays and games to see how they can assist with your 'R' cases, as well as other speech problems in your caseload.

Yours in good speech,

Video Voice Support Team
mv@videovoice.com
1-800-537-2182
www.videovoice.com

Monday, August 8, 2011

Models for Speech

People sometimes wonder why Video Voice has no preprogrammed models of target sounds and words. There are actually very good reasons for that.

To start with, we don't know what targets any individual needs to work on. Consider the number of words that exist. In the English language alone, there are at least a quarter of a million words, according to the Oxford English Dictionary. And that's just English. Video Voice's displays are language-independent and can easily be used for speech therapy in Spanish, French, Arabic, or almost any other language.

Even if we were to put together libraries of target models, there are other issues. There are usually distinct differences between male and female, and between adult and child, voices. It's difficult to impossible to strip the pitch elements out of sounds, so a man may not be able to match a woman's model, nor a child an adult's voice.

Then there's the matter of dialectal differences. What constitutes the "correct" production of any sound? In the northeastern U.S., a word like "bet" is pronounced with a short vowel ("beht"). In the South, the vowel is often elongated into a diphthong, e.g., "bay-uht." Which way is the right way? Well, that really depends on where you're living, doesn't it? To Video Voice, however, those two pronunciations won't "look" the same.

In the F2/F1 Formant Matrix representation, you can see the addition of the extra vowel sound in the red pattern ("Bay-uht"). The "ay" appears higher and more toward the left of the screen space. The durational differences in the sounds aren't strongly noticeable in this display, although the changes in the vowel sounds are.

If you switch to the F2/F1 Temporal representation for a cross-time view of the productions, the longer duration of "Bay-uht" is obvious. In both cases, you can see why the two words don't look the same to Video Voice, and why pre-programming models for use could end up being frustrating for users.

Model Libraries 

Although there are no built-in models, Video Voice does provide a structure in which you can assemble your own sets of target models for your caseload, creating your very own model library. Once you’ve defined and stored them, they’re available for repeated use, or for transfer to other folders.

Within the Authorized User operating mode, each therapist can have up to 255 folders. Each folder can contain up to 255 models. That's a total of 65,025 separate models, more than you’re ever likely to need. (I sure hope you don't have 255 individuals in your caseload!) Video Voice will allow as many as 255 separate therapist folders, which means you could have as many as 16,581,375 models stored, but I think we can agree that's just plain silly.
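Those capacity figures are just multiplication, and a quick sketch confirms them:

```python
# Checking the storage-capacity arithmetic from the post.
therapists = 255          # maximum therapist folders
folders_per = 255         # caseload folders per therapist
models_per_folder = 255   # models per caseload folder

per_therapist = folders_per * models_per_folder
total = therapists * per_therapist

print(per_therapist)  # 65025 models available to a single therapist
print(total)          # 16581375 models across all therapist folders
```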

So let's be serious. This caseload structure means you can easily build libraries of models, which may be useful if you’re a school-based therapist, especially if a number of your students have similar speech problems, such as articulation of /r/. By building a library of models featuring that sound in different positions, you’ll have a source for targets that you can use to set up directories individualized for each student. And then you’ve got a therapy tool customized to your particular needs.

How do you go about this? Start by setting up a Therapist folder to contain the library, giving it a name such as MODLIB. Then, within that folder, define a caseload folder with a name like R Sounds. Activate the R Sounds folder with New Session, then use Formant Create to make a series of models with R: run, race, round, etc. (Make sure you clearly enunciate/stress the R during production so that it will be clearly visible in the patterns.)

You can repeat the process for S Sounds, Vowel Sounds, or whatever other targets you commonly work with, until you have a series of MODLIB folders containing your models.

Once you have built your library, you can transfer copies of any target models to any student's folder. Start by activating the desired student's folder with New Session. Then, go to the Data Management Copy Data function. Specify your MODLIB folder as the target source, and select and move desired models into the student's destination folder. Easy!

If you have students who share common therapy goals or articulation problems, you can also use this same strategy with their own data folders, treating them as a source for targets. Model libraries can streamline setup time, especially if you have a large caseload.

A final note: during therapy, consider library models to be only starting points. It's always easiest to match your own voice, so when a student successfully produces a target, it's a good idea to turn that production into the model. All it takes is a single click of the Replace (or SaveAs) button.

Model libraries are a good way to maximize your productivity with Video Voice and customize activities for everyone in your caseload. Why not give it a try?

Video Voice Support Team
1-800-537-2182
mv@videovoice.com
http://www.videovoice.com/