Using Speech Synthesis in .NET 4 and Windows 7

Pete Brown - 16 December 2009

Computer voice synthesis has come a very long way since the old robotic voices we heard in WarGames ("Shall we play a game?", delivered in what sounded like slightly British overtones) and other voice synths of the era. From automated voice response systems to the little GPS receivers in our cars, we now expect to hear pretty decent human-sounding speech, with gender tones and even accents.

Many of us never consider adding voice synthesis or voice recognition to our own programs. In certain applications, however, it can make a huge difference in usability. Consider a computer located in your kitchen, used to store recipes and numerous other domestic "things". Being able to have the computer recognize your verbal request to read the next step in the recipe, and respond in kind, means no crumb-topping fingerprints on the screen and no cake batter in the keyboard.

A Little History

Not going all the way back to the Commodore 64 speech synthesizer program I typed in from a magazine in seventh grade (yes, some of the first things we made the computer say were pretty rude <g>), modern speech synthesis on Windows started with SAPI. I was interested in this way back then, and owned the huge tome "MAPI, TAPI, SAPI". I never got too far with it: earlier versions of SAPI weren't overly friendly to Visual Basic, and exploring them wasn't worth delving back into C++ on Windows at the time.

While there were third-party programs that did voice synthesis on earlier versions of Windows, the first version of SAPI came with Windows 95 and Windows NT 3.51.

Speech Synthesis Today

SAPI 5.3 added a ton of new functionality and also became part of the core operating system (Windows Vista and Windows 7, in this case). You can use this API natively or via COM interop. A version of SAPI is also available on Windows Mobile devices.
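Just to show what the COM route looks like, here's a hedged sketch using late binding against the SAPI.SpVoice automation object (the 'dynamic' keyword needs .NET 4; the managed API below is what we'll actually use in this post):

// Sketch only: drive SAPI 5 directly through COM automation.
// SAPI.SpVoice is the SAPI automation ProgID; Speak uses the default voice.
Type spVoiceType = Type.GetTypeFromProgID("SAPI.SpVoice");
dynamic spVoice = Activator.CreateInstance(spVoiceType);
spVoice.Speak("Hello from SAPI over COM.");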

In .NET, it couldn't be easier to make your computer talk. Starting with .NET 3.0 on Windows XP and higher, you can use the managed speech API in System.Speech.Synthesis to identify voices and speak. I briefly wrote about this a couple of years ago, but felt it deserved revisiting with the newer Microsoft Anna voice (introduced with Vista), Windows 7, and .NET 4.
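If you just want to hear it work before we build anything, a minimal console sketch (assuming nothing beyond a project reference to System.Speech.dll) looks like this:

using System.Speech.Synthesis;

class HelloSpeech
{
    static void Main()
    {
        // create the synthesizer and speak synchronously using the default system voice
        using (var synth = new SpeechSynthesizer())
        {
            synth.Speak("Hello from System.Speech.");
        }
    }
}

The rest of this post is really about putting a friendlier WPF front end on those few lines.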

Trying out the installed voices

From Control Panel, search for "speech" and pull up the Speech Recognition applet.

[Screenshot: Control Panel search results for "speech"]

On the Text to Speech tab, see what voices you have installed. In my case, I have only Microsoft Anna. Depending upon your locale and what other products you have installed, your voices may be different. For example, you may have Microsoft Lili, which is intended for Chinese versions of Windows. You can also download some older speech voices with their associated engines here. (The older Mike and Mary voices are also available as part of the 5.1 SDK download here.) The old voices are interesting for historical purposes, but Anna sounds much better. If you're really hankering to try out some other voices, this site is the mother lode of all voice sites, with mp3 and wave samples for most.

[Screenshot: the Text to Speech tab of the Speech Properties dialog, showing Microsoft Anna]

You can try out the voice by clicking the Preview Voice button, after optionally changing the text in the textbox. Make sure this works before proceeding through the rest of this example.

If you go to that nextup.com site linked above, scroll down to the Cepstral Character Voices section. Amusing :) (However, having your computer whisper to you is super creepy.)

Using System.Speech.Synthesis

Let's get into programming against the speech synthesizer using the .NET speech API.

Project Setup

As is usual in these demos, create a new Windows Application (WPF) in Visual Studio 2010. We'll also add a folder called "ViewModels" and a single class named "SpeechDemoViewModel". We're going to do a little binding and I want to make sure you're set up with a reasonable architecture to build on should you decide to extend the demo.

[Screenshot: Solution Explorer showing the ViewModels folder with the SpeechDemoViewModel class]

You'll also need to add a reference to System.Speech.dll (v4).

Main Window

Our Main Window is pretty simple. We'll have a listbox to list all the available voices, a textbox you can type into, and a button that causes the typed text to be read aloud by the speech engine.

<Window x:Class="WpfSpeechTest.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Text to Speech Demo"
        Height="300"
        Width="530">
    <Grid>
        <Grid>
            <Grid.ColumnDefinitions>
                <ColumnDefinition Width="*" />
                <ColumnDefinition Width="2*" />
            </Grid.ColumnDefinitions>

            <ListBox x:Name="AvailableVoicesList"
                     Grid.Column="0" />

            <StackPanel Orientation="Vertical"
                        Grid.Column="1">
                <TextBlock Text="Text to Speak"
                           Margin="5 5 5 1" />
                <TextBox x:Name="TextToSpeak"
                         Margin="5 1 5 5"
                         Height="100"
                         TextWrapping="Wrap" />
                <Button x:Name="Speak"
                        Margin="5"
                        Width="75"
                        HorizontalAlignment="Right"
                        Content="Speak" />
            </StackPanel>
        </Grid>
    </Grid>
</Window>

[Screenshot: the main window, with the voices listbox on the left and the text and Speak controls on the right]

Enumerating Available Voices via our ViewModel

The Speech API supports a pluggable voice model. You can download voices from third parties, or use one of the built-in voices. That means the set of voices can vary from system to system, so you'll probably want to allow the user to select the voice they want to hear.
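(As an aside: if you don't want to surface voice selection in the UI at all, the synthesizer can also pick an installed voice for you from hints such as gender and age. A minimal sketch, not part of the demo we build below; which voice you actually get depends on what's installed on the machine:)

// Sketch: ask the engine to choose an installed voice matching the hints
var synth = new SpeechSynthesizer();
synth.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Adult);
synth.Speak("The engine picked this voice for me.");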

In our SpeechDemoViewModel class, let's add what we need to enumerate the voices. We'll be using the System.Speech.Synthesis.SpeechSynthesizer class to get a collection of all installed voices. The code for that is in the LoadVoices function below. There's also an exposed collection of voices, which we'll use in the listbox, and a SelectedVoice property which will also be bound to the listbox.

using System;
using System.Collections.ObjectModel;
using System.ComponentModel;
using System.Linq;
using System.Speech.Synthesis;

namespace WpfSpeechTest.ViewModels
{
    class SpeechDemoViewModel : INotifyPropertyChanged
    {
        SpeechSynthesizer _synth = new SpeechSynthesizer();
        
        // collection of all installed and enabled voices on this computer
        ObservableCollection<InstalledVoice> _availableVoices = new ObservableCollection<InstalledVoice>();
        public ObservableCollection<InstalledVoice> AvailableVoices
        {
            get { return _availableVoices; }
        }

        // the currently selected voice
        private InstalledVoice _selectedVoice;
        public InstalledVoice SelectedVoice
        {
            get { return _selectedVoice; }
            set { _selectedVoice = value; NotifyPropertyChanged("SelectedVoice"); }
        }

        // function to load up all available voices
        public void LoadVoices()
        {
            var voices = _synth.GetInstalledVoices();

            foreach (InstalledVoice v in voices)
            {
                if (v.Enabled)
                    _availableVoices.Add(v);
            }
            
        }

        
        // ViewModel change notification goo
        public event PropertyChangedEventHandler PropertyChanged;
        protected void NotifyPropertyChanged(string propertyName)
        {
            if (PropertyChanged != null)
                PropertyChanged(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}

Next, we need to wire this up to the view. The first thing is to hook the ViewModel into our xaml. As I've mentioned in previous posts, there are a number of different ways to do this, some more robust than others. The way I use here is for simplicity. I set up a viewModels xml namespace and then assign the viewmodel directly to the data context of this Window.

<Window x:Class="WpfSpeechTest.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:viewModels="clr-namespace:WpfSpeechTest.ViewModels"
        Title="Text to Speech Demo"
        Height="300"
        Width="530">
    <Window.DataContext>
        <viewModels:SpeechDemoViewModel />
    </Window.DataContext>

Now we need a way to call the LoadVoices method in our viewmodel. You could simply call that from the viewmodel's constructor, but since this is a non-trivial method and could possibly throw an exception, I recommend against it. Instead, we'll call the vm's method in the main window's loaded event handler.

public partial class MainWindow : Window
{
    public MainWindow()
    {
        InitializeComponent();

        Loaded += new RoutedEventHandler(MainWindow_Loaded);
    }

    void MainWindow_Loaded(object sender, RoutedEventArgs e)
    {
        ((SpeechDemoViewModel)this.DataContext).LoadVoices();
    }
}

If you run it now, you'll see that we have one voice installed (in my case), but the listbox doesn't show us anything useful. It simply does a ToString() on the class, which in turn just returns the name of the type.

[Screenshot: the listbox displaying only the InstalledVoice type name]

Let's remedy that by creating an item template. The InstalledVoice class, through its VoiceInfo property, has a number of interesting properties that we can tap into.

<ListBox x:Name="AvailableVoicesList"
         ItemsSource="{Binding AvailableVoices}"
         Grid.Column="0">
    <ListBox.ItemTemplate>
        <DataTemplate>
            <Grid Margin="5">
                <Grid.RowDefinitions>
                    <RowDefinition Height="Auto" />
                    <RowDefinition Height="Auto" />
                    <RowDefinition Height="Auto" />
                    <RowDefinition Height="Auto" />
                </Grid.RowDefinitions>


                <TextBlock Grid.Row="0"
                           Text="{Binding VoiceInfo.Id}" />

                <TextBlock Grid.Row="1"
                           Text="{Binding VoiceInfo.Name}" />
                
                <TextBlock Grid.Row="2"
                           Text="{Binding VoiceInfo.Description}"
                           TextWrapping="Wrap" />

                <TextBlock Grid.Row="3"
                           Text="{Binding VoiceInfo.Gender}" />
            </Grid>
        </DataTemplate>
    </ListBox.ItemTemplate>
    
</ListBox>

When run on my machine, this is what I see:

[Screenshot: the listbox showing the voice Id, Name, Description, and Gender]

Not beautiful, but at least it tells us more about which voices are available. There are two final things I'd like to do with the listbox: have it update the SelectedVoice property of the viewmodel, and have it automatically select the first item.

The first is handled by binding the SelectedItem to the SelectedVoice property of our viewmodel.

<ListBox x:Name="AvailableVoicesList"
         ItemsSource="{Binding AvailableVoices}"
         SelectedItem="{Binding SelectedVoice, Mode=TwoWay}"
         Grid.Column="0">

The second is handled in the viewmodel itself. Update the LoadVoices method to set the SelectedVoice automatically:

// function to load up all available voices
public void LoadVoices()
{
    var voices = _synth.GetInstalledVoices();

    foreach (InstalledVoice v in voices)
    {
        if (v.Enabled)
            _availableVoices.Add(v);
    }

    if (_availableVoices.Count > 0)
        SelectedVoice = _availableVoices[0];
    else
        SelectedVoice = null;
    
}

Now when you run it, you'll see the ListBox has the first item selected by default. That's one pattern I really like to follow when using the vm: set selected items via binding on the UI and explicit assignment within the viewmodel.

[Screenshot: the listbox with the first voice selected by default]

Making it Talk

We have most of the wiring in place. One thing we'll need to revisit, though, is our SelectedVoice property. The property works great for handling the listbox, but it should also select the voice on the SpeechSynthesizer in our viewmodel. We could do this in the Speak method we create below, but then we'd be calling SelectVoice on every utterance, when in fact we only need to do it when a new voice is selected.

// the currently selected voice
private InstalledVoice _selectedVoice;
public InstalledVoice SelectedVoice
{
    get { return _selectedVoice; }
    set 
    { 
        _selectedVoice = value;

        // only tell the synthesizer about the voice if we actually have one
        // (LoadVoices assigns null when no enabled voices are installed)
        if (_selectedVoice != null)
            _synth.SelectVoice(_selectedVoice.VoiceInfo.Name);

        NotifyPropertyChanged("SelectedVoice"); 
    }
}

The next step is to wire up the Text to Speak textbox and the Speak button. First, we'll need to add some code to our viewmodel. Specifically, we'll need a property that holds the text to speak, and we'll need a method that will be invoked when the user clicks the speak button.

In the viewmodel, add a TextToSpeak property. I also gave it some default text so we don't have to type every time we run the application.

// this is the text the engine will speak
private string _textToSpeak = "Hello, this is a test of text to speech in Microsoft Windows.";
public string TextToSpeak
{
    get { return _textToSpeak; }
    set { _textToSpeak = value; NotifyPropertyChanged("TextToSpeak"); }
}

Now we need a Speak method. Also in the viewmodel, add a new function named Speak that looks like this:

public void Speak()
{
    if (_selectedVoice != null && !string.IsNullOrEmpty(_textToSpeak))
    {
        _synth.Speak(TextToSpeak);
    }
    else
    {
        // no voice or no text. do nothing
    }
}

All this function does is verify that we have a selected voice and some text to speak, and then use the speech engine to say it. The code couldn't be simpler for basic text to speech.
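One thing worth knowing: Speak is synchronous, so it blocks the calling thread (here, the UI thread) until the engine finishes talking. For longer text you may prefer SpeakAsync. Here's a hedged variation on the same method, using the same _synth field; the method name is mine, not part of the demo:

public void SpeakWithoutBlocking()
{
    if (_selectedVoice != null && !string.IsNullOrEmpty(_textToSpeak))
    {
        // cancel anything still being spoken, then queue the new prompt;
        // SpeakAsync returns immediately so the UI stays responsive
        _synth.SpeakAsyncCancelAll();
        _synth.SpeakAsync(TextToSpeak);
    }
}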

Wiring to our View

Two more steps and we'll be ready to talk. The first is to bind the text in the viewmodel to the textbox in the view. The second is to wire the Speak button to our viewmodel.

We'll use simple two-way binding to get the text back and forth between the view and the viewmodel.

<TextBox x:Name="TextToSpeak"
         Text="{Binding TextToSpeak, Mode=TwoWay}"
         Margin="5 1 5 5"
         Height="100"
         TextWrapping="Wrap" />

For the Button, we have a couple choices. We can do an event handler, or we can use commands. First, let's do it with an event handler so you can get to the speech part. We'll revisit with a command afterwards.

Using Events

In the xaml for the view, add a new event handler for the button, letting Visual Studio create the stub in the code-behind.

<Button x:Name="Speak"
        Margin="5"
        Width="75"
        HorizontalAlignment="Right"
        Click="Speak_Click"
        Content="Speak" />

And in the code-behind add this one-liner to the handler:

private void Speak_Click(object sender, RoutedEventArgs e)
{
    ((SpeechDemoViewModel)this.DataContext).Speak();
}

When you run the app, and click the "Speak" button, you should hear the text spoken using the selected voice.

Wiring to our View using a Command - Extra Credit

The event was dirt-simple to put together. Unfortunately, it introduces more code in the code-behind, which can get more and more complex as you add more events to handle. It also doesn't take care of nice things like automatically disabling the button when the command is not in a state where it can be executed (for example, when no voice is selected or no text has been entered).

Let's use the WPF commanding system to take care of that. The first thing we'll need to do is create a RelayCommand class. I snagged the code for this from Josh Smith's WPF MVVM MSDN article.

Create a new class named RelayCommand in the root of the project.

using System;
using System.Windows.Input;

public class RelayCommand : ICommand
{
    readonly Action<object> _execute;
    readonly Predicate<object> _canExecute;

    public RelayCommand(Action<object> execute)
        : this(execute, null)
    {
    }

    public RelayCommand(Action<object> execute, Predicate<object> canExecute)
    {
        if (execute == null)
            throw new ArgumentNullException("execute");

        _execute = execute;
        _canExecute = canExecute;
    }

    public bool CanExecute(object parameter)
    {
        return _canExecute == null ? true : _canExecute(parameter);
    }

    public event EventHandler CanExecuteChanged
    {
        add { CommandManager.RequerySuggested += value; }
        remove { CommandManager.RequerySuggested -= value; }
    }

    public void Execute(object parameter)
    {
        _execute(parameter);
    }
}

Once you have that set up (the class is reusable for any number of commands you expose) you can use it in the viewmodel like this:

private RelayCommand _speakCommand;
public ICommand SpeakCommand
{
    get
    {
        if (_speakCommand == null)
        {
            _speakCommand = new RelayCommand(param => this.Speak(),
                param => this.CanSpeak);
        }
        return _speakCommand;
    }
}

public bool CanSpeak
{
    get { return _selectedVoice != null && !string.IsNullOrEmpty(_textToSpeak); }
}

public void Speak()
{
    if (CanSpeak)
    {
        _synth.Speak(TextToSpeak);
    }
    else
    {
        // no voice or no text. do nothing
    }
}

Note that in the Speak() method, I changed the guard code to call CanSpeak.

Now we just need to wire it up inside the xaml. If you tried the event version earlier, remove the click event handler from both the xaml and the code-behind; you won't need that any longer.

The way you'll wire this up is to bind to the SpeakCommand property exposed by our viewmodel.

<Button x:Name="Speak"
        Margin="5"
        Width="75"
        HorizontalAlignment="Right"
        Command="{Binding SpeakCommand}"
        Content="Speak" />
 

Now go ahead and run it and verify that it works as expected. While you're at it, try removing all the text from the textbox, tab off, and see what happens. You'll notice the button is only disabled once you tab off the textbox. The reason is that the TextToSpeak property of the viewmodel is only updated when the user tabs off the textbox. If you want the button to react as the user types, you'll need to change the binding just a little:

<TextBox x:Name="TextToSpeak"
         Text="{Binding TextToSpeak, Mode=TwoWay, UpdateSourceTrigger=PropertyChanged}"
         Margin="5 1 5 5"
         Height="100"
         TextWrapping="Wrap" />

 

Setting the binding to update on PropertyChanged causes a lot more assignments in your code: each time the user changes the text, even a character or selection at a time, the property in your viewmodel is updated. So use this when you need to, but understand that's potentially a lot of assignments happening behind the scenes.

Conclusion

I hope you enjoyed our first foray into speech synthesis with .NET 4 and Windows 7. In a future installment, we'll check out speech recognition, the counterpart to synthesis.

A walkthrough video and full source code will be available on windowsclient.net soon. I'll post the links here when the files are up.

Update: Source code and video may be found here.

         

12 comments for “Using Speech Synthesis in .NET 4 and Windows 7”

  1. Tedebus says:
    Hi Pete,
    thank you for your reply.
    My OS is Win7 x86. After two days I found a solution that I think may be useful for someone in the same situation: the problem was a corrupt key in the registry. I found the voice settings registry keys, removed them after a backup, and tried to restore them again and... the system told me that one key could not be installed because it was already in use! So the copy continued without it (the remaining keys were restored) and now everything is working fine.
  2. MawashiKid says:
    Hi Pete,

    I tried the following code in a small WPF app in order to get a list
    of all installed voices on Windows 8 Pro.

    SpeechSynthesizer synth = new SpeechSynthesizer();

    // Load all installed voices
    System.Collections.ObjectModel.ReadOnlyCollection<InstalledVoice> voices = synth.GetInstalledVoices();

    foreach (InstalledVoice voice in voices)
    {
        VoicesComboBox.Items.Add(voice.VoiceInfo.Name);
    }

    Now when running the code above, the only 3 voice names appearing in the ComboBox are

    Microsoft David Desktop
    Microsoft Hazel Desktop
    Microsoft Zira Desktop

    which I believe are the only 3 available on Windows 8 by default.

    Now I need to add a French voice to read French text, and I was wondering if [and how] that can be done.
    I've installed MSSpeech_TTS_fr-CA_Harmonie.msi on Windows 8 Pro, though it didn't seem to have any effect;
    no matter what, it just won't show up in the list... Hmmmm...

    Any clues on how to fix this would be much appreciated.

    Thanks
  3. Paul Shepley says:
    I have the same problem that MawashiKid does. My Windows 7 system only seems to have Anna installed, and I can't install any additional languages or voices that will be recognized by GetInstalledVoices(). I have looked pretty much everywhere for how to install voices; the installers run, but the voices still don't show up.

  4. Scott Chiu says:
    This is a very useful sample.

    I tried it in Windows 7 and it worked well.

    For the problem about extra voices mentioned above:

    As far as I know, if you are using the Enterprise or Ultimate edition, additional voices will be added if extra language packs are installed.

    For Professional edition users, you can only add extra languages that are based on your currently installed language.

    You can refer to http://windows.microsoft.com/en-US/windows/language-packs#lptabs=win7

