Book Excerpt: Creating Raw Media (Audio and Video)


Pete Brown - 22 June 2010

What follows is a raw excerpt (before copy editing or final tech reviews) from chapter 20 in my book, Silverlight in Action. This is just a small portion of chapter 20, which covers the media element, streaming, using IIS Smooth streaming, using the Silverlight Media Framework, creating raw video and audio, and using the webcam and microphone APIs. I've blogged bits and pieces on this topic before, but it's such a fun topic, I thought it would be great to provide the whole raw video/audio generator here in one posting.

Note that in the print edition, the lettered callouts (#A, #B, etc.) are replaced by graphic cueballs, and the numbered ones (#1, #2, etc.) are replaced by inline side-note text in the listing.

I'm feverishly working to wrap the book up; my initial chapters will all be turned over for final editing before July 1. This has been an exciting (and long <g>) book to write. I hope you enjoy it as much as I have. :)

20.6 Working with raw media

Silverlight has a strong but finite set of codecs it natively supports for audio and video playback. If you wanted to use a format not natively supported, such as the WAV audio file format or the AVI video format, you had no option until the MediaStreamSource (MSS) API was added.

The MSS API was included in Silverlight 2, but that version required you to transcode into one of the WMV/WMA/MP3 formats natively supported by Silverlight. In Silverlight 3, the MSS API was augmented to support raw media formats where you send the raw pixels or audio samples directly through the rest of the pipeline. This made its use much easier as it required knowledge only of the format you want to decode. For the same reason, it is more performant, as an extra potentially CPU-intensive encoding step is avoided.

The MediaStreamSource API supports simultaneous video and audio streams. In this section, we'll look at creating raw video as well as raw audio. In both cases, we'll use algorithmically derived data to drive the raw media pipeline.

20.6.1 A Custom MediaStreamSource Class

To implement your own custom stream source, derive a class from MediaStreamSource. As the name suggests, this class will be used as the source for a MediaElement on the page. Table 20.10 shows that MediaStreamSource has several methods that you must override in your implementation.

Table 20.10 MediaStreamSource Virtual Methods

Method	Description
SeekAsync	Sets the next position to be used in GetSampleAsync. Call ReportSeekCompleted when done.
GetDiagnosticAsync	Used to return diagnostic information. This method can be a no-op as it is not critical. If used, call ReportGetDiagnosticCompleted when done.
SwitchMediaStreamAsync	Used to change between configured media streams. This method can be a no-op as it is not critical. If used, call ReportSwitchMediaStreamCompleted when done.
GetSampleAsync	Required. Get the next sample and return it using ReportGetSampleCompleted. If there is any delay, call ReportGetSampleProgress to indicate buffering.
OpenMediaAsync	Required. Set up the metadata for the media and call ReportOpenMediaCompleted.
CloseMedia	Any shutdown and cleanup code should go here.

One thing you'll notice about the functions is that many of them are asynchronous. The pattern followed in those methods is to perform the processing and then call a ReportComplete method, the name of which varies by task, when finished.

The async nature of the API helps keep performance up, and keeps your code from slowing down media playback.

Listing 20.4 shows the skeleton of a MediaStreamSource implementation, including the methods described above. We'll continue to build on this throughout the remaining raw media sections.

Listing 20.4 The Basic MediaStreamSource Structure

public class CustomSource : MediaStreamSource
{
  private long _currentTime = 0;
  protected override void SeekAsync(long seekToTime)
  {
    _currentTime = seekToTime;
    ReportSeekCompleted(seekToTime);
  }

  protected override void GetDiagnosticAsync(
                    MediaStreamSourceDiagnosticKind diagnosticKind)
  {
    throw new NotImplementedException(); #1
  }

  protected override void SwitchMediaStreamAsync(
                    MediaStreamDescription mediaStreamDescription)
  {
    throw new NotImplementedException(); #1
  }

  protected override void GetSampleAsync( #2
                    MediaStreamType mediaStreamType)
  {

    if (mediaStreamType == MediaStreamType.Audio)
      GetAudioSample();
    else if (mediaStreamType == MediaStreamType.Video)
      GetVideoSample();
  }

  protected override void OpenMediaAsync() { } #A
  protected override void CloseMedia() { }
  private void GetAudioSample() { } #B
  private void GetVideoSample() { } #B
}

#1 Not implemented (not needed here)

#2 GetSampleAsync

The most important methods for our scenario are the OpenMediaAsync method #A, and the two methods #B that are used to get the next sample. Those two methods are called from the GetSampleAsync method whenever an audio or video sample is requested.

Once we have the CustomSource class created, we'll need to use it as the source for a MediaElement on a Silverlight page. Listing 20.5 shows how to wire this up using XAML for the user interface and C# code for the actual wire-up.

Listing 20.5 Using a custom MediaStreamSource class

XAML

<Grid x:Name="LayoutRoot" Background="White">
  <MediaElement x:Name="MediaPlayer" #A
                AutoPlay="True"
                Stretch="Uniform"
                Margin="10" />
</Grid>

C#

public partial class MainPage : UserControl
{
  public MainPage()
  {
    InitializeComponent();
    Loaded += new RoutedEventHandler(MainPage_Loaded);
  }

  CustomSource _mediaSource = new CustomSource(); #1

  void MainPage_Loaded(object sender, RoutedEventArgs e)
  {
    MediaPlayer.SetSource(_mediaSource); #B
  }
}

#1 Custom MediaStreamSource

In this listing, I first create a MediaElement #A that will span the size of the page, then assign the CustomSource instance to the source property #B using the SetSource method of the MediaElement. Once that has completed, the MediaElement is set to play and will start requesting samples from the CustomSource class.

Right now, our CustomSource class does not return any samples, so running the application would show nothing. We'll modify the class to return both video and audio, starting with video.

20.6.2 Creating raw video

Being able to create video from raw bits is pretty exciting - it opens up all sorts of scenarios from bitmap-based animation to custom video codecs. I first played with raw video when I created my Silverlight Commodore 64 emulator (mentioned in chapter 5). I tried a few different video presentation approaches before I settled on generating the video display in real-time as a 50fps MediaStreamSource video at 320x200.

For this video example, we're simply going to generate white noise, much like you see on an analog TV when the signal is lost. When complete, the application will look like figure 20.8. If you lived in the US prior to cable TV, this is what you saw after the national anthem finished playing.

Figure 20.8 The completed white noise video generator. When I was a boy, I used to imagine I was watching an epic ant battle from high overhead. Well, until I saw Poltergeist, which forever changed the nature of off-air white noise on the TV.

We'll start with the logic required to set up the video stream, and follow it up quickly with the code that returns the individual frame samples.

Setting up the video stream

When creating raw video, the first step is to set up the video stream parameters. The parameters include things such as the height and width of the frame, the number of frames per second, and the actual video format.

Silverlight supports a number of different video formats, each identified by a FourCC code. FourCC is a standard four-character code that is used to uniquely identify a video format. In addition to all of the existing formats (for example, "H264" for h.264 video), two new formats were added specifically for use with raw media and the MediaStreamSource API. Those are listed in table 20.11.

Table 20.11 Supported Raw Media FourCC codes in Silverlight

FourCC Code	Description
RGBA	Raw, uncompressed RGB pixels with an alpha component. Silverlight currently ignores the alpha component during processing.
YV12	YUV 12. This is a common media output format used in many codecs.

In the example in this section, we'll use the RGBA format to push raw pixels without any special processing or encoding. It's the easiest format to use, requiring no algorithm other than providing a single pixel with a single color. Listing 20.6 shows the video setup code for our simple white noise generator.

Listing 20.6 Setting up the Video Stream

private int _frameTime = 0;
private const int _frameWidth = 320, _frameHeight = 200;
private const int _framePixelSize = 4;
private const int _frameBufferSize = _frameHeight * _frameWidth * _framePixelSize;
private const int _frameStreamSize = _frameBufferSize * 100;
private MemoryStream _frameStream = new MemoryStream(_frameStreamSize);
private MediaStreamDescription _videoDesc;

private void PrepareVideo()
{
  _frameTime = (int)TimeSpan.FromSeconds((double)1/30).Ticks; #1

  Dictionary<MediaStreamAttributeKeys, string> streamAttributes =
                  new Dictionary<MediaStreamAttributeKeys, string>();

  streamAttributes[MediaStreamAttributeKeys.VideoFourCC] = "RGBA"; #C
  streamAttributes[MediaStreamAttributeKeys.Height] = _frameHeight.ToString(); #D
  streamAttributes[MediaStreamAttributeKeys.Width] = _frameWidth.ToString();
  
  _videoDesc = new MediaStreamDescription( #E
              MediaStreamType.Video, streamAttributes);
}

protected override void OpenMediaAsync()
{
  Dictionary<MediaSourceAttributesKeys, string> sourceAttributes =
              new Dictionary<MediaSourceAttributesKeys, string>();

  List<MediaStreamDescription> availableStreams = #A
              new List<MediaStreamDescription>();

  PrepareVideo();

  availableStreams.Add(_videoDesc); #A

  sourceAttributes[MediaSourceAttributesKeys.Duration] = #2
              TimeSpan.FromSeconds(0).Ticks.ToString(
              CultureInfo.InvariantCulture);

  sourceAttributes[MediaSourceAttributesKeys.CanSeek] = false.ToString();

  ReportOpenMediaCompleted(
            sourceAttributes, availableStreams); #B
}

#1 30 frames per second

#2 "0" is infinite time

Listing 20.6 shows two functions: OpenMediaAsync and PrepareVideo. They've been broken up that way because OpenMediaAsync will also need to support audio later in this section.

When the class is wired up to a MediaElement, Silverlight will first call the OpenMediaAsync function. In that function, you need to tell Silverlight what streams are available #A, a single video stream in this case. Then you need to set up some attributes for the duration of the video, infinite in our case, and whether or not you allow seeking. Finally, you take that information and pass it into the ReportOpenMediaCompleted method #B to tell Silverlight you're ready.

The PrepareVideo method sets up some variables that will be used when we generate the samples. First, we identify the amount of time per frame. This can vary over the course of the video, but it'll be easier on the developer if you pick a constant frame rate. Then we set up a dictionary of attributes that identify the format of the video #C, and the dimensions of each frame #D. Finally, that is all packed into a MediaStreamDescription #E to be used when we start generating frames.
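
As a rough check on the numbers used in listing 20.6, the sketch below works out the frame duration and buffer sizes with plain integer arithmetic. The class and method names (VideoMath, FrameTicks, FrameBytes, TimestampTicks) are hypothetical and not part of the listing. The only assumption is that a tick is 100 nanoseconds, which is what TimeSpan.TicksPerSecond expresses.

```csharp
using System;

// Hypothetical helper, not part of the book's listing: it only reproduces
// the arithmetic behind the constants used when setting up the video stream.
public static class VideoMath
{
    // Duration of one frame in 100 ns ticks, using integer division.
    public static long FrameTicks(int framesPerSecond)
    {
        return TimeSpan.TicksPerSecond / framesPerSecond;
    }

    // Bytes needed for one uncompressed frame.
    public static int FrameBytes(int width, int height, int bytesPerPixel)
    {
        return width * height * bytesPerPixel;
    }

    // Timestamp of the nth frame (zero-based) at a constant frame rate.
    public static long TimestampTicks(int frameIndex, int framesPerSecond)
    {
        return frameIndex * FrameTicks(framesPerSecond);
    }
}
```

At 30 frames per second, FrameTicks(30) comes to 333,333 ticks. A 320x200 frame at four bytes per pixel is 256,000 bytes, so a stream sized for 100 frames reserves 25,600,000 bytes.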

Once the video stream is set up, the next thing to do is to start pumping out frames to be displayed.

Returning the sample

The main purpose of a MediaStreamSource implementation is to return samples. In the case of video, a sample is one complete frame, ready to be displayed. Listing 20.7 shows the GetVideoSample function, called by GetSampleAsync.

Listing 20.7 Returning the Video Frame Sample

private int _frameStreamOffset = 0;
private Dictionary<MediaSampleAttributeKeys, string> _emptySampleDict = 
           new Dictionary<MediaSampleAttributeKeys, string>();

private Random _random = new Random();
private byte[] _frameBuffer = new byte[_frameBufferSize];

private void GetVideoSample()
{
  if (_frameStreamOffset + _frameBufferSize > _frameStreamSize)
  {
    _frameStream.Seek(0, SeekOrigin.Begin); #1
    _frameStreamOffset = 0;
  }

  for (int i = 0; i < _frameBufferSize; i+= _framePixelSize)
  {
    if (_random.Next(0, 2) > 0)
    {
      _frameBuffer[i] = _frameBuffer[i + 1] =
      _frameBuffer[i + 2] = 0x55; #A
    }
    else
    {
      _frameBuffer[i] = _frameBuffer[i + 1] =
      _frameBuffer[i + 2] = 0xDD; #A
    }

    _frameBuffer[i + 3] = 0xFF; #2
  }

  _frameStream.Write(_frameBuffer, 0, _frameBufferSize); #B

  MediaStreamSample msSamp = new MediaStreamSample( #C
         _videoDesc, _frameStream, _frameStreamOffset,
         _frameBufferSize, _currentTime, _emptySampleDict);

  _currentTime += _frameTime;
  _frameStreamOffset += _frameBufferSize;

  ReportGetSampleCompleted(msSamp);

}

#1 Rewind when at end

#2 Alpha value 0xFF = Opaque

The GetVideoSample function first checks to see if we're approaching the end of the allocated video buffer. If so, it rewinds back to the beginning of the buffer. This is an important check to make, as you don't want to allocate a complete stream for every frame, but a stream cannot be boundless in size.
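
The rewind check can be isolated into a small piece of bookkeeping. The following is only a sketch of that idea; the WrappingOffset class and its members are hypothetical names introduced for illustration and do not appear in the listing.

```csharp
using System;

// Hypothetical sketch of the offset bookkeeping described above.
// It hands out write offsets in fixed-size chunks and rewinds to zero
// when the next chunk would run past the end of the stream.
public sealed class WrappingOffset
{
    private readonly int _chunkSize;
    private readonly int _capacity;
    private int _offset;

    public WrappingOffset(int chunkSize, int capacity)
    {
        if (chunkSize <= 0 || capacity < chunkSize)
            throw new ArgumentException("capacity must hold at least one chunk");

        _chunkSize = chunkSize;
        _capacity = capacity;
    }

    // Returns the offset at which the next chunk should be written.
    public int Next()
    {
        if (_offset + _chunkSize > _capacity)
            _offset = 0;                 // rewind when at the end

        int writeAt = _offset;
        _offset += _chunkSize;
        return writeAt;
    }
}
```

With 256,000-byte frames and a 25,600,000-byte stream, this would hand out 100 distinct offsets and return 0 again on the 101st call. The caller is still responsible for seeking the stream back to the start at that point, as the listing does.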

Once that is done, I loop through the buffer, moving four bytes at a time (the size of a single pixel in the buffer) and generate a random pixel value. The pixel will either be almost white or almost black. #A When playing with the sample, I found that pure black and white was far too harsh, and these two slightly gray values looked more natural. While not obvious here, when setting the pixel values you need to do so in Blue, Green, Red, Alpha (BGRA) order.
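
To make the byte order concrete, here is a minimal sketch of writing one arbitrary-color pixel into a frame buffer. PixelWriter and WritePixel are hypothetical names used only for this illustration; the only thing taken from the text is the Blue, Green, Red, Alpha ordering and the four-byte pixel size.

```csharp
using System;

// Hypothetical helper illustrating the BGRA byte order described above.
public static class PixelWriter
{
    public const int BytesPerPixel = 4;

    // Writes one pixel into a frame buffer in Blue, Green, Red, Alpha order.
    // pixelIndex counts whole pixels from the start of the buffer.
    public static void WritePixel(byte[] frame, int pixelIndex,
                                  byte red, byte green, byte blue, byte alpha)
    {
        int i = pixelIndex * BytesPerPixel;
        frame[i]     = blue;
        frame[i + 1] = green;
        frame[i + 2] = red;
        frame[i + 3] = alpha;   // 0xFF = opaque
    }
}
```

For a pixel at column x and row y, the pixel index would be y * frameWidth + x. The noise generator never needs this because its red, green, and blue bytes are always equal, which is why the ordering is easy to overlook.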

The next step is to write the buffer to the stream #B. In this simple example, I could have written the bytes directly to the stream and eliminated the buffer. However, in anything more complex than this, you are likely to have at least two buffers (a read-from and a write-to buffer), and even more likely to have a queue of frame buffers used for pre-loading the individual frames.

Once the stream is populated, I then create the media stream sample, increment our time counters and then call ReportGetSampleCompleted to return the sample to Silverlight.

One interesting note in this is how sample time is used rather than frame numbers. The use of a time for each frame allows Silverlight to drop frames when it starts to lag behind. This was a key reason I chose MediaStreamSource over other approaches in the Silverlight C64 emulator. When the user's machine is busy, or in case it is too slow to run the emulator at full frame rate, I continue to chug along and let Silverlight skip frames it doesn't have time to show. This helps keep everything in sync time-wise, very important when you're also creating audio.

20.6.3 Creating raw audio

In the previous section, we created a white noise video generator. Let's take that all the way and add in white noise audio. Surprisingly, audio is somewhat more complex to set up than video. This is due to the number of options available to you: audio can have different sample bit sizes, be mono or stereo, have different sample rates and more.

All this information is stored in a class known as WaveFormatEx. In order to fit the listing into this book, I'm going to use a greatly simplified, but still functional, version of this class. Listing 20.8 below shows the class. Create this as a separate class file in your project.

Listing 20.8 A simplified WaveFormatEx Structure

(listing omitted as it was shortened just for print. For a better version of this structure, visit the synthesizer page and download the source code)

The WaveFormatEx class is simply a way to specify the format to be used for PCM wave data in Silverlight. It's a standard structure, forming the header of the .WAV file format, which is why you get some oddities such as the big to little endian format conversions. The class-based version here includes a single helper utility function AudioDurationFromBufferSize which will be used when we output the PCM samples.

There are more complete implementations of WaveFormatEx to be found on the web, including one in my Silverlight Synthesizer project. Those implementations typically include a validation function that makes sure all the chosen options are correct.

With that class in place, we'll turn our eye to the actual stream setup.

Setting up the wav media source

The first step in setting up the sound source is to modify the OpenMediaAsync function. That function currently includes a call to PrepareVideo followed by adding the video stream description to the list of available streams. Modify that code so that it also includes the audio description information as shown here:

...

PrepareVideo();
PrepareAudio();

availableStreams.Add(_videoDesc);
availableStreams.Add(_audioDesc);

...

Once those changes are in place, we'll add the PrepareAudio function to the class. The PrepareAudio function is the logical equivalent to the PrepareVideo function; it sets up the format information for Silverlight to use when reading our samples. Listing 20.9 shows the code for that function and its required class member variables and constants.

Listing 20.9 The PrepareAudio Function

private WaveFormatEx _waveFormat = new WaveFormatEx(); #1
private MediaStreamDescription _audioDesc;
private const int _audioBitsPerSample = 16; #A
private const int _audioChannels = 2; #A
private const int _audioSampleRate = 44100; #A

private void PrepareAudio()
{
  int ByteRate = _audioSampleRate * _audioChannels * 
                 (_audioBitsPerSample / 8); #B

  _waveFormat = new WaveFormatEx();

  _waveFormat.BitsPerSample = _audioBitsPerSample;
  _waveFormat.AvgBytesPerSec = (int)ByteRate;
  _waveFormat.Channels = _audioChannels;
  _waveFormat.BlockAlign =
          (short)(_audioChannels * (_audioBitsPerSample / 8)); #C
  _waveFormat.ext = null;
  _waveFormat.FormatTag = WaveFormatEx.FormatPCM;
  _waveFormat.SamplesPerSec = _audioSampleRate;
  _waveFormat.Size = 0; #2

  Dictionary<MediaStreamAttributeKeys, string> streamAttributes = 
          new Dictionary<MediaStreamAttributeKeys, string>();

  streamAttributes[MediaStreamAttributeKeys.CodecPrivateData] = 
          _waveFormat.ToHexString(); #D

  _audioDesc = new MediaStreamDescription(
          MediaStreamType.Audio, streamAttributes);

}

#1 WaveFormatEx

#2 Must be zero

Arguably the most important parts of this listing are the constants controlling the sample format #A. For this example, we're generating 16 bit samples, in two channels (stereo sound), at a sample rate of 44,100 samples per second: CD quality audio.

Once those constants are established, they are used to figure out almost everything else, including the number of bytes per second #B and the block alignment #C. Once the WaveFormatEx structure is filled out with this information, I set it as the Codec Private Data #D using its little-endian hex string format. Finally, I create the audio description from that data, to be used when reporting samples back to Silverlight.
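
Because the WaveFormatEx listing is omitted from this excerpt, the following is only a sketch of how the derived fields and the hex string might be produced. It is not the author's class. It assumes the standard 18-byte WAVEFORMATEX field order (format tag, channels, samples per second, average bytes per second, block align, bits per sample, extra size), with each field written little-endian. WaveFormatSketch and its members are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the omitted WaveFormatEx class. Assumes the
// standard WAVEFORMATEX field order, each field written little-endian.
public static class WaveFormatSketch
{
    public const short FormatPcm = 1;

    public static int ByteRate(int sampleRate, int channels, int bitsPerSample)
    {
        return sampleRate * channels * (bitsPerSample / 8);
    }

    public static short BlockAlign(int channels, int bitsPerSample)
    {
        return (short)(channels * (bitsPerSample / 8));
    }

    // Appends the low byteCount bytes of value, least significant byte first.
    private static void AppendLittleEndianHex(StringBuilder sb, int value, int byteCount)
    {
        for (int i = 0; i < byteCount; i++)
        {
            sb.Append(((value >> (8 * i)) & 0xFF).ToString("X2"));
        }
    }

    public static string ToHexString(int sampleRate, int channels, int bitsPerSample)
    {
        var sb = new StringBuilder(36);
        AppendLittleEndianHex(sb, FormatPcm, 2);                                        // format tag
        AppendLittleEndianHex(sb, channels, 2);                                         // channels
        AppendLittleEndianHex(sb, sampleRate, 4);                                       // samples per second
        AppendLittleEndianHex(sb, ByteRate(sampleRate, channels, bitsPerSample), 4);    // bytes per second
        AppendLittleEndianHex(sb, BlockAlign(channels, bitsPerSample), 2);              // block align
        AppendLittleEndianHex(sb, bitsPerSample, 2);                                    // bits per sample
        AppendLittleEndianHex(sb, 0, 2);                                                // extra size, zero for PCM
        return sb.ToString();
    }
}
```

For 16-bit stereo at 44,100 samples per second, the byte rate works out to 176,400 and the block alignment to 4, and this sketch produces the string 0100020044AC000010B10200040010000000.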

Creating Sound Samples

The final step is to actually output the audio samples. This requires generating the individual samples and returning them in chunks of pre-defined size. We'll use a random number generator to generate the noise, much like we did with video. Listing 20.10 shows how to fill a buffer with audio and return those samples to Silverlight.

Listing 20.10 Outputting Audio Samples

private long _currentAudioTimeStamp = 0;
private const int _audioBufferSize = 256; #1
private const int _audioStreamSize = _audioBufferSize * 100;
private byte[] _audioBuffer = new byte[_audioBufferSize];
private MemoryStream _audioStream = new MemoryStream(_audioStreamSize);
private int _audioStreamOffset = 0;
private double _volume = 0.5;

private void GetAudioSample()
{
  if (_audioStreamOffset + _audioBufferSize > _audioStreamSize) #A
  {
    _audioStream.Seek(0, SeekOrigin.Begin);
    _audioStreamOffset = 0;
  }

  for (int i = 0; i < _audioBufferSize; #B
            i += _audioBitsPerSample / 8)

  {
    short sample = 
            (short)(_random.Next((int)short.MinValue, #2
            (int)short.MaxValue) * _volume);

    _audioBuffer[i] = (byte)(sample & 0x00FF); #C
    _audioBuffer[i + 1] = (byte)((sample >> 8) & 0x00FF); #C
  }

  _audioStream.Write(_audioBuffer, 0, _audioBufferSize);

  MediaStreamSample msSamp = new MediaStreamSample( #D
          _audioDesc, _audioStream, _audioStreamOffset, _audioBufferSize,
          _currentAudioTimeStamp, _emptySampleDict);

  _currentAudioTimeStamp += 
          _waveFormat.AudioDurationFromBufferSize((uint)_audioBufferSize);

  _audioStreamOffset += _audioBufferSize;

  ReportGetSampleCompleted(msSamp); #E

}

#1 Internal buffer size

#2 Sample Randomizer

The process for generating the white noise audio sample is very similar to generating the frames of video. However, instead of having a fixed width x height buffer we must fill, we can generate as long or as short a sample as we want. This is controlled by the audio buffer size set in code. In general, you want this number to be as low as possible, as larger numbers typically introduce latency as well as skipped video frames as the system is too busy generating audio to show the video frame. Set the number too low, however, and the audio will stutter. If you find the white noise stuttering on your machine, up the buffer to 512 or so and see how that works for you.
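
To get a feel for how buffer size relates to latency, it helps to convert a buffer size into playing time. The sketch below assumes the conversion is simply bytes divided by the stream's byte rate, expressed in 100 ns ticks. It is an assumed equivalent of the AudioDurationFromBufferSize helper, whose source is not shown in this excerpt, and AudioMath and DurationTicks are hypothetical names.

```csharp
using System;

// Hypothetical helper: an assumed equivalent of the duration calculation,
// not the author's implementation.
public static class AudioMath
{
    // Playing time, in 100 ns ticks, of bufferBytes of PCM audio at the
    // given byte rate. The long cast avoids overflow before the division.
    public static long DurationTicks(int bufferBytes, int bytesPerSecond)
    {
        return (long)bufferBytes * TimeSpan.TicksPerSecond / bytesPerSecond;
    }
}
```

At 176,400 bytes per second (16-bit stereo at 44,100 samples per second), a 256-byte buffer holds 64 stereo samples and lasts 14,512 ticks, or about 1.45 ms. Doubling the buffer to 512 bytes gives 29,024 ticks, about 2.9 ms.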

TIP

To help with latency, you can also play with the AudioBufferLength property of the MediaStreamSource class. In most cases, you won't be able to get that below 30ms or so, but that value is itself very hardware dependent. That property is my own contribution to the class, as I was the only one insane enough to be writing a Silverlight-based audio synthesizer at the time. I ran into problem after problem with the triple-buffering (my buffer, plus Silverlight MSS buffer plus underlying DirectX buffer), to the point where all audio was delayed by about 2-3 seconds. The team worked with me to identify where the issues were, and then added this knob into the base class to help tweak for latency-sensitive applications like mine.

Once the buffer size is established, I perform the same stream overrun check #A that we did for video, and for the same reasons. Then, I loop through the buffer #B, two bytes (16 bits) at a time, and generate a white noise sample. Once the sample is generated, I split it into its two bytes using a little bit manipulation #C, and then write those bytes into the buffer. Once the buffer is filled, it is copied into the stream and the sample response is built #D. After incrementing the time counters, the last step is to report the sample to Silverlight #E.
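
One way to split a 16-bit sample into two bytes is sketched below, assuming little-endian PCM (low byte first, then high byte). SamplePacker and its methods are hypothetical names for illustration. The read method is included only so the round trip can be checked.

```csharp
using System;

// Hypothetical helper showing little-endian packing of 16-bit PCM samples.
public static class SamplePacker
{
    // Writes a 16-bit sample at the given offset, low byte first.
    public static void WriteInt16LittleEndian(byte[] buffer, int offset, short sample)
    {
        buffer[offset]     = (byte)(sample & 0xFF);          // low byte
        buffer[offset + 1] = (byte)((sample >> 8) & 0xFF);   // high byte
    }

    // Reads the sample back, reversing the packing above.
    public static short ReadInt16LittleEndian(byte[] buffer, int offset)
    {
        return (short)(buffer[offset] | (buffer[offset + 1] << 8));
    }
}
```

Note that the high byte has to be shifted down before the cast. Masking with 0xFF00 alone and then casting to byte keeps only the low eight bits of the masked value, which are always zero.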

If you run the application at this point, you should have a short delay while the startup code is executed and the Silverlight internal buffers are filled, followed by simultaneous audio and video white noise. On the surface, this may not seem impressive. However, when you consider that the video and audio are completely computer generated, it is considerably more impressive.

Raw audio and video is also a gateway to displaying any type of media for which you can write a decoder. Much of the IIS Smooth Streaming client for Silverlight, for example, is written using a custom MediaStreamSource implementation. While writing a typically hardware-implemented 1080p HD codec in managed code may not lead to good performance, there are many other popular formats which don't have native Silverlight support, but which would benefit from a custom MediaStreamSource implementation.

So far, we've seen a number of ways to get video and audio into Silverlight. The easiest, of course, is to use a video format Silverlight supports and just point the MediaElement to it. Another way was to use the MediaStreamSource class to implement your own managed codec. One final way to get video and audio into Silverlight is to use the webcam and microphone APIs. A segment of that API, especially the VideoSink and AudioSink classes, is conceptually similar to the MediaStreamSource code we've completed in this section, but thankfully much simpler. We'll cover those in the next section.

If you enjoyed this excerpt, please be sure to order a copy of my book. My book is set to be published later this summer, but you can receive copies of the chapters as they are completed by signing up for the Manning Early Access Program (MEAP).


posted by Pete Brown on Tuesday, June 22, 2010
filed under: .NET Silverlight Synthesis Silverlight in Action Video Tutorial General Silverlight

4 comments for “Book Excerpt: Creating Raw Media (Audio and Video)”

Walt Ritscher says:
Wednesday, June 23, 2010 at 2:05:52 AM
If the rest of your book is as good as this chapter, I'll be adding it to my bookcase. Excellent, detailed and useful.
Peter Bromberg says:
Sunday, July 18, 2010 at 1:43:01 PM
Pete, this is great stuff, as is almost everything I've seen you put out. Microsoft is lucky to have you.
Muhammad Qasim says:
Sunday, February 24, 2013 at 3:25:34 AM
Sir, kindly provide the full source code on the same topic so that we can learn the technology. I shall be greatly thankful to you.
Pete says:
Monday, February 25, 2013 at 4:27:40 AM
@Muhammad

You can get everything in my Silverlight 5 book (an updated version of what is covered here).
http://manning.com/pbrown2

I have another, much older post, on this topic:
http://10rem.net/blog/2009/03/23/creating-sound-using-mediastreamsource-in-silverlight-3-beta

You can also see the same concepts from here applied in my simple synthesizer project. Source can be downloaded from the project page.
http://10rem.net/lab/silverlight-synthesizer

Pete