Introduction

Audio analysis is hard. Recently, I needed a small code sample to parse a .wav file but couldn't find one that worked. I learned the hard way that a .wav file can contain many different combinations of chunks, and a parser has to take them into account.

In this article we will not discuss what each header field means or how to use it. There are a lot of articles about that, and they explain it far better than I ever could.

We will only go through how to parse the .wav file and get to the raw audio data using C#.

Wav File Structure

The basic .wav file structure is documented all over the web. Unfortunately, life is not that simple, and real files can be considerably more complex. I will not go into detail, but you can check here.

We will also use the above-mentioned site to validate our logic. Here is the diagram with the structure.

[Figure: WAV file structure]

How To Read the Audio File

Before we start parsing, we must open the file. Fortunately, this is really straightforward:


using (var file = File.Open("FULL_PATH_TO_YOUR_WAV_FILE", FileMode.Open))
{
    BinaryReader reader = new BinaryReader(file);
}

BinaryReader is a .NET class that reads primitive data types from a Stream. It reads multi-byte integers as little-endian, which matches the byte order used by .wav files. It has some very useful methods, which you will see shortly.
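
To get a feel for how BinaryReader consumes bytes, here is a minimal sketch that reads from an in-memory buffer instead of a file. The class and method names (BinaryReaderDemo, ReadSample) are made up for this illustration, and the tuple return type assumes C# 7 or later.

```csharp
using System;
using System.IO;

public static class BinaryReaderDemo
{
    // Reads one 32-bit and one 16-bit integer from an in-memory buffer,
    // the same way the article reads fields from the .wav file.
    public static (int First, short Second) ReadSample(byte[] buffer)
    {
        using (var stream = new MemoryStream(buffer))
        using (var reader = new BinaryReader(stream))
        {
            int first = reader.ReadInt32();    // consumes 4 bytes, little-endian
            short second = reader.ReadInt16(); // consumes the next 2 bytes
            return (first, second);
        }
    }
}
```

Each Read call advances the position of the underlying stream, which is why the order of the calls in the rest of this article matters.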

The RIFF Chunk

Based on the above diagram, the first 4 bytes contain the word RIFF in ASCII form. We can read those bytes, check whether they really spell RIFF, and print the result to the console.

string riffChunkId = Encoding.ASCII.GetString(reader.ReadBytes(4));
Console.WriteLine(riffChunkId);

After making sure this is correct, we can continue with ChunkSize and Format.

int riffChunkSize = reader.ReadInt32();
Console.WriteLine(riffChunkSize);

string waveFormat = Encoding.ASCII.GetString(reader.ReadBytes(4));
Console.WriteLine(waveFormat); // WAVE
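
If you would rather fail fast than print the values, the same three reads can be wrapped in a small validation helper. This is only a sketch: RiffHeader, ReadFourCc and ReadAndValidate are hypothetical names, and throwing InvalidDataException is my choice, not something the format requires.

```csharp
using System.IO;
using System.Text;

public static class RiffHeader
{
    // Reads a four-character chunk identifier such as "RIFF" or "WAVE".
    public static string ReadFourCc(BinaryReader reader)
    {
        byte[] bytes = reader.ReadBytes(4);
        if (bytes.Length < 4)
        {
            // ReadBytes returns a shorter array when the stream ends early.
            throw new EndOfStreamException("Unexpected end of file while reading a chunk id.");
        }
        return Encoding.ASCII.GetString(bytes);
    }

    // Reads the 12-byte RIFF header and returns the RIFF chunk size.
    public static int ReadAndValidate(BinaryReader reader)
    {
        if (ReadFourCc(reader) != "RIFF")
        {
            throw new InvalidDataException("Not a RIFF file.");
        }

        int riffChunkSize = reader.ReadInt32();

        if (ReadFourCc(reader) != "WAVE")
        {
            throw new InvalidDataException("RIFF file is not a WAVE file.");
        }

        return riffChunkSize;
    }
}
```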

The Unexpected Chunk

Many different chunks can follow the RIFF header. Each one of them, expected or not, starts with a chunk id and a chunk size, and we can use that size to parse or discard the data.

My particular WAV file had a JUNK chunk that I needed to discard in order to get to the data.

string unknownSection = Encoding.ASCII.GetString(reader.ReadBytes(4)); // JUNK

int junkChunkSize = reader.ReadInt32(); // 28 in my file

// Read and discard exactly as many bytes as the chunk declares
byte[] discardedJunkChunk = reader.ReadBytes(junkChunkSize);

As you can see, we can discard the unneeded chunk and proceed safely to the next section.
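
The same idea can be generalized so that any number of unknown chunks is skipped until the one we want shows up. The sketch below is an illustration under a few assumptions: ChunkSkipper and SkipToChunk are hypothetical names, the underlying stream must be seekable (a FileStream or MemoryStream is), and it relies on the RIFF rule that a chunk with an odd size is followed by one padding byte.

```csharp
using System.IO;
using System.Text;

public static class ChunkSkipper
{
    // Advances the reader past any chunks whose id differs from targetId
    // and returns the size of the first matching chunk. On return, the
    // reader is positioned at the first byte of that chunk's payload.
    public static int SkipToChunk(BinaryReader reader, string targetId)
    {
        while (true)
        {
            byte[] idBytes = reader.ReadBytes(4);
            if (idBytes.Length < 4)
            {
                throw new EndOfStreamException("Chunk '" + targetId + "' was not found.");
            }

            string chunkId = Encoding.ASCII.GetString(idBytes);
            int chunkSize = reader.ReadInt32();

            if (chunkId == targetId)
            {
                return chunkSize;
            }

            // RIFF chunks are word-aligned: an odd-sized payload is
            // followed by a single padding byte that is not counted in the size.
            long bytesToSkip = chunkSize + (chunkSize & 1);
            reader.BaseStream.Seek(bytesToSkip, SeekOrigin.Current);
        }
    }
}
```

With a helper like this, a call such as ChunkSkipper.SkipToChunk(reader, "fmt ") made right after the RIFF header would step over a JUNK chunk of any size. Note that the fmt id ends with a space.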

The fmt Chunk

The next bit of info is the fmt chunk. Just like before, we can read it and check if it is correct:

string fmt = Encoding.ASCII.GetString(reader.ReadBytes(4)); // "fmt " (with a trailing space)

The fmt chunk comes with a bunch of fields attached, and all of them are described very well elsewhere. We should read each and every one of them to get to the data:

int fmtChunkSize = reader.ReadInt32();

// PCM = 1. Other values indicate a different encoding, which may be compressed
// (for example 3 = IEEE float, 0xFFFE = WAVE_FORMAT_EXTENSIBLE).
short audioFormat = reader.ReadInt16();

// Mono = 1; Stereo = 2
short numberOfChannels = reader.ReadInt16();

// 8000, 44100, etc.
int samplesPerSecond = reader.ReadInt32();

// Byte Rate = SampleRate * NumChannels * BitsPerSample/8
int avgBytePerSecond = reader.ReadInt32();

// Block Align = NumChannels * BitsPerSample / 8
short blockAlign = reader.ReadInt16();

// 8 bits = 8, 16 bits = 16, etc.
short bitsPerSample = reader.ReadInt16();
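
The formulas in the comments above can double as a sanity check on what was just read. Here is a minimal sketch, assuming uncompressed PCM; FmtConsistency and its methods are hypothetical helper names.

```csharp
public static class FmtConsistency
{
    // Block Align = NumChannels * BitsPerSample / 8
    public static int ExpectedBlockAlign(short numberOfChannels, short bitsPerSample)
    {
        return numberOfChannels * bitsPerSample / 8;
    }

    // Byte Rate = SampleRate * NumChannels * BitsPerSample / 8,
    // which is the same as SampleRate * BlockAlign.
    public static int ExpectedByteRate(int samplesPerSecond, short numberOfChannels, short bitsPerSample)
    {
        return samplesPerSecond * ExpectedBlockAlign(numberOfChannels, bitsPerSample);
    }
}
```

For CD-quality audio (44100 Hz, 2 channels, 16 bits) these return 4 and 176400. If blockAlign or avgBytePerSecond differ from the expected values in a PCM file, the header is probably corrupt or the reads are out of step.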

We can take one of three paths from here, depending on the fmt chunk size. If the size is 16, the chunk ends here. If it is 18 or 40, we have some additional data to process:

if (fmtChunkSize == 18)
{
    // size of the extension: 0
    short extensionSize = reader.ReadInt16();
}
else if (fmtChunkSize == 40)
{
    // size of the extension: 22
    short extensionSize = reader.ReadInt16();

    // Should be lower or equal to bitsPerSample
    short validBitsPerSample = reader.ReadInt16();

    // Speaker position mask
    int channelMask = reader.ReadInt32();

    // GUID (the first two bytes are the data format code)
    byte[] subFormat = reader.ReadBytes(16);
}
else if (fmtChunkSize != 16)
{
    throw new Exception("Invalid .wav format!");
}
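
As the comment on subFormat says, the first two bytes of the GUID carry the data format code. A small sketch of extracting it follows; SubFormatHelper and GetFormatCode are hypothetical names, and the bytes are combined by hand so the result does not depend on the endianness of the machine running the code.

```csharp
using System;

public static class SubFormatHelper
{
    // Returns the format code stored little-endian in the first
    // two bytes of the 16-byte subFormat GUID.
    public static ushort GetFormatCode(byte[] subFormat)
    {
        if (subFormat == null || subFormat.Length < 2)
        {
            throw new ArgumentException("subFormat must contain at least 2 bytes.");
        }
        return (ushort)(subFormat[0] | (subFormat[1] << 8));
    }
}
```

For the standard PCM sub-format GUID, whose bytes start with 0x01 0x00, this returns 1, the same value audioFormat has in a plain PCM file.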

Unexpected fact Section

After the fmt chunk, you will hopefully arrive at the raw data. However, you may first have to parse another chunk that is optional and, in my case, not needed. Say hello to the fact section.

string nextSection = Encoding.ASCII.GetString(reader.ReadBytes(4));
if (nextSection == "fact")
{
    // Chunk size: 4
    int factChunkSize = reader.ReadInt32();

    // Number of samples per channel
    int sampleLength = reader.ReadInt32();

    // Check what is the next section
    nextSection = Encoding.ASCII.GetString(reader.ReadBytes(4));
}
else if (nextSection != "data")
{
    throw new NotImplementedException();
}

Expected data Section

After all this parsing, checking and comparing, we arrive at the raw audio data chunk. Without further ado, here is how we can get it:

// Must contain the letters "data"
string dataChunkId = nextSection;
if (dataChunkId != "data")
{
    throw new Exception("Expected the data chunk!");
}

// This is the number of bytes in the data.
int dataChunkSize = reader.ReadInt32();

// This is the raw audio data
byte[] data = reader.ReadBytes(dataChunkSize);

First we must make sure that we are indeed in the correct section. Then we can read the size and the data itself.
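
What you do with the raw bytes depends on the format fields read earlier. As an illustration only, the sketch below assumes 16-bit PCM (audioFormat 1, bitsPerSample 16) and turns the byte array into signed samples; PcmSamples, ToInt16Samples and DurationInSeconds are hypothetical names.

```csharp
public static class PcmSamples
{
    // Converts little-endian 16-bit PCM bytes into signed samples.
    // For stereo files the samples are interleaved: left, right, left, right...
    public static short[] ToInt16Samples(byte[] data)
    {
        var samples = new short[data.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            samples[i] = (short)(data[2 * i] | (data[2 * i + 1] << 8));
        }
        return samples;
    }

    // Playback length derived from the data chunk size and the byte rate.
    public static double DurationInSeconds(int dataChunkSize, int avgBytePerSecond)
    {
        return (double)dataChunkSize / avgBytePerSecond;
    }
}
```

For example, a data chunk of 176400 bytes at a byte rate of 176400 bytes per second holds exactly one second of audio. Other bit depths (8, 24 or 32 bits) need a different conversion.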

Complete code

You can find the complete source code of the above examples in my gists. Feel free to comment and give a star.
