Yet Another Chip-8 Emulator: XO-Chip (Sound at Last Edition)

It’s been three years since I abandoned adding XO-Chip sound support to my emulator. Some call it my biggest failure.1

Today, that phase2 of my life has ended. Today, I tackled F002 and Fx3A. This post is, unfortunately, going to be the one most tied to the underlying tools I used (Unity and C#). Whereas elsewhere I could just yadda yadda yadda over the basic platform differences (like grabbing input), here the platform is kind of the whole thing.

But first, let’s get into why I failed previously. I’d experimented with a few approaches: creating AudioClips on the fly with PCMReaderCallback and overriding OnAudioFilterRead, but ultimately the combination of not really understanding how PCM works and not understanding how Unity’s audio works got to me. Coupled with the variable speeds (Unity running at 60 (or so) fps, CHIP-8 running at a thousand (or so) ticks per frame, audio running at 44.1k (or so) samples per second), it was a lot to wrap my head around.

So, this time, I figured I should start at the basics.

What’s the Deal with PCM?

PCM stands for pulse-code modulation. This sort of sound is very close to the metal: it’s just a stream of samples recording the amplitude of the signal’s wave at each point in time. In Unity, this is represented by an array of floats, each of which ranges over [-1, 1].

If you have multiple channels (different waves that you want to represent), they are interleaved with each other. So, for instance, if you had two channels, the first entry in the array would be the first entry for the first channel, the second would be the first entry for the second channel, the third would be the second entry for the first channel, and so on.
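To make the interleaving concrete, here’s a small sketch of how a two-channel buffer is laid out and filled (GetSampleForFrame is a hypothetical stand-in for whatever produces the next sample value):

// Stereo interleaved layout: L0, R0, L1, R1, L2, R2, ...
// Sample frame `frame`, channel `c` lives at data[frame * channels + c].
int channels = 2;
float[] data = new float[8];  // 4 frames x 2 channels
for (int frame = 0; frame < data.Length / channels; frame++) {
    float value = GetSampleForFrame(frame);
    for (int c = 0; c < channels; c++) {
        data[frame * channels + c] = value;  // same value in every channel for a mono source
    }
}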

How it Worked Before

My CHIP-8 implementation for audio was as simple an implementation of the OnAudioFilterRead interface as you can get. Whenever the method was called, it would generate a wave and shove it into the buffer. What I shoved into the buffer was always the same, and SetSound (which took a duration) was used to turn it on or off.

public void SetSound(int sound) {
    _soundOn = sound != 0;
    if (_soundOn && !_audioSource.isPlaying) {
        _audioSource.Play();
    } else if (!_soundOn && _audioSource.isPlaying) {
        _audioSource.Stop();
    }
}

void OnAudioFilterRead(float[] data, int channels) {
    float fqClamp = Mathf.Clamp(Frequency, 100, 8000);
    for (int i = 0; i < data.Length; i += channels) {
        float t = _timeIndex * fqClamp / SAMPLE_RATE;
        float value = GenerateWave(t);
        for (int c = 0; c < channels; c++) {
            data[i + c] = value;
        }

        //if timeIndex gets too big, reset it to 0
        _timeIndex = (_timeIndex + 1) % (int)(SAMPLE_RATE * WAVE_LENGTH_SEC);
    }
}

private float GenerateWave(float t) {
    switch (Type) {
        case WaveType.Sine:
            return Mathf.Sin(2 * Mathf.PI * t);
        case WaveType.Square:
            // Compare against the fractional part of t so the square wave
            // repeats every cycle rather than flipping just once.
            return (t % 1f) < 0.5f ? 1f : -1f;
        case WaveType.Triangle:
            return Mathf.PingPong(t * 2f, 1f) * 2f - 1f;
        default:
            return 0f;
    }
}

Notably, I was using OnAudioFilterRead already, but in a very simple way. I did not understand when it was called, nor did I understand its concept of time.

The XO-Chip Complications

It’s the XO-Chip additions that force some complexity onto this very simple code.

Audio F002

This instruction takes 16 bytes from memory (pointed to by the register i) and loads them into an audio pattern buffer. These 16 bytes represent 128 bits of audio data, which are our PCM sample data (one bit per sample). Looping through these bits gives us our custom sound.
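As a sketch, handling the opcode boils down to a 16-byte copy (here _memory, _i, and _audio are placeholder names for the emulator’s RAM, index register, and audio component, and SetPatternBuffer is the hypothetical method that unpacks the bits, shown later):

// F002: copy the 16 bytes at memory[i] into the audio pattern buffer.
private void LoadAudioPattern() {
    var pattern = new byte[16];
    System.Array.Copy(_memory, _i, pattern, 0, 16);
    _audio.SetPatternBuffer(pattern);
}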

Pitch Fx3A

This sets the playback rate of the audio pattern. The formula provided in the spec is 4000*2^((Vx-64)/48). This tells us how quickly to iterate through the sample table.

Increasing the rate makes the sample play at a higher pitch, but it also finishes sooner; lowering it makes the sample play at a lower pitch, but take longer. This is standard audio stuff that is inherent to how waves work.
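In code, that formula is a one-liner (vx is the value of register Vx; _xoPitchHz is just my field name for the result):

// Fx3A: playback rate in samples per second, per the XO-Chip spec.
_xoPitchHz = 4000f * Mathf.Pow(2f, (vx - 64) / 48f);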

Actually Understanding OnAudioFilterRead()

OnAudioFilterRead() is a Unity-defined callback that will be invoked on any C# script attached to a GameObject that also has an AudioSource.

This description will mostly be paraphrased from the documentation, but I find it useful to transcribe these things, so I’ll do it. I’d recommend reading that first.

If this method is implemented, Unity will add a custom filter into the audio DSP chain. DSP stands for Digital Signal Processor, and it basically acts as a transformer on digital samples (in this case, our PCM samples).

Normal use of this API is to modify samples on the audio clip attached to the AudioSource. You get data in through the API (which is actually OnAudioFilterRead(float[] data, int channels)), you can futz with it a little bit, and then write it back out.3

However, crucially for this use case, if there is no audio clip attached to the AudioSource, this callback effectively acts as the audio clip, letting us synthesize the sound ourselves.
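A minimal skeleton of that setup might look like the following (a sketch, not my actual component; the only requirements are an AudioSource with no clip assigned and a script on the same GameObject that implements the callback):

using UnityEngine;

[RequireComponent(typeof(AudioSource))]
public class Chip8AudioSketch : MonoBehaviour {
    private AudioSource _audioSource;

    void Awake() {
        // No AudioClip is assigned to the AudioSource; the filter below supplies the samples.
        _audioSource = GetComponent<AudioSource>();
    }

    void OnAudioFilterRead(float[] data, int channels) {
        // Fill `data` with synthesized PCM samples here (see the real code below).
    }
}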

When is OnAudioFilterRead() called?

This was one of the aspects that confused me in the original implementation. The documentation says that this method is called every 20ms or so (it’s platform dependent, and is actually slightly configurable using Project Settings -> Audio -> DSP Buffer Size).
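If you’re curious what your platform actually uses, you can query the DSP buffer size from the main thread. This is just a diagnostic sketch, not something the emulator needs:

// Log the DSP buffer size and roughly how much time each callback covers.
AudioSettings.GetDSPBufferSize(out int bufferLength, out int numBuffers);
int sampleRate = AudioSettings.outputSampleRate;
Debug.Log($"DSP buffer: {bufferLength} samples x {numBuffers} buffers, " +
          $"~{1000f * bufferLength / sampleRate:F1} ms per callback at {sampleRate} Hz");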

My concern was this: what happens if Unity requests audio the moment before my emulator pushes data through an F002 command? Wouldn’t that introduce some latency to our sound? Can I not force a new call to this on demand?

Yes, it does introduce some latency. No, I cannot force a new call on demand. This seems to be an issue most emulators have, but for my purposes it’s an acceptable price to pay.

What should I populate it with?

In the editor on Windows, the callback gives me a data size of 2048 with a channel count of 2. That means that we will need to populate 1024 samples per channel every time the audio callback occurs.

However, the pattern we get only holds 128 samples of audio data. I naively thought “hey, this is an eighth of what the buffer requires, I’ll just toss it in eight times”.

This is incorrect.

We need to account for both the sample rate of the audio (AudioSettings.outputSampleRate) and the pitch set by the ROM we’re playing. To do that, we need to step through the waveform (the pattern buffer) at a rate derived from both.

When the F002 command occurs, I take the 16-byte buffer and expand it into 128 floats, each set to -1 or 1 based on the bit it refers to:

for (int i = 0; i < 128; i++) {
    int byteIndex = i / 8;
    int bitIndex = 7 - (i % 8);   // most significant bit comes first
    byte b = buffer[byteIndex];
    bool bitSet = ((b >> bitIndex) & 0x1) == 1;
    _patternBuffer[i] = bitSet ? 1.0f : -1.0f;
}

We’ll keep track of our current position in the waveform using _phase, and increment it by a value derived from XO-Chip’s pitch and the sample rate. Note that if we don’t have a pattern buffer, I treat playing sound as the standard CHIP-8 beep; that’s part of my code, but I’ve excluded it here for brevity:

// How far to step through the pattern for each output sample.
float increment = _xoPitchHz / _sampleRate;

for (int i = 0; i < data.Length; i += channels) {
    float sampleValue = 0f;

    // Index into the waveform based on the phase.
    int index = (int)_phase;
    if (index >= _patternBuffer.Length) index = 0;
    sampleValue = _patternBuffer[index];

    _phase += increment;
    if (_phase >= _patternBuffer.Length) _phase -= _patternBuffer.Length;

    // Apply Volume and write to all channels
    sampleValue *= _cachedVolume;

    for (int c = 0; c < channels; c++) {
        data[i + c] = sampleValue;
    }
}

Note that _sampleRate is a value I cached from AudioSettings.outputSampleRate on startup, as you cannot read that value from the thread that calls into OnAudioFilterRead(). Similarly, _cachedVolume is read from the AudioSource on startup so that I can control the volume via the UI; this callback ignores the AudioSource volume by default, so you have to apply it yourself.
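For completeness, the caching itself is just a couple of lines in a main-thread callback (a sketch using the same field names as above):

void Awake() {
    _audioSource = GetComponent<AudioSource>();
    _sampleRate = AudioSettings.outputSampleRate;  // not readable from the audio thread
    _cachedVolume = _audioSource.volume;           // OnAudioFilterRead won't apply this for us
}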

Unless the pitch is set quite high, a single OnAudioFilterRead() call will usually advance through less than one full pass of the 128-sample pattern. If it covers more than that, the pattern simply loops. Partly for this reason, it’s crucial that the phase is persisted across calls to OnAudioFilterRead(). It also should not be reset when the pitch or the pattern buffer changes, only when sound is enabled once again.

This works!

What about channels?

The channels are the different streams of audio for physical speakers. So if you just have one speaker, it’d be a single channel (mono). If you’re wearing headphones with one on each ear, it’d be two channels (stereo). 5.1 (5 speakers and one subwoofer) would be six channels.

XO-Chip (and the beep generated by the simpler systems) are all mono.

This means that we just send the same signal to each channel. Since they’re interleaved in the callback, we just set the same value X times.

for (int c = 0; c < channels; c++) {
    data[i + c] = sampleValue;
}

The WebGL Problem

OnAudioFilterRead() isn’t supported in WebGL. This means that my emulator will have no sound if I export for web.

Theoretically, I could solve this by creating audio clips on the fly with the PCMReaderCallback, but in practice I found that this introduced hundreds of milliseconds of latency. I’m not sure if I implemented it wrong, but this was not suitable for my purposes.
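For reference, that approach looks roughly like this (a hedged sketch reusing the fields from earlier, not the code I actually tried):

// A streamed AudioClip pulls mono samples from a reader callback instead of a filter.
void SetUpClipBasedAudio() {
    var clip = AudioClip.Create("xo-pattern", _sampleRate, 1, _sampleRate, true, OnPCMRead);
    _audioSource.clip = clip;
    _audioSource.loop = true;
    _audioSource.Play();
}

void OnPCMRead(float[] data) {
    // Same sampling logic as OnAudioFilterRead, but mono, so no interleaving.
    for (int i = 0; i < data.Length; i++) {
        int index = (int)_phase;
        if (index >= _patternBuffer.Length) index = 0;
        data[i] = _patternBuffer[index] * _cachedVolume;

        _phase += _xoPitchHz / _sampleRate;
        if (_phase >= _patternBuffer.Length) _phase -= _patternBuffer.Length;
    }
}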

The Stomping Problem

My XO-Chip loop is structured as follows (simplified pseudocode):

frame() {
    processInput()
    for each cycle
        runCurrentOpCode()
    updateAudio()
    updateDisplay()
}

Most XO-Chip ROMs tend to run around a thousand cycles per frame, and I’ve locked my renderer to 60 frames per second. This means that every cycle runs for about 1/60000th of a second. Every audio request comes in roughly every 20ms (1/50th of a second), and an XO-Chip pattern (at the default pitch) takes about 32ms (roughly 1/30th of a second) to play through.

Someone who was being clever with their clock cycles could busy-loop on the XO-Chip side so that they could swap in a new audio pattern mid-frame. That would allow them to output multiple different waveforms within a single frame.

However, if you do that in my emulator, only the last pattern gets output (updateAudio() only reads the current state of the value once per frame). This is incorrect behavior.

The only ROM I’ve come across that relies on this being correct is, of course, a port of Bad Apple, which has some minor audio glitches on my emulator.

John Earnest’s4 emulator appears to handle this case correctly (in shared.js), so I probably should fix it at some point.

Conclusion

In my previous implementation, I was actually pretty close to getting something working, but was just missing some basic details that would have gotten me past the finish line. Specifically, that I needed to resample the PCM pattern (stepping through it at a rate set by the pitch and the output sample rate) instead of just copying its values into the OnAudioFilterRead() buffer. This is obvious in hindsight, but it was something that I struggled with.

As part of this, I’d thought that the sample would need to be copied into the OnAudioFilterRead() data buffer multiple times, which intuitively felt wrong. Those instincts were right! However, I wasn’t able to make the leap to the correct behavior, as I fundamentally did not know enough about how computer audio works.

Programming!5


  1. You’d be stunned by how many people are saying this. ↩︎

  2. Pun most certainly intended. ↩︎

  3. You can have multiple OnAudioFilterRead attached to one AudioSource. I’m not sure what the ordering of the callbacks is. Presumably component order? It’s not relevant for our use case. ↩︎

  4. The guy that wrote the XO-Chip spec, so he should know. ↩︎

  5. I just couldn’t come up with a good conclusion here, so this is what you get. ↩︎