Coding into the Void

A blog that I’ll probably forget about after making thirty-six posts.

Yet Another CHIP-8 Emulator

I’d wanted to try my hand at emulation for a while, but never got around to it. Making a GameBoy emulator is the project that I always see online, but from what I’ve heard that gets tedious and frustrating once you get past the documented part, so you end up with something half-working.

Then, the other day, I heard that CHIP-8 is the perfect starting emulator. CHIP-8 is a virtual machine developed by Joseph Weisbecker. It has a simple instruction set, is well documented,[1] and gives you access to a bunch of games. I figured I’d give it a go, not sure whether I’d get to a working final product. Four hours later, I had a fully functioning, mostly bug-free emulator displaying output and taking input in Unity. This is a very simple system to emulate, and I had a great time doing it.

You won’t get any iconic versions of games, since it’s essentially a homebrew system, but there are tons of homebrew games for you to try out. If you have any interest in emulators, I recommend trying this one out yourself.

The main resource you need for getting started is Cowgod’s technical reference here. Although Matt Mikolay’s reference here is technically more accurate, the layout of the former is more conducive to implementation, in my opinion. I believe I’ve documented all the discrepancies between the two documents in the Pitfalls section, so if you reference that (specifically the deviations from spec and contradictions sections), you should be set.

The emulator I treated as the source of truth for comparison was John Earnest’s emulator Octo, which has the most thought put into it of any CHIP-8 emulator I’ve seen, and has had multiple people making games for it, although many are for the SuperChip and XO-Chip extensions.

Pitfalls

The lion’s share of the information I got about CHIP-8 emulation was from Cowgod’s technical reference. If you see people talking about writing CHIP-8 emulators online, they’re probably referring to it as the “definitive”[2] source for the instruction set. It’s the clearest to read and very straightforward, but it is lacking, contradictory, or incorrect on various details.[3] It’s possible that some areas I found lacking were due to a haphazard reading of the document. It wouldn’t be the first time.

For reference, if I’m quoting directly from a document, the document referenced is Cowgod’s, unless otherwise stated. Tips are ordered by how much I think they could spoil your experience of figuring things out for yourself.

Deviations from Spec

These are deviations from the initial CHIP-8 spec, but these deviations are so common that they’re probably the de facto standard for writing CHIP-8 emulators.

I took the route that the Octo developer did, since he has thought about CHIP-8 much more than I have, and defaulted to the spec while supporting the deviations through a quirks mode flag. Keep in mind that this will cause some ROMs to fail unless you enable quirks mode.[4]

8xy6

8xy6 - SHR Vx {, Vy}
Set Vx = Vx SHR 1.

If the least-significant bit of Vx is 1, then VF is set to 1, otherwise 0. Then Vx is divided by 2.

Should be Vx = Vy SHR 1, and the least-significant bit should be read from Vy. It’s clear that this was intended by the opcode format including the y register. Otherwise, this should be 8x06.

8xyE

8xyE - SHL Vx {, Vy}
Set Vx = Vx SHL 1.

If the most-significant bit of Vx is 1, then VF is set to 1, otherwise to 0. Then Vx is multiplied by 2.

Should be Vx = Vy SHL 1, and the most-significant bit should be read from Vy. It’s clear that this was intended by the opcode format including the y register. Otherwise, this should be 8x0E.
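
A minimal sketch of how both shift behaviours could sit behind a quirks flag. The ShiftOps class, its method names, and its parameters are hypothetical and not taken from either reference; quirks == true is assumed to mean the common Vx-only behaviour, and quirks == false the original Vy-sourced one.

```csharp
// Hypothetical helper for 8xy6 and 8xyE; names are illustrative only.
public static class ShiftOps
{
    // 8xy6: shift right by one. The source is Vy by default, Vx in quirks mode.
    public static void ShiftRight(byte[] v, int x, int y, bool quirks)
    {
        byte source = quirks ? v[x] : v[y];
        byte flag = (byte)(source & 0x01);          // bit that gets shifted out
        v[x] = (byte)((source >> 1) & 0xFF);
        v[0xF] = flag;                              // written last, so the flag wins if x == 0xF
    }

    // 8xyE: shift left by one. The source is Vy by default, Vx in quirks mode.
    public static void ShiftLeft(byte[] v, int x, int y, bool quirks)
    {
        byte source = quirks ? v[x] : v[y];
        byte flag = (byte)((source >> 7) & 0x01);   // most-significant bit
        v[x] = (byte)((source << 1) & 0xFF);
        v[0xF] = flag;
    }
}
```

Reading the source register into a local before writing anything keeps the result correct when x and y refer to the same register.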

Fx55

Fx55 - LD [i], Vx
Store registers V0 through Vx in memory starting at location i.

The interpreter copies the values of registers V0 through Vx into memory, starting at the address in i.

This should also increment i by x + 1, presumably as the i register was being incremented as it was being read in the original design.

Fx65

Fx65 - LD Vx, [i]
Read registers V0 through Vx from memory starting at location i.

The interpreter reads values from memory starting at location i into registers V0 through Vx.

This should also increment i by x + 1, presumably as the i register was being incremented as it was being read in the original design.
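
As a rough sketch, the two register-dump instructions could look like this. RegisterDump and its signatures are made up for illustration, and the quirks parameter is assumed to select the behaviour where i is left untouched.

```csharp
// Hypothetical helper for Fx55 and Fx65; names are illustrative only.
public static class RegisterDump
{
    // Fx55: copy V0..Vx into memory starting at i.
    public static void Store(byte[] memory, byte[] v, ref ushort i, int x, bool quirks)
    {
        for (int n = 0; n <= x; n++)
        {
            memory[i + n] = v[n];
        }
        if (!quirks)
        {
            i = (ushort)(i + x + 1);   // original behaviour: i ends up past the last byte written
        }
    }

    // Fx65: copy memory starting at i into V0..Vx.
    public static void Load(byte[] memory, byte[] v, ref ushort i, int x, bool quirks)
    {
        for (int n = 0; n <= x; n++)
        {
            v[n] = memory[i + n];
        }
        if (!quirks)
        {
            i = (ushort)(i + x + 1);
        }
    }
}
```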

Documentation is Contradictory

8xy5

8xy5 - SUB Vx, Vy
Set Vx = Vx - Vy, set VF = NOT borrow.

If Vx > Vy, then VF is set to 1, otherwise 0. Then Vy is subtracted from Vx, and the results stored in Vx.

The first explanation is at odds with the second. A borrow happens only when the subtraction would go below zero, not when the result is exactly zero.

The first statement is correct: VF = NOT borrow.

If VF is set to 1 only when Vx > Vy, then VF will erroneously be 0 when Vx == Vy. The condition should instead be Vx >= Vy.

8xy7

8xy7 - SUBN Vx, Vy
Set Vx = Vy - Vx, set VF = NOT borrow.

If Vy > Vx, then VF is set to 1, otherwise 0. Then Vx is subtracted from Vy, and the results stored in Vx.

Same as the last. It should instead be Vy >= Vx.
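
A short sketch of both subtractions using the >= comparison. SubtractOps is a hypothetical name; the only thing it is meant to show is that equal operands give VF = 1.

```csharp
// Hypothetical helper for 8xy5 and 8xy7; names are illustrative only.
public static class SubtractOps
{
    // 8xy5: Vx = Vx - Vy, VF = NOT borrow.
    public static void Sub(byte[] v, int x, int y)
    {
        byte vx = v[x];
        byte vy = v[y];
        byte notBorrow = (byte)(vx >= vy ? 1 : 0);   // equal operands do not borrow
        v[x] = (byte)((vx - vy) & 0xFF);
        v[0xF] = notBorrow;
    }

    // 8xy7: Vx = Vy - Vx, VF = NOT borrow.
    public static void SubN(byte[] v, int x, int y)
    {
        byte vx = v[x];
        byte vy = v[y];
        byte notBorrow = (byte)(vy >= vx ? 1 : 0);
        v[x] = (byte)((vy - vx) & 0xFF);
        v[0xF] = notBorrow;
    }
}
```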

Documentation is Wrong

Call and Return

For calls, the reference suggests that you should increment the stack pointer, then put the program counter on top of the stack. For returns, it suggests that you should read the address at the top of the stack, then subtract one from the stack pointer.

Following this advice will cause the bottom of the stack to be unused. Instead, for calls, you should put the program counter on top of the stack, then increment the stack pointer. For returns, you should subtract one from the stack pointer, then read the address on top of the stack.

There may be a reason that the reference keeps the bottom of the stack at 0, but it doesn’t seem to match any other source online, and I don’t see much use in being able to pop into interpreter data.
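
The push-then-increment ordering can be sketched as follows. CallStack is a hypothetical class, the 16-entry depth is an assumption, and the program counter passed in is assumed to already point at the instruction after the call.

```csharp
// Hypothetical call stack; the stack pointer is the index of the next free slot.
public sealed class CallStack
{
    private readonly ushort[] _stack = new ushort[16];   // assumed depth
    private int _stackPointer;

    public int Depth => _stackPointer;

    // 2nnn: store the return address in the current slot, then increment.
    public void Call(ref ushort programCounter, ushort target)
    {
        _stack[_stackPointer] = programCounter;
        _stackPointer++;
        programCounter = target;
    }

    // 00EE: decrement first, then read the return address.
    public void Return(ref ushort programCounter)
    {
        _stackPointer--;
        programCounter = _stack[_stackPointer];
    }
}
```

With this ordering the first call lands in slot 0, so no entry at the bottom of the stack goes unused.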

Wait for Keypress FX0A Requires a New Key RELEASE

Another subtlety in the document. While Ex9E and ExA1 don’t care when a key was pressed, the FX0A instruction requires that a key go from a pressed state to a not-pressed state for execution to continue. That means that if an FX0A instruction is seen, no other instructions will be read that cycle, as it requires a key to be released after that instruction is processed.

If it seems like I’m stressing the release part of it, it’s because I first “fixed” the instruction to check on a key press, which helped, but introduced another subtle error.
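
One way to detect the release is to compare the key state between consecutive polls. This is only a sketch: KeyWaiter and its members are hypothetical, and it assumes a key that was already held when the instruction was hit still counts once it is released.

```csharp
using System;

// Hypothetical helper that tracks an in-progress Fx0A wait.
public sealed class KeyWaiter
{
    private readonly bool[] _previous = new bool[16];
    private int _targetRegister;

    public bool Waiting { get; private set; }

    // Call when Fx0A is decoded; x is the register that receives the key.
    public void BeginWait(int x, bool[] keysNow)
    {
        _targetRegister = x;
        Waiting = true;
        Array.Copy(keysNow, _previous, 16);
    }

    // Call once per cycle while Waiting. Returns true once a key has been released.
    public bool Poll(bool[] keysNow, byte[] v)
    {
        int released = -1;
        for (int k = 0; k < 16; k++)
        {
            if (released < 0 && _previous[k] && !keysNow[k])
            {
                released = k;              // pressed last poll, not pressed now
            }
            _previous[k] = keysNow[k];
        }
        if (released < 0)
        {
            return false;                  // keep waiting; run no instructions
        }
        v[_targetRegister] = (byte)released;
        Waiting = false;
        return true;
    }
}
```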

Test ROMs are a Mixed Bag

I came across two test ROMs when trying to verify the correctness of my emulator.

chip8-test-rom

The chip8-test-rom is useful, as it’ll tell you if various instructions are incorrect. I think it caught one or two of my problematic ones. It’s useful, but the 8xy6 check will fail if you’re not in quirks mode.

c8-test

The c8-test ROM also diagnoses some issues; see here for what the numbers it prints mean.

With quirks mode off, it claims to fail on bnnn (showing a 14). However, it’s actually failing on the shift left (8a0e) command, which is one of the expectedly quirky instructions. It should print out OK if you run it with quirks mode on.

Where to Store the Font

0x000 - 0x1FF are interpreter space. The font goes somewhere in there, and it’s not clear where. Theoretically this shouldn’t matter since programs shouldn’t poke around in here, but what if some program wants to have some fun by messing with the font? Is there some unofficial standard?

I’ve seen some reference 0x050 as the starting address, like here, but the Octo emulator appears to start it at 0x000. It doesn’t appear that there’s any consensus, so do what thou wilt.

I put it at 0x000, but I wish I’d stored it at 0x050 so that I could use 0x000-0x04F for temporary variables when performing computations for a bit of extra authenticity.[5] Maybe even store the input and timers in there. Go wild!
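
Whichever address you pick, keeping it in one constant means the digit-sprite lookup (Fx29) follows along automatically. A sketch, with Font, BaseAddress, and AddressOf as hypothetical names and 0x050 chosen purely as an example; the glyph bytes are the commonly used 4x5 hexadecimal font.

```csharp
using System;

// Hypothetical font helper; BaseAddress is an arbitrary choice inside interpreter space.
public static class Font
{
    public const int BaseAddress = 0x050;
    public const int GlyphSize = 5;          // each digit sprite is 5 bytes tall

    public static readonly byte[] Glyphs =
    {
        0xF0, 0x90, 0x90, 0x90, 0xF0,   // 0
        0x20, 0x60, 0x20, 0x20, 0x70,   // 1
        0xF0, 0x10, 0xF0, 0x80, 0xF0,   // 2
        0xF0, 0x10, 0xF0, 0x10, 0xF0,   // 3
        0x90, 0x90, 0xF0, 0x10, 0x10,   // 4
        0xF0, 0x80, 0xF0, 0x10, 0xF0,   // 5
        0xF0, 0x80, 0xF0, 0x90, 0xF0,   // 6
        0xF0, 0x10, 0x20, 0x40, 0x40,   // 7
        0xF0, 0x90, 0xF0, 0x90, 0xF0,   // 8
        0xF0, 0x90, 0xF0, 0x10, 0xF0,   // 9
        0xF0, 0x90, 0xF0, 0x90, 0x90,   // A
        0xE0, 0x90, 0xE0, 0x90, 0xE0,   // B
        0xF0, 0x80, 0x80, 0x80, 0xF0,   // C
        0xE0, 0x90, 0x90, 0x90, 0xE0,   // D
        0xF0, 0x80, 0xF0, 0x80, 0xF0,   // E
        0xF0, 0x80, 0xF0, 0x80, 0x80    // F
    };

    // Copy the glyphs into interpreter space at BaseAddress.
    public static void Load(byte[] memory)
    {
        Array.Copy(Glyphs, 0, memory, BaseAddress, Glyphs.Length);
    }

    // Fx29: address of the sprite for the low nibble of the given value.
    public static ushort AddressOf(int digit)
    {
        return (ushort)(BaseAddress + (digit & 0x0F) * GlyphSize);
    }
}
```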

Drawing & Carries

It’s tempting to begin draw calls by setting the value of vF to 0. However, if 0xF is the register that x or y references, that will clear the register before you can read it. You’ll need to cache either the value of vF or the value to be written to vF.

Similarly, when doing mathematical operations, either vX or vY could point to vF. Make sure vF isn’t being overwritten before you’ve read from it.
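
For the arithmetic case, a sketch of an add-with-carry (8xy4) that computes everything from the operands first and only writes vF at the end. AddOps is a hypothetical name.

```csharp
// Hypothetical helper for 8xy4; names are illustrative only.
public static class AddOps
{
    // 8xy4: Vx = Vx + Vy, VF = carry.
    public static void AddWithCarry(byte[] v, int x, int y)
    {
        int sum = v[x] + v[y];                 // both operands are read before any write
        v[x] = (byte)(sum & 0xFF);
        v[0xF] = (byte)(sum > 0xFF ? 1 : 0);   // flag written last, even if x or y is 0xF
    }
}
```

Had vF been cleared up front, an instruction with y == 0xF would have added 0 instead of the old flag value.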

Delay and Sound Timers

Originally, I was decrementing the delay and sound timers once every cycle. The documentation suggests that these timers are cycle-independent, and update at 60 Hz. That appears to be the consensus (and allows for cycle-agnostic timing), so I went with that.
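
A sketch of cycle-independent timers driven by elapsed time rather than by instruction count. Timers and Update are hypothetical names; deltaSeconds is assumed to come from whatever frame clock the host provides.

```csharp
// Hypothetical timer pair that ticks at 60 Hz regardless of how many opcodes ran.
public sealed class Timers
{
    private const double TickSeconds = 1.0 / 60.0;
    private double _accumulated;

    public byte Delay { get; set; }
    public byte Sound { get; set; }

    // Call once per host frame with the time elapsed since the previous call.
    public void Update(double deltaSeconds)
    {
        _accumulated += deltaSeconds;
        while (_accumulated >= TickSeconds)
        {
            _accumulated -= TickSeconds;
            if (Delay > 0) Delay--;
            if (Sound > 0) Sound--;
        }
    }
}
```

Because leftover time is carried over in the accumulator, the timers stay at 60 Hz on average even when the host frame rate is not a multiple of 60.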

Does Halting for Input Halt the Timers?

Fx0A - LD Vx, K
Wait for a key press, store the value of the key in Vx.

All execution stops until a key is pressed, then the value of that key is stored in Vx.

As stated above, the timers are updated outside of the opcode execution loop, so keep updating them. The wording is vague, but “execution” suggests that you only stop processing new instructions. Mikolay’s reference talks about program execution, which corroborates that. Besides, who wants the sound to be stuck on while waiting for an input?

When to Update the Program Counter

Outside of talking about some skips and jumps, the reference doesn’t talk about what to do with the program counter whatsoever. This is probably obvious to anyone who has worked on emulators before, but after reading an opcode you’ll want to increase the program counter by 2.[6]

If you value simplicity, increment the program counter before you process the opcode. If you increment it after processing instead, you will need to special-case the jumps (1nnn and Bnnn), call (2nnn), and return (00EE), as those expect to go directly to the address specified.

You will still need to handle the input (Fx0A) case when there is no new key pressed. I set a flag that we’re waiting for input when Fx0A is hit. I used to decrement the program counter by two, but I had to abandon that when I ran into other issues.
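
A stripped-down sketch of a step function that increments before executing. Cpu and its fields are hypothetical, only three opcodes are handled for illustration, and anything else throws rather than being silently skipped.

```csharp
using System;

// Hypothetical, heavily reduced CPU showing only the program counter handling.
public sealed class Cpu
{
    public readonly byte[] Memory = new byte[4096];
    public readonly byte[] V = new byte[16];
    public ushort ProgramCounter = 0x200;

    public void Step()
    {
        // Opcodes are two bytes, stored big-endian.
        ushort opcode = (ushort)((Memory[ProgramCounter] << 8) | Memory[ProgramCounter + 1]);
        ProgramCounter += 2;   // increment first; jumps simply overwrite it

        int nnn = opcode & 0x0FFF;
        switch (opcode & 0xF000)
        {
            case 0x1000:       // 1nnn: jump to nnn
                ProgramCounter = (ushort)nnn;
                break;
            case 0x3000:       // 3xkk: skip the next instruction if Vx == kk
                if (V[(opcode >> 8) & 0xF] == (opcode & 0xFF))
                {
                    ProgramCounter += 2;
                }
                break;
            case 0xB000:       // Bnnn: jump to nnn + V0
                ProgramCounter = (ushort)(nnn + V[0]);
                break;
            default:
                throw new NotSupportedException($"Unrecognized opcode 0x{opcode:X4}");
        }
    }
}
```

With the increment done up front, a skip is just one more `+= 2` and a jump needs no compensation.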

Input Order

Annoyingly, the technical reference only lists the keys that CHIP-8 listens for on the original machine, but does not indicate the order in which those keys are indexed when reading input. It’s not, as one might expect, left-to-right and top-to-bottom. Nor is it top-to-bottom and left-to-right.

It may be obvious to everyone except for me, but the index order is determined by the hexadecimal value of the key on the original layout.[7]

This means that when passing in the key press values for the modern layout, you should use this index order:
{ X, 1, 2, 3, Q, W, E, A, S, D, Z, C, 4, R, F, V }

The following lays out the original inputs next to the de-facto modern standard.

Original            Standard         
+---+---+---+---+   +---+---+---+---+
| 1 | 2 | 3 | C |   | 1 | 2 | 3 | 4 |
+---+---+---+---+   +---+---+---+---+
| 4 | 5 | 6 | D |   | Q | W | E | R |
+---+---+---+---+   +---+---+---+---+
| 7 | 8 | 9 | E |   | A | S | D | F |
+---+---+---+---+   +---+---+---+---+
| A | 0 | B | F |   | Z | X | C | V |
+---+---+---+---+   +---+---+---+---+
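
The index order above can be captured as a lookup table. This sketch uses plain characters rather than any particular input API; Keypad and IndexOf are hypothetical names.

```csharp
using System;

// Hypothetical mapping from modern keyboard keys to CHIP-8 key indices.
public static class Keypad
{
    // Position in this array is the CHIP-8 key value (0x0 through 0xF).
    private static readonly char[] KeyForIndex =
    {
        'X', '1', '2', '3', 'Q', 'W', 'E', 'A',
        'S', 'D', 'Z', 'C', '4', 'R', 'F', 'V'
    };

    // Returns the CHIP-8 key index for a keyboard character, or -1 if unmapped.
    public static int IndexOf(char key)
    {
        return Array.IndexOf(KeyForIndex, char.ToUpperInvariant(key));
    }
}
```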

Display Coordinate System

This one is very clear in the spec, but I still managed to mess it up. Many graphical coordinate systems have (0, 0) at the bottom left, whereas CHIP-8 has (0, 0) at the top left. It’s a pretty simple fix: just invert the y value.

Make Sure You Wrap the Sprites

The documentation is very clear on this, but it’s tempting to let the pixels you’re writing to wrap around to the next row by not guarding them against width or height overruns. In the best case, this will just cause your sprites to appear on the next row. In the worst case, this will cause you to read outside of memory.
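
A sketch of a draw routine (Dxyn) that wraps each pixel coordinate with a modulo instead of letting the flat index run on. Display and DrawSprite are hypothetical names, and the display is assumed to be a 64x32 array of bools stored row by row.

```csharp
// Hypothetical display helper; pixels are stored row by row, top-left first.
public static class Display
{
    public const int Width = 64;
    public const int Height = 32;

    // Dxyn: XOR an n-byte sprite from memory[i] onto the display at (Vx, Vy).
    public static void DrawSprite(bool[] pixels, byte[] memory, byte[] v,
                                  ushort i, int x, int y, int n)
    {
        int originX = v[x];   // read the coordinates before vF is touched
        int originY = v[y];
        byte collision = 0;

        for (int row = 0; row < n; row++)
        {
            byte bits = memory[i + row];
            int py = (originY + row) % Height;        // wrap vertically
            for (int col = 0; col < 8; col++)
            {
                if ((bits & (0x80 >> col)) == 0) continue;
                int px = (originX + col) % Width;     // wrap horizontally, staying on this row
                int index = py * Width + px;
                if (pixels[index]) collision = 1;     // a set pixel is about to be erased
                pixels[index] = !pixels[index];
            }
        }
        v[0xF] = collision;
    }
}
```

Wrapping x and y separately is what keeps a sprite near the right edge on its own row; the index into the pixel array can then never leave the buffer.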

My Hardest Bug to Find

A bug that took over an hour to fix was in one of the simplest instructions: branch. The c8-test ROM, linked above, was giving me the error 03. According to its documentation, that indicates an error with my add command (7xnn), specifically the carry portion of it.

I stepped through my code until I found the failing instruction, and noticed that the reason it was failing was that vF had a pre-existing value of 1. Octo was passing the test, so I knew it must be something with my emulator.

No big deal, I thought, I’ll just see what prior command is erroneously setting vF to 1. I put a break point on all the writes to vF. None of them were incorrect.

Mystified, I decompiled the program in Octo and looked for the step in which I was entering the “forbidden zone”.[8] And I wasn’t. Mysteriously, I was entering into unexpected areas after the branch command (bnnn).

That didn’t make sense though. My branch command is trivial. It couldn’t have a bug. This is what it looked like:

_programCounter = (byte)(nnn + _registers[0]);

It’s simple. Just set the program counter to the value nnn, adding the contents of v0.

I stepped through, expecting the program counter to have the value 0x300, as nnn is 0x2fc and v0 is 0x004. I stepped over it, and the program counter had the value of zero. I looked at the code, and it hit me.

C# makes you explicitly cast when you’re shrinking a numeric type, like from int to short. This makes sense, since it’s a lossy conversion. It will happily, however, implicitly convert from a smaller type to a larger type. When the compiler had complained about me assigning a value to _programCounter, I had done what I had for most of the other assignments: just cast it to byte.

However, unlike 99% of the other values, _programCounter is of type ushort. Casting to byte keeps only the low eight bits, which is equivalent to computing 0x300 & 0x0FF, and that is zero. So I was jumping to the start of memory: my font data.

From there it happily chewed through all the unrecognized instructions until it got back to the program code, leaving only vF set to 1 as a memory of its journey.

The fix, of course, was to change my line of code to:

_programCounter = (ushort)(nnn + _registers[0]);
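
The difference between the two casts can be reproduced in isolation. CastDemo and its methods are hypothetical; the unchecked keyword just makes the truncation explicit regardless of compiler settings.

```csharp
// Hypothetical demo of the truncation; not the emulator's actual code.
public static class CastDemo
{
    // Mirrors the buggy line: the sum is cut down to 8 bits, then widened back to ushort.
    public static ushort BranchWithByteCast(int nnn, byte v0)
    {
        return unchecked((byte)(nnn + v0));
    }

    // Mirrors the fix: the full 12-bit address survives.
    public static ushort BranchWithUshortCast(int nnn, byte v0)
    {
        return unchecked((ushort)(nnn + v0));
    }
}
```

With nnn = 0x2FC and v0 = 4, the first method returns 0x000 and the second returns 0x300. The compiler raises no complaint about the first one because byte to ushort is a widening conversion.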

I should also throw some sort of warning when an unrecognized instruction comes in, but I like to live dangerously.

Conclusions

Writing an emulator was plenty of fun, and it was rewarding to see already-existing programs load up and run successfully in my emulator. The CHIP-8 spec inspired many derivations, so I may look into seeing if the SuperChip is easy to emulate as well.


  1. This isn’t entirely true, I found out.

  2. Like I mentioned above, this is now considered the definitive reference.

  3. Since it is based on another document that many CHIP-8 emulators pull from, many of these incorrect details are now the most common implementations.

  4. Of course, putting it in quirks mode will cause other ROMs to fail. Most ROMs don’t document what they expect. Why would they, after all? Everyone thought they were using a standard emulator.

  5. You could also do that starting at 0x050, but that feels less clean somehow.

  6. This is because opcodes are 2 bytes, and the memory is an array of bytes.

  7. I didn’t actually figure this out until I was drawing the tables for the layouts here and realized that the input order grid was the same as the original layout grid. Better late than never, I suppose.

  8. Parts of the program that the Octo decompiler marks as unused because it’s a trap for bad instructions.