I Wrote a Book About Building a NES Emulator
I wrote a book about building a NES emulator from scratch in Crystal. Here's the story, some code, and a playable web version.
In my last post I mentioned I’d spent a few months writing a book about building a NES emulator from scratch. Well, it’s done. And I want to tell you about it, because I think the journey from “I wonder how emulators work” to “I wrote a 280-page book about it” is kind of ridiculous and worth sharing.
How we got here
It started, as most bad decisions do, at 2 AM. I was playing Super Mario Bros. in a browser emulator, died in world 2-3, and instead of going to sleep like a normal person, I started wondering how the emulator worked. A few weeks later I had a working emulator in Crystal running at 60 FPS. A few months after that, I had a book.
The thing is, while building the emulator I kept thinking: “I wish someone had explained this to me step by step.” The NESdev Wiki is incredible but dense. YouTube tutorials assume you already know C and have opinions about memory allocation strategies. I wanted something that started from zero and built up piece by piece, with code first and theory after.
So I wrote that thing.
What’s in the book
You start with a CPU that can’t do anything. Literally nothing. Then you teach it to load a number into a register. Then to add. Then to jump. By the end you have all 151 official opcodes of the 6502 processor implemented, a PPU that renders pixels, an APU that generates audio, and you’re playing Super Mario Bros.
Here’s the full roadmap:
- Chapters 1-2: NES architecture overview + Crystal setup
- Chapter 3 (7 sub-chapters): The entire 6502 CPU, all 151 opcodes
- Chapter 4: Cartridge parsing, iNES format, Mapper 0
- Chapter 5 (6 sub-chapters): PPU, SDL2 GUI, background rendering, sprites, scroll
- Chapter 6: Plugging in real games and watching them run
- Chapter 7: APU, generating audio with square, triangle and noise waves
- Appendix: Mapper 1 (MMC1) for games like Zelda and Mega Man 2
The whole thing is written in Crystal, which if you know Ruby you basically already know. No C, no emulation libraries. Just you, a text editor, and 20,000 virtual transistors.
Let me show you what I mean
The best way to explain the book’s approach is to show you some actual code from it. Let’s look at how the CPU works.
The CPU has registers (tiny pieces of memory inside the chip), a program counter that tracks where we are in the code, and a step method that fetches the next opcode and executes it:
    # src/nes/cpu.cr
    getter a : UInt8       # Accumulator
    getter x : UInt8       # X register
    getter y : UInt8       # Y register
    getter sp : UInt8      # Stack Pointer
    property pc : UInt16   # Program Counter
    getter status : UInt8  # Flags (Zero, Negative, Carry, etc.)

    def step
      opcode = fetch_byte
      case opcode
      when CODE_LDA_IMMEDIATE  then op_lda_immediate
      when CODE_LDA_ZERO_PAGE  then op_lda_zero_page
      when CODE_LDA_ABSOLUTE   then op_lda_absolute
      when CODE_LDA_ABSOLUTE_X then op_lda_absolute_x
      # ... STA, LDX, LDY, ADC, SBC, JMP, branches ...
      when CODE_INX            then op_inx
      when CODE_NOP            then op_nop
      else raise UnknownOpcode.new(opcode)
      end
      CYCLES[opcode]
    end
That’s the whole CPU. Fetch a byte, match it against 151 opcodes, execute the right method, return how many cycles it took. The giant case statement looks intimidating at first, but each instruction is just a few lines.
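If you want to see that fetch, match, execute, count-cycles rhythm run end to end, here is a minimal sketch. It is not the book's code: `TinyCPU`, the flat 64 KB `Bytes` memory, and the bodies of `fetch_byte` and `CYCLES` are assumptions made for illustration, and only three opcodes are handled (flags are skipped entirely). The opcode values and their 2-cycle costs are the standard 6502 ones.

```crystal
# Toy sketch, not the book's implementation: three opcodes, flat 64 KB memory.
class TinyCPU
  CODE_LDA_IMMEDIATE = 0xA9_u8
  CODE_INX           = 0xE8_u8
  CODE_NOP           = 0xEA_u8

  # Cycle cost per opcode (each of these three takes 2 cycles on a 6502).
  CYCLES = {
    CODE_LDA_IMMEDIATE => 2,
    CODE_INX           => 2,
    CODE_NOP           => 2,
  }

  getter a : UInt8 = 0_u8
  getter x : UInt8 = 0_u8
  property pc : UInt16 = 0_u16

  def initialize(@memory : Bytes)
  end

  # Read the byte at PC, then advance PC (wrapping at 0xFFFF).
  def fetch_byte : UInt8
    value = @memory[@pc]
    @pc &+= 1
    value
  end

  # Execute one instruction and return how many cycles it took.
  def step : Int32
    opcode = fetch_byte
    case opcode
    when CODE_LDA_IMMEDIATE then @a = fetch_byte
    when CODE_INX           then @x &+= 1
    when CODE_NOP           then nil
    else raise "Unknown opcode 0x#{opcode.to_s(16)}"
    end
    CYCLES[opcode]
  end
end

# Tiny program: LDA #$42, INX, NOP, placed at an arbitrary address.
memory = Bytes.new(0x10000)
program = [0xA9_u8, 0x42_u8, 0xE8_u8, 0xEA_u8]
program.each_with_index { |byte, i| memory[0x8000 + i] = byte }

cpu = TinyCPU.new(memory)
cpu.pc = 0x8000_u16
total = 0
3.times { total += cpu.step }
puts "A=0x#{cpu.a.to_s(16)} X=#{cpu.x} cycles=#{total}"
```

The `&+=` operator is Crystal's wrapping addition, which matches how 8-bit and 16-bit registers roll over instead of raising an overflow error.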
Let’s zoom into one. Every NES instruction has an opcode, a number that tells the CPU what to do. When the CPU reads 0xA9 from memory, it knows it has to run LDA (Load Accumulator) in immediate mode:
    # src/nes/cpu/instructions/lda.cr
    def lda(value)
      @a = value
      set_z_flag(@a)
      set_n_flag(@a)
    end

    def op_lda_immediate
      value = fetch_byte
      lda(value)
    end
Read a byte, put it in register A, update the flags. That’s it. The lda method is reused across all 8 of LDA’s addressing modes; each one just resolves the address differently:
    def op_lda_zero_page
      address = address_zero_page
      value = read_byte(address)
      lda(value)
    end

    def op_lda_absolute
      address = address_absolute
      value = read_byte(address)
      lda(value)
    end

    # ... and so on for all 8 modes
See the pattern? Once you implement one instruction family, the rest follow the same structure. The book shows you a few in detail, you implement 10-15 yourself to really internalize how the CPU works, and then you grab the rest from the repo. No one needs to hand-type 151 opcodes.
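The excerpts above call helpers like `address_zero_page`, `address_absolute`, `set_z_flag`, and `set_n_flag` without showing their bodies. Here is one plausible way they could look. Only the method names come from the excerpts; the class name `AddressingSketch`, the flag constants, and every method body are assumptions. The underlying 6502 facts are standard, though: operands are little-endian, Z is bit 1 of the status register, and N is bit 7.

```crystal
# Hypothetical helper bodies; only the method names appear in the post.
class AddressingSketch
  FLAG_ZERO     = 0b0000_0010_u8 # bit 1 of the status register
  FLAG_NEGATIVE = 0b1000_0000_u8 # bit 7 of the status register

  getter status : UInt8 = 0_u8
  property x : UInt8 = 0_u8
  property pc : UInt16 = 0_u16

  def initialize(@memory : Bytes)
  end

  def fetch_byte : UInt8
    value = @memory[@pc]
    @pc &+= 1
    value
  end

  # Zero page: one operand byte, so the address is always 0x0000..0x00FF.
  def address_zero_page : UInt16
    fetch_byte.to_u16
  end

  # Absolute: two operand bytes, low byte first (little-endian).
  def address_absolute : UInt16
    low = fetch_byte.to_u16
    high = fetch_byte.to_u16
    (high << 8) | low
  end

  # Absolute,X: the absolute address plus X, wrapping at 0xFFFF.
  def address_absolute_x : UInt16
    address_absolute &+ @x
  end

  # Z is set when the value is zero, cleared otherwise.
  def set_z_flag(value : UInt8)
    if value == 0
      @status |= FLAG_ZERO
    else
      @status &= ~FLAG_ZERO
    end
  end

  # N mirrors bit 7 of the value.
  def set_n_flag(value : UInt8)
    if (value & 0x80) != 0
      @status |= FLAG_NEGATIVE
    else
      @status &= ~FLAG_NEGATIVE
    end
  end
end

memory = Bytes.new(0x10000)
memory[0x0000] = 0x34_u8 # operand low byte
memory[0x0001] = 0x12_u8 # operand high byte

cpu = AddressingSketch.new(memory)
cpu.x = 4_u8
address = cpu.address_absolute_x # 0x1234 + 4
cpu.set_z_flag(0_u8)
cpu.set_n_flag(0x80_u8)
puts "address=0x#{address.to_s(16)} status=0b#{cpu.status.to_s(2)}"
```

Keeping address resolution separate from the operation itself is what lets a single `lda` serve every mode: the `op_*` methods differ only in which `address_*` helper they call.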
The part that blew my mind: the PPU
The CPU is satisfying to build, but the PPU is where things get wild. The NES draws an entire screen with 2KB of video RAM. Two kilobytes. Your average email is bigger than that.
The PPU (Picture Processing Unit) is a separate chip that runs 3 times faster than the CPU and has its own memory. It draws the screen scanline by scanline, 256x240 pixels, 60 times per second. The book walks you through it step by step: first a black screen, then the background, then sprites, then scroll. Each chapter adds one layer and you can see the progress on screen.
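To make the "scanline by scanline" idea concrete, here is a counter-only sketch of the beam position. It draws nothing and is not the book's PPU: `PPUTimingSketch` and its methods are hypothetical. The timing constants are the documented NTSC values (341 PPU cycles, or dots, per scanline and 262 scanlines per frame, of which 240 are visible). Treating the first 256 dots of each visible scanline as the visible pixels is a simplification.

```crystal
# NTSC timing constants.
DOTS_PER_SCANLINE   = 341
SCANLINES_PER_FRAME = 262
VISIBLE_SCANLINES   = 240
VISIBLE_DOTS        = 256

# Hypothetical counter-only PPU: tracks the beam position, renders nothing.
class PPUTimingSketch
  getter dot = 0
  getter scanline = 0
  getter frames = 0

  # Advance one PPU cycle (one dot), rolling over scanlines and frames.
  def step
    @dot += 1
    if @dot == DOTS_PER_SCANLINE
      @dot = 0
      @scanline += 1
      if @scanline == SCANLINES_PER_FRAME
        @scanline = 0
        @frames += 1
      end
    end
  end

  # Simplified: true while the beam is inside the 256x240 picture area.
  def visible? : Bool
    @scanline < VISIBLE_SCANLINES && @dot < VISIBLE_DOTS
  end
end

dots_per_frame = DOTS_PER_SCANLINE * SCANLINES_PER_FRAME

ppu = PPUTimingSketch.new
dots_per_frame.times { ppu.step }

puts "PPU dots per frame: #{dots_per_frame}"
puts "CPU cycles per frame: #{(dots_per_frame / 3.0).round(2)}"
puts "frames completed: #{ppu.frames}"
```

The scanlines past 240 are not wasted: that stretch includes vertical blanking, which is when games typically update video memory.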
When Mario’s title screen showed up for the first time, I just sat there staring at it for a good minute. And then I pressed Start and nothing happened because of a missing feature called sprite 0 hit (in the book I’ll tell you all about it). Classic.
The emulation loop
Maybe my favorite part of the whole emulator is how simple the core loop is:
    def step
      cycles = @cpu.step
      (cycles * 3).times { @ppu.step }
      @apu.step(cycles)
      cycles
    end
Three steps. The CPU executes one instruction and tells you how many cycles it took. The PPU runs 3 times as fast (that’s the real hardware ratio on an NTSC console). The APU keeps up. That’s the entire emulation loop. Everything else is just implementing the details behind each .step.
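One way to drive that loop is to keep stepping until roughly one frame's worth of CPU cycles has passed, then hand the finished frame to the display. The sketch below shows the idea with stub chips that only count how often they run. Everything here is an assumption for illustration: the `Stub*` classes, `EmulatorSketch`, the `run_frame` helper, and the flat 2 cycles per instruction. Only the shape of `step` comes from the post.

```crystal
# Stubs standing in for the real chips; they only count how often they run.
class StubCPU
  # Pretend every instruction takes 2 cycles (real ones take 2 to 7).
  def step : Int32
    2
  end
end

class StubPPU
  getter dots = 0

  def step
    @dots += 1
  end
end

class StubAPU
  getter cycles = 0

  def step(cpu_cycles : Int32)
    @cycles += cpu_cycles
  end
end

class EmulatorSketch
  # About one NTSC frame: 341 * 262 PPU dots / 3, rounded up.
  CPU_CYCLES_PER_FRAME = 29_781

  getter ppu = StubPPU.new
  getter apu = StubAPU.new

  def initialize
    @cpu = StubCPU.new
  end

  # Same shape as the loop in the post.
  def step : Int32
    cycles = @cpu.step
    (cycles * 3).times { @ppu.step }
    @apu.step(cycles)
    cycles
  end

  # Hypothetical helper: run instructions until a frame's worth of cycles has elapsed.
  def run_frame : Int32
    elapsed = 0
    while elapsed < CPU_CYCLES_PER_FRAME
      elapsed += step
    end
    elapsed
  end
end

emulator = EmulatorSketch.new
elapsed = emulator.run_frame
puts "CPU cycles: #{elapsed}, PPU dots: #{emulator.ppu.dots}, APU cycles: #{emulator.apu.cycles}"
```

Because the loop advances one whole instruction at a time, a frame can overshoot the budget by a few cycles. A real emulator might carry that remainder into the next frame or let the PPU signal when a frame is complete.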
Play it right now
I compiled the emulator to WebAssembly so you can play it in your browser. No downloads, no setup. Just pick a game and go:
👉 emulator.matiassalles99.codes
Fair warning: you might lose an hour. I sure did while “testing” it.
Get the book
The book is available in both English and Spanish on Leanpub:
- 🇬🇧 English: Building Your First Emulator
- 🇪🇸 Español: Construí tu Primer Emulador
Both versions include a free sample that covers the introduction, NES architecture, setup, and the first coding chapter where you build the CPU skeleton and implement your first two instructions. That’s enough to know if the book is for you.
If you know Ruby, Python, or any similar language, you can follow along. Crystal reads almost exactly like Ruby, and the book doesn’t assume any knowledge of emulation or retro hardware.
Who is this for?
Honestly? Any programmer who’s ever been curious about what happens below the abstraction layers we work on every day. You don’t need to know anything about emulation, hardware, or assembly.
But it’s especially for people like me: web developers who spend all day in Rails or React and sometimes wonder what a CPU actually does when it runs our code. If you’re curious about that side of things, I also built a 4-bit CPU on breadboards a while back.
It’s also not a reference manual. It’s informal, opinionated, and occasionally self-deprecating. I wrote it the way I’d explain things to a friend over coffee (or beer, depending on the chapter).
What’s next
I have a few more side projects lined up that I’ll be building in Crystal. The language has been a revelation for anything that needs real performance but where I don’t want to leave Ruby’s syntax behind. I’ll keep writing about them here.
If you end up building the emulator, playing the web version, or reading the book, I’d love to hear about it. Drop me a message, open an issue, whatever works.
Now if you’ll excuse me, I need to go beat world 2-3.
Thanks for reading! 😄



