I Wrote a Book About Building a NES Emulator

I wrote a book about building a NES emulator from scratch in Crystal. Here's the story, some code, and a playable web version.

Posted Mar 22, 2026

Building Your First Emulator

By Matias Salles


In my last post I mentioned I’d spent a few months writing a book about building a NES emulator from scratch. Well, it’s done. And I want to tell you about it, because I think the journey from “I wonder how emulators work” to “I wrote a 280-page book about it” is kind of ridiculous and worth sharing.

How we got here

It started, as most bad decisions do, at 2 AM. I was playing Super Mario Bros. in a browser emulator, died in world 2-3, and instead of going to sleep like a normal person, I started wondering how the emulator worked. A few weeks later I had a working emulator in Crystal running at 60 FPS. A few months after that, I had a book.

The thing is, while building the emulator I kept thinking: “I wish someone had explained this to me step by step.” The NESdev Wiki is incredible but dense. YouTube tutorials assume you already know C and have opinions about memory allocation strategies. I wanted something that started from zero and built up piece by piece, with code first and theory after.

So I wrote that thing.

What’s in the book

You start with a CPU that can’t do anything. Literally nothing. Then you teach it to load a number into a register. Then to add. Then to jump. By the end you have all 151 official opcodes of the 6502 processor implemented, a PPU that renders pixels, an APU that generates audio, and you’re playing Super Mario Bros.

Here’s the full roadmap:

Chapters 1-2: NES architecture overview + Crystal setup
Chapter 3 (7 sub-chapters): The entire 6502 CPU, all 151 opcodes
Chapter 4: Cartridge parsing, iNES format, Mapper 0
Chapter 5 (6 sub-chapters): PPU, SDL2 GUI, background rendering, sprites, scroll
Chapter 6: Plugging in real games and watching them run
Chapter 7: APU, generating audio with square, triangle and noise waves
Appendix: Mapper 1 (MMC1) for games like Zelda and Mega Man 2

The whole thing is written in Crystal, which if you know Ruby you basically already know. No C, no emulation libraries. Just you, a text editor, and 20,000 virtual transistors.

Let me show you what I mean

The best way to explain the book’s approach is to show you some actual code from it. Let’s look at how the CPU works.

The CPU has registers (tiny pieces of memory inside the chip), a program counter that tracks where we are in the code, and a step method that fetches the next opcode and executes it:

  
# src/nes/cpu.cr

getter a      : UInt8   # Accumulator
getter x      : UInt8   # X register
getter y      : UInt8   # Y register
getter sp     : UInt8   # Stack Pointer
property pc   : UInt16  # Program Counter
getter status : UInt8   # Flags (Zero, Negative, Carry, etc.)

def step
  opcode = fetch_byte

  case opcode
  when CODE_LDA_IMMEDIATE   then op_lda_immediate
  when CODE_LDA_ZERO_PAGE   then op_lda_zero_page
  when CODE_LDA_ABSOLUTE    then op_lda_absolute
  when CODE_LDA_ABSOLUTE_X  then op_lda_absolute_x
  # ... STA, LDX, LDY, ADC, SBC, JMP, branches ...
  when CODE_INX             then op_inx
  when CODE_NOP             then op_nop
  else raise UnknownOpcode.new(opcode)
  end

  CYCLES[opcode]
end

That’s the whole CPU. Fetch a byte, match it against 151 opcodes, execute the right method, return how many cycles it took. The giant case statement looks intimidating at first, but each instruction is just a few lines.
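If you want to see that shape run on its own, here is a minimal sketch with just two instructions. `TinyCPU`, the `Bytes`-backed memory, and the Hash-based `CYCLES` table are assumptions made for this sketch, not the book's actual code. The opcode values (0xE8 for INX, 0xEA for NOP) and their 2-cycle cost are the standard 6502 ones.

```crystal
# Hypothetical two-instruction CPU: only INX and NOP, to show the dispatch shape.
class TinyCPU
  CODE_INX = 0xE8_u8
  CODE_NOP = 0xEA_u8

  # Cycle cost per opcode. A Hash stands in for whatever table the book uses.
  CYCLES = {CODE_INX => 2, CODE_NOP => 2}

  getter x : UInt8 = 0_u8
  property pc : UInt16 = 0_u16

  def initialize(@memory : Bytes)
  end

  # Reads the byte at the program counter, then advances it.
  # `&+` is wrapping addition, so the counter rolls over instead of raising.
  def fetch_byte : UInt8
    byte = @memory[@pc]
    @pc &+= 1
    byte
  end

  # Fetch, decode, execute, and report how many cycles the instruction took.
  def step : Int32
    opcode = fetch_byte

    case opcode
    when CODE_INX then @x &+= 1
    when CODE_NOP then nil
    else               raise "Unknown opcode 0x#{opcode.to_s(16)}"
    end

    CYCLES[opcode]
  end
end

program = Bytes[0xE8, 0xEA, 0xE8] # INX, NOP, INX
cpu = TinyCPU.new(program)

total = 0
3.times { total += cpu.step }

puts "x = #{cpu.x}, cycles = #{total}"
```

After those three steps, X has been incremented twice and the CPU reports 6 cycles in total.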

Let’s zoom into one. Every NES instruction has an opcode, a number that tells the CPU what to do. When the CPU reads 0xA9 from memory, it knows it has to run LDA (Load Accumulator) in immediate mode:

  
# src/nes/cpu/instructions/lda.cr

def lda(value)
  @a = value

  set_z_flag(@a)
  set_n_flag(@a)
end

def op_lda_immediate
  value = fetch_byte
  lda(value)
end

Read a byte, put it in register A, update the flags. That’s it. The lda method is reusable across all 8 addressing modes; each one just resolves the address differently:

  
def op_lda_zero_page
  address = address_zero_page
  value = read_byte(address)
  lda(value)
end

def op_lda_absolute
  address = address_absolute
  value = read_byte(address)
  lda(value)
end

# ... and so on for all 8 modes

See the pattern? Once you implement one instruction family, the rest follow the same structure. The book shows you a few in detail, you implement 10-15 yourself to really internalize how the CPU works, and then you grab the rest from the repo. No one needs to hand-type 151 opcodes.
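The snippets above lean on helpers that aren't shown: address_zero_page, address_absolute, set_z_flag and set_n_flag. Here is one way they could look, as a self-contained sketch. `MiniCPU`, `set_flag`, `write_byte` and the flat 64 KB memory are hypothetical names chosen for this example; the bit positions (Zero in bit 1, Negative in bit 7) and the low-byte-first operand order are standard 6502 behavior.

```crystal
# Hypothetical stand-alone CPU, just enough to run LDA in three addressing modes.
class MiniCPU
  ZERO_FLAG     = 0x02_u8 # bit 1 of the status register
  NEGATIVE_FLAG = 0x80_u8 # bit 7 of the status register

  getter a : UInt8 = 0_u8
  getter status : UInt8 = 0_u8
  property pc : UInt16 = 0_u16

  def initialize
    @memory = Bytes.new(0x10000) # flat 64 KB address space, zero-filled
  end

  def read_byte(address : UInt16) : UInt8
    @memory[address]
  end

  def write_byte(address : UInt16, value : UInt8)
    @memory[address] = value
  end

  def fetch_byte : UInt8
    byte = read_byte(@pc)
    @pc &+= 1
    byte
  end

  # Zero page: a one-byte operand that addresses the first 256 bytes of memory.
  def address_zero_page : UInt16
    fetch_byte.to_u16
  end

  # Absolute: a two-byte operand, low byte first (the 6502 is little-endian).
  def address_absolute : UInt16
    low = fetch_byte.to_u16
    high = fetch_byte.to_u16
    (high << 8) | low
  end

  # Sets or clears a single status bit.
  def set_flag(mask : UInt8, on : Bool)
    @status = on ? (@status | mask) : (@status & ~mask)
  end

  # Z is set when the value is zero.
  def set_z_flag(value : UInt8)
    set_flag(ZERO_FLAG, value == 0)
  end

  # N mirrors bit 7 of the value.
  def set_n_flag(value : UInt8)
    set_flag(NEGATIVE_FLAG, (value & 0x80_u8) != 0)
  end

  def lda(value : UInt8)
    @a = value
    set_z_flag(@a)
    set_n_flag(@a)
  end

  def op_lda_immediate
    lda(fetch_byte)
  end

  def op_lda_zero_page
    lda(read_byte(address_zero_page))
  end

  def op_lda_absolute
    lda(read_byte(address_absolute))
  end
end

cpu = MiniCPU.new
cpu.write_byte(0x0200_u16, 0x80_u8) # the value we want to load
cpu.write_byte(0x8000_u16, 0x00_u8) # operand, low byte
cpu.write_byte(0x8001_u16, 0x02_u8) # operand, high byte
cpu.pc = 0x8000_u16
cpu.op_lda_absolute
```

The operand bytes 0x00 and 0x02 resolve to address 0x0200, so A ends up holding 0x80. Because bit 7 of that value is set, the Negative flag goes up and the Zero flag stays clear.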

The part that blew my mind: the PPU

The CPU is satisfying to build, but the PPU is where things get wild. The NES draws an entire screen with 2KB of RAM. Two kilobytes. Your average email is bigger than that.
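A quick back-of-the-envelope check shows why that is even possible, assuming the 2KB here refers to the PPU's video RAM. The screen is stored as a grid of 8x8 tiles rather than as individual pixels, and the tile graphics themselves live on the cartridge. The figures below come from the standard NES memory layout, not from the book's code, and the names are made up for this sketch.

```crystal
# Screen geometry of the NES.
SCREEN_WIDTH  = 256
SCREEN_HEIGHT = 240
TILE_SIZE     = 8

tiles_per_row = SCREEN_WIDTH // TILE_SIZE     # 32 tiles across
tiles_per_column = SCREEN_HEIGHT // TILE_SIZE # 30 tiles down

# One byte per tile says which tile to draw at that position.
tile_indices = tiles_per_row * tiles_per_column

# A 64-byte attribute table picks the palette for each block of tiles.
ATTRIBUTE_BYTES = 64
nametable_bytes = tile_indices + ATTRIBUTE_BYTES

puts "One screen layout: #{nametable_bytes} bytes"
puts "Layouts that fit in 2 KB: #{2048 // nametable_bytes}"
```

One full background layout comes out to exactly 1 KB, so 2KB holds two of them.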

The PPU (Picture Processing Unit) is a separate chip that runs at three times the CPU's clock rate and has its own memory. It draws the screen scanline by scanline, 256x240 pixels, 60 times per second. The book walks you through it step by step: first a black screen, then the background, then sprites, then scroll. Each chapter adds one layer and you can see the progress on screen.
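To put numbers on that timing, here is a small calculation. It assumes the NTSC figures from the standard hardware documentation (341 PPU dots per scanline, 262 scanlines per frame); none of the constant names come from the book.

```crystal
PPU_DOTS_PER_SCANLINE  = 341 # 256 visible pixels plus horizontal blanking
SCANLINES_PER_FRAME    = 262 # 240 visible lines plus vertical blanking
PPU_DOTS_PER_CPU_CYCLE = 3

ppu_dots_per_frame = PPU_DOTS_PER_SCANLINE * SCANLINES_PER_FRAME

# Integer division: the exact figure has a fractional part, this rounds down.
cpu_cycles_per_frame = ppu_dots_per_frame // PPU_DOTS_PER_CPU_CYCLE

puts "PPU dots per frame:   #{ppu_dots_per_frame}"
puts "CPU cycles per frame: #{cpu_cycles_per_frame}"
```

That works out to 89,342 PPU dots and roughly 29,780 CPU cycles for every frame, which is the budget the emulator has to get through about 60 times per second.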

When Mario’s title screen showed up for the first time, I just sat there staring at it for a good minute. And then I pressed Start and nothing happened because of a missing feature called sprite 0 hit (I’ll tell you all about it in the book). Classic.

The emulation loop

Maybe my favorite part of the whole emulator is how simple the core loop is:

  
def step
  cycles = @cpu.step
  (cycles * 3).times { @ppu.step }
  @apu.step(cycles)
  cycles
end

Three calls. The CPU executes one instruction and tells you how many cycles it took. The PPU steps 3 times for every CPU cycle (that’s the real hardware ratio). The APU keeps up. That’s the entire emulation loop. Everything else is just implementing the details behind each .step.
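To see the ratio in action without a real CPU or PPU, the loop can be driven with stand-in components that only count how often they get stepped. Everything here other than the body of step is hypothetical: the `Stub*` classes, the `run_frame` helper, and the per-frame constant (an approximate NTSC figure) are assumptions for this sketch, not the book's code.

```crystal
# Stand-in components that only count how often they are stepped.
class StubCPU
  def step : Int32
    2 # pretend every instruction takes 2 cycles
  end
end

class StubPPU
  getter dots : Int32 = 0

  def step
    @dots += 1
  end
end

class StubAPU
  getter cycles : Int32 = 0

  def step(cycles : Int32)
    @cycles += cycles
  end
end

class Emulator
  CPU_CYCLES_PER_FRAME = 29_780 # approximate NTSC figure

  getter ppu : StubPPU = StubPPU.new
  getter apu : StubAPU = StubAPU.new

  def initialize
    @cpu = StubCPU.new
  end

  # Same shape as the loop above: one CPU instruction, then the others catch up.
  def step : Int32
    cycles = @cpu.step
    (cycles * 3).times { @ppu.step }
    @apu.step(cycles)
    cycles
  end

  # Hypothetical helper: steps until one frame's worth of CPU cycles has elapsed.
  def run_frame : Int32
    elapsed = 0
    while elapsed < CPU_CYCLES_PER_FRAME
      elapsed += step
    end
    elapsed
  end
end

emulator = Emulator.new
elapsed = emulator.run_frame

puts "CPU cycles: #{elapsed}, PPU dots: #{emulator.ppu.dots}"
```

With the 2-cycle stub, one frame consumes 29,780 CPU cycles and 89,340 PPU steps, exactly three PPU steps per CPU cycle. A real frontend would also have to pace these frames to the display and feed the audio device, which this sketch leaves out.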

Play it right now

I compiled the emulator to WebAssembly so you can play it in your browser. No downloads, no setup. Just pick a game and go:

👉 emulator.matiassalles99.codes

Fair warning: you might lose an hour. I sure did while “testing” it.

Get the book

The book is available in both English and Spanish on Leanpub:

🇬🇧 English: Building Your First Emulator
🇪🇸 Español: Construí tu Primer Emulador

Both versions include a free sample that covers the introduction, NES architecture, setup, and the first coding chapter where you build the CPU skeleton and implement your first two instructions. That’s enough to know if the book is for you.

If you know Ruby, Python, or any similar language, you can follow along. Crystal reads almost exactly like Ruby, and the book doesn’t assume any knowledge of emulation or retro hardware.

Who is this for?

Honestly? Any programmer who’s ever been curious about what happens below the abstraction layers we work on every day. You don’t need to know anything about emulation, hardware, or assembly.

But it’s especially for people like me: web developers who spend all day in Rails or React and sometimes wonder what a CPU actually does when it runs our code. If you’re curious about that side of things, I also built a 4-bit CPU on breadboards a while back.

It’s also not a reference manual. It’s informal, opinionated, and occasionally self-deprecating. I wrote it the way I’d explain things to a friend over coffee (or beer, depending on the chapter).

What’s next

I have a few more side projects lined up that I’ll be building in Crystal. The language has been a revelation for anything that needs real performance but where I don’t want to leave Ruby’s syntax behind. I’ll keep writing about them here.

If you end up building the emulator, playing the web version, or reading the book, I’d love to hear about it. Drop me a message, open an issue, whatever works.

Now if you’ll excuse me, I need to go beat world 2-3.

Thanks for reading! 😄

Coding, Emulation

This post is licensed under CC BY 4.0 by the author.