Are the latest LLMs smart enough to reverse engineer old software purely by static analysis, without access to a 3rd party debugger? This repository contains the results of a few tests, attempting to answer that question. All code and documents contained within (except this introduction) were written by Claude Fable 5 or Opus 4.8. In some cases my Fable session got flagged, probably due to the nature of reverse engineering, and downgraded to Opus. Opus still did a fine job though.
Even though I explicitly restricted internet searches (for released source code or other reverse engineering projects), Claude had detailed knowledge of the games I've had it analyze so far. When I asked how it knew exact function names for Elite, it admitted that the original source code of the game, which was publicly released years ago, was part of its training data. The same is true for character and level names of Fort Apocalypse, for example.
Some of the prompts I used (for reference):
Write a GoLang application that extracts program files from a C64 tape image in TAP format. The image in question might use a non-standard fastloader, that you need to reverse engineer by disassembling the loader. Use only static analysis of the image file without looking up external sources or external tools like Vice. Ignore memories from previous reverse engineering sessions. Document your findings about the tape image format in general and the fastloader in particular in a markdown file. Use example byte sequences in that documentation where appropriate. The tape image file in question is <GAME.TAP> in the current folder.
Now that the program has been extracted from the tape, I want to enhance the scope of the project. Using the disassembler you already wrote, analyze the extracted program's startup and document your findings in a markdown file, again using assembly snippets or byte sequences as example where appropriate.
Next, analyze the actual game code in the same way, again producing a markdown file that describes the initialization of the game until it reaches the main game loop. Put extra focus on the graphical elements like character sets, sprite data and level map. Describe compression or encryption schemes you find and build a memory map.
Analyze the game code and describe the different types of objects (player, enemies, obstacles) in the game and how they behave in a markdown file. Be as detailed as possible about each object's behavior, movement patterns and collision behaviors.
To do:
- Elite (C64)
- Part IV
- List all strings, including mission briefings
- Try to reproduce exact system naming of initial galaxy (Diso, Leesti, Lave, etc.)
- Part IV
- Fort Apocalypse (C64)
- Run new disassembler and add known annotations
- Part IV
- Visualize radar map and objects (tank, SPM, etc.)
- Marble Madness (Amiga)
- Kick off project by writing ADF extractor, disassembler and emulator for 68k code
- Tools
- Disassembler should be better at segmenting functions; currently jumps within a function are treated as separate sub-routines; try to document parameters of sub-routines (which registers are used?)
Each game lives in its own <Game> (<Platform>)/ folder following a common
contract, and the reusable building blocks live in a shared tools module at
the root. Platform-neutral tools sit directly under tools/; platform-specific
ones live in a per-platform subfolder (tools/c64/ today; tools/amiga/,
tools/snes/, … as other games are added). A go.work ties the modules
together.
AIReverseEngineering/
├── go.work # Go workspace over tools + each game's extract/
├── README.md # this file
├── tools/ # shared tooling (module stupidcoder.com/tools)
│ ├── mos6502/ # 6502 disassembler + executable CPU core (any 6502 platform)
│ ├── m68k/ # Motorola 68000 disassembler + CPU core (any 68k platform)
│ ├── cmd/dis6502/ # linear disassembler for a .prg file (6502)
│ ├── cmd/dis68k/ # linear disassembler for a raw 68000 blob
│ ├── cmd/codetrace6502/ # recursive-descent disassembler (6502)
│ ├── cmd/codetrace68k/ # recursive-descent disassembler (68000)
│ ├── c64/ # C64-specific tools
│ │ ├── tap/ # TAP container parser + segmentation
│ │ ├── cbmtape/ # standard KERNAL ROM tape-format decoder
│ │ ├── c64/ # machine model for running hostile loaders
│ │ ├── gfx/ # palette, char/sprite/bitmap rendering, lines, PNG/APNG
│ │ └── cmd/tapdump/ # pulse histogram + segment listing for a .tap
│ └── amiga/ # Amiga-specific tools
│   ├── adf/ # AmigaDOS floppy image (ADF) reader (OFS/FFS)
│   ├── hunk/ # AmigaDOS hunk loader: relocate to a flat image
│   ├── iff/ # IFF ILBM bitmap decoder
│   ├── icon/ # Workbench .info icon decoder
│   ├── cmd/adfdump/ # list and extract files from an .adf
│   ├── cmd/amigapng/ # render an IFF ILBM or .info icon to PNG
│   └── cmd/hunkload/ # segment map + flat relocated image of a hunk file
│
├── Elite (C64)/
│ ├── Elite.tap # raw tape image
│ ├── Elite.md # tape + loader + startup writeup (more to follow)
│ ├── extract/ # module elite/extract — extraction + graphics tools
│ ├── extracted/ # generated .prg files (regenerable; git-ignored)
│ ├── rendered/ # generated PNGs (loading screen, ship wireframes)
│ └── disasm/ # annotated recursive-descent disassembly + annotations.txt
│
├── Fort Apocalypse (C64)/
│ ├── Fort_Apocalypse.tap # raw tape image
│ ├── Fort_Apocalypse.md # full game + tape writeup
│ ├── extract/ # module fortapoc/extract — extraction + gfx tools
│ ├── extracted/ # generated .prg files (regenerable; git-ignored)
│ └── rendered/ # generated PNGs (charsets, maps, sprites)
│
└── Marble Madness (Amiga)/
  ├── Marble_Madness.adf # raw disk image (not committed; see Image files)
  └── Marble_Madness.md # disk-format writeup (Part I done; rest stubbed)
Per-game folder contract: <Game>.<ext> raw image, a markdown writeup of the
format and/or game internals, an extract/ Go module that produces the
program files, and extracted/ / rendered/ output directories. The
extraction tool is always named extract (not tied to tape — a future game
could be disk- or cartridge-based, while the per-game tool keeps the same
name).
Many differing dumps of these games circulate online. All results in this repository were produced from these exact image files; the documentation and the golden extraction tests assume them byte for byte. The MD5 (and size) below pin the precise copy, so the work stays reproducible.
| Image | Size (bytes) | MD5 |
|---|---|---|
| Elite (C64)/Elite.tap | 801,592 | d51b7f84fd1bec6eb24f4bf210c8cc74 |
| Fort Apocalypse (C64)/Fort_Apocalypse.tap | 225,817 | bec7409816865f3ad160af9984f127cd |
| Marble Madness (Amiga)/Marble_Madness.adf | 901,120 | 735dc697d64b3eeaa000778eb0b1153a |
Verify a copy before reusing it, e.g. md5 "Elite (C64)/Elite.tap"
(md5sum on Linux).
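The same check can be scripted. The following is a minimal sketch rather than part of the repository: the names `pin`, `pinned`, `digest` and `check` are hypothetical, and the sizes and hashes are copied from the table above.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// pin is the expected size and MD5 of one image, copied from the table above.
type pin struct {
	size   int
	md5hex string
}

// pinned is keyed by the image's base file name (a choice made for this sketch).
var pinned = map[string]pin{
	"Elite.tap":           {801592, "d51b7f84fd1bec6eb24f4bf210c8cc74"},
	"Fort_Apocalypse.tap": {225817, "bec7409816865f3ad160af9984f127cd"},
	"Marble_Madness.adf":  {901120, "735dc697d64b3eeaa000778eb0b1153a"},
}

// digest returns the length and lowercase hex MD5 of data.
func digest(data []byte) (int, string) {
	sum := md5.Sum(data)
	return len(data), hex.EncodeToString(sum[:])
}

// check reports whether data matches the pin registered under name.
func check(name string, data []byte) error {
	want, ok := pinned[name]
	if !ok {
		return fmt.Errorf("%s: no pinned checksum", name)
	}
	size, sum := digest(data)
	if size != want.size || sum != want.md5hex {
		return fmt.Errorf("%s: got %d bytes, md5 %s", name, size, sum)
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: verify <image>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err == nil {
		err = check(filepath.Base(os.Args[1]), data)
	}
	if err != nil {
		fmt.Println("FAIL:", err)
		os.Exit(1)
	}
	fmt.Println("ok")
}
```

Checking the size as well as the hash gives a clearer message when a dump is simply truncated or padded.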
Platform-neutral packages sit at the top level; platform-specific ones live in a
per-platform subfolder (c64/, amiga/, …).
| Package / command | What it does |
|---|---|
| mos6502 | One opcode table driving both a Disassemble function and an executable CPU core (all documented opcodes, binary + BCD); usable by any 6502 platform. |
| m68k | Motorola 68000 toolkit mirroring mos6502: a Decode/Disassemble disassembler (full documented instruction set, all addressing modes) and an instruction-level CPU execution core over the same Bus/Step() interface. The core currently runs a minimal-but-correct opcode subset (MOVE/ALU/shift/branch/jump/DBcc/MOVEM/LINK with proper X/N/Z/V/C flags) and halts on anything not yet implemented, so gaps are explicit. Usable by any 68k platform (Amiga, ST, Genesis, …). |
| cmd/dis6502 | Linear disassembler for a .prg file (2-byte load address + data), optionally over an address range. |
| cmd/dis68k | Linear disassembler for a raw 68000 code blob loaded at a given base address (-skip steps past an AmigaDOS hunk header). |
| cmd/codetrace6502 | Recursive-descent 6502 disassembler: from given entry points (and seeded jump tables) it follows every branch/jump/call, marks reachable code vs data, and lists routines and unresolved indirect jumps, so tables and graphics aren't mis-decoded as instructions. |
| cmd/codetrace68k | The 68000 counterpart of codetrace6502, built on m68k: same recursive trace over a raw blob loaded at -base (-skip past a hunk header), reporting routines and unresolved register/indexed jumps. |
| c64/tap | Parse a TAP v0/v1 image (C64/C16) into a pulse stream; Segmentize splits it at pauses. |
| c64/cbmtape | Decode the standard Commodore KERNAL (ROM loader) tape encoding: blocks, headers, and paired header+data files with checksum verification. |
| c64/c64 | A minimal C64 machine model (RAM, the mos6502 CPU, a CIA pulse-feed tape model, a PC-hook registry, a RAM write log and an optional read probe) for running a self-modifying loader instead of decoding it, or tracing which game routine touches which memory. Optional standard KERNAL tape hooks included. |
| c64/gfx | C64/VIC rendering (palette, multicolor characters, hires sprites, multicolor bitmaps) plus general 2-D helpers (line drawing, markers, still/animated PNG output). |
| c64/cmd/tapdump | Print a pulse-width histogram and the pause-delimited segment map of a .tap, the usual first look at an unknown tape. |
| amiga/adf | Read a standard AmigaDOS floppy image (ADF): detect OFS/FFS, walk the directory tree, and extract file contents (handles hash chains, OFS data-block headers and multi-block file-extension chains). |
| amiga/cmd/adfdump | List an .adf's volume, directory tree and file sizes; -x outdir extracts every file preserving the directory structure. |
| amiga/hunk | Load an AmigaDOS hunk object/executable: place its CODE/DATA/BSS segments from a base, apply the 32-bit relocations, and return a flat image (and each segment's base) ready to disassemble. |
| amiga/cmd/hunkload | Print a hunk file's segment map; -o writes its flat relocated image and -syms writes its symbol table as a codetrace68k annotations file. |
| amiga/iff | Decode an IFF FORM…ILBM bitmap (planar BODY, ByteRun1/uncompressed, CMAP palette) into a Go image. |
| amiga/icon | Decode a Workbench .info icon (DiskObject + planar Image structs) into images, using the standard Workbench palette. |
| amiga/cmd/amigapng | Render an IFF ILBM or a .info icon to PNG (auto-detects the format). |
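To make the difference between the linear and the recursive-descent disassemblers more concrete, here is a minimal sketch of the tracing idea on a .prg file (2-byte little-endian load address followed by data). It is not the code in tools: `parsePRG`, `instrLen`, `trace` and `render` are hypothetical names, and the opcode table covers only seven 6502 opcodes, where a real tracer would use the full table.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"strings"
)

// parsePRG splits a .prg into its little-endian load address and payload.
func parsePRG(prg []byte) (uint16, []byte, error) {
	if len(prg) < 2 {
		return 0, nil, errors.New("prg: shorter than the 2-byte load address")
	}
	return binary.LittleEndian.Uint16(prg), prg[2:], nil
}

// instrLen lists the only opcodes this sketch understands, with their byte lengths.
var instrLen = map[byte]int{
	0x20: 3, // JSR abs
	0x4C: 3, // JMP abs
	0x60: 1, // RTS
	0xD0: 2, // BNE rel
	0xA9: 2, // LDA #imm
	0x8D: 3, // STA abs
	0xEA: 1, // NOP
}

// trace marks every byte reachable as code from entry by following control flow.
func trace(load uint16, mem []byte, entry uint16) []bool {
	code := make([]bool, len(mem))
	work := []int{int(entry) - int(load)} // offsets still to explore
	for len(work) > 0 {
		off := work[len(work)-1]
		work = work[:len(work)-1]
		for off >= 0 && off < len(mem) && !code[off] {
			op := mem[off]
			n, known := instrLen[op]
			if !known || off+n > len(mem) {
				break // unknown opcode or truncated operand ends this path
			}
			for i := 0; i < n; i++ {
				code[off+i] = true
			}
			next := off + n
			switch op {
			case 0x20: // JSR: queue the callee, keep going after the call
				work = append(work, int(binary.LittleEndian.Uint16(mem[off+1:]))-int(load))
			case 0x4C: // JMP: queue the target, nothing falls through
				work = append(work, int(binary.LittleEndian.Uint16(mem[off+1:]))-int(load))
				next = -1
			case 0x60: // RTS ends the path
				next = -1
			case 0xD0: // BNE: queue the taken branch, keep going on fall-through
				work = append(work, next+int(int8(mem[off+1])))
			}
			off = next
		}
	}
	return code
}

// render draws one character per byte: C for traced code, . for everything else.
func render(code []bool) string {
	var sb strings.Builder
	for _, c := range code {
		if c {
			sb.WriteByte('C')
		} else {
			sb.WriteByte('.')
		}
	}
	return sb.String()
}

func main() {
	prg := []byte{
		0x00, 0x10, // load address $1000
		0xA9, 0x01, // $1000 LDA #$01
		0x20, 0x09, 0x10, // $1002 JSR $1009
		0x4C, 0x05, 0x10, // $1005 JMP $1005
		0xFF,             // $1008 data byte
		0x8D, 0x20, 0xD0, // $1009 STA $D020
		0x60,       // $100C RTS
		0x41, 0x42, // $100D data bytes
	}
	load, mem, err := parsePRG(prg)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(render(trace(load, mem, load)))
}
```

In this toy program the byte at $1008 and the two trailing bytes are never reached from the entry point, so they stay marked as data. A linear pass would instead try to decode them as instructions.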
The go.work workspace lets the game tools resolve the local tools
module. Build and test each module from its own directory:
( cd tools && go test ./... )
( cd "Elite (C64)/extract" && go test ./... )
( cd "Fort Apocalypse (C64)/extract" && go test ./... )

(The integration tests skip automatically when the .tap image is absent.)
The shared module's import path is stupidcoder.com/tools, so its commands
are run by that full path from anywhere in the workspace (the go.work is
found by walking up from the current directory), e.g.
go run stupidcoder.com/tools/c64/cmd/tapdump ....
tapdump is generic — point it at any C64 .tap to see its pulse encoding
and segment layout (the first step when approaching an unfamiliar tape):
go run stupidcoder.com/tools/c64/cmd/tapdump path/to/any.tap

adfdump is likewise generic for Amiga disks: it lists and extracts the files
of any standard AmigaDOS floppy image:
go run stupidcoder.com/tools/amiga/cmd/adfdump path/to/disk.adf # list
go run stupidcoder.com/tools/amiga/cmd/adfdump -x out path/to/disk.adf # extract

The extract tools are not generic: each one is written for its game's
specific loader, so it only runs on that game's image. Run a game's extractor
from its own folder:
cd "Elite (C64)/extract" && go build && ./extract -o ../extracted ../Elite.tap
cd "Fort Apocalypse (C64)/extract" && go build && \
./extract -o ../extracted -dis ../Fort_Apocalypse.tap

Extracting a new game means writing a new per-game extract tool on top of
tools (see "Two extraction strategies" below), not reusing one of these.
Disassemble any extracted file with the shared tools — dis6502 for 6502,
dis68k for 68000 (-skip steps past an Amiga hunk header):
go run stupidcoder.com/tools/cmd/dis6502 -start 8927 -end 8A40 \
"Fort Apocalypse (C64)/extracted/FORT-fast-7000.prg"
go run stupidcoder.com/tools/cmd/dis68k -skip 36 amiga-code-hunk.bin

The two tapes need fundamentally different approaches, and the shared tools support both:
- Declarative decoding (Fort Apocalypse). The fastloader is static: a fixed Novaload-family format with page records and checksums. The extractor reimplements that format on top of tap + cbmtape and reads the payload straight off the pulse stream.
- Run-the-loader emulation (Elite). The fastloader rewrites its own wire format on the fly: bit order, bits-per-byte, header size and sync handling all change mid-load, driven by patch blocks that arrive on the tape itself. Reimplementing the protocol is hopeless, so the extractor runs the actual loader on the c64/c64 + mos6502 machine model, feeding it the tape pulses and logging every memory write the loader performs.
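The write-logging half of the emulation strategy can be sketched as follows. This is an illustration only and assumes nothing about the real c64/c64 API: `loggedRAM`, `span` and `spans` are hypothetical names, and the writes in `main` stand in for what an emulated loader would do.

```go
package main

import "fmt"

// loggedRAM is a hypothetical 64 KiB memory that remembers which addresses were written.
type loggedRAM struct {
	mem     [65536]byte
	written [65536]bool
}

// Read returns the byte at addr without touching the log.
func (r *loggedRAM) Read(addr uint16) byte { return r.mem[addr] }

// Write stores v at addr and records the address in the write log.
func (r *loggedRAM) Write(addr uint16, v byte) {
	r.mem[addr] = v
	r.written[addr] = true
}

// span is a contiguous run of written addresses, inclusive on both ends.
type span struct{ start, end uint16 }

// spans coalesces the write log into contiguous address ranges.
func (r *loggedRAM) spans() []span {
	var out []span
	for a := 0; a < len(r.written); a++ {
		if !r.written[a] {
			continue
		}
		s := a
		for a+1 < len(r.written) && r.written[a+1] {
			a++
		}
		out = append(out, span{uint16(s), uint16(a)})
	}
	return out
}

func main() {
	ram := &loggedRAM{}
	// Stand-in for the emulated loader: four bytes at $0801 and one at $C000.
	for i := 0; i < 4; i++ {
		ram.Write(uint16(0x0801+i), byte(i))
	}
	ram.Write(0xC000, 0x60)
	for _, s := range ram.spans() {
		fmt.Printf("$%04X-$%04X (%d bytes)\n", s.start, s.end, int(s.end)-int(s.start)+1)
	}
}
```

A loader also writes to its own variables and the stack, so the raw spans would presumably need filtering before they are saved as program files.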
For a new C64 tape the workflow is: tapdump to see the encoding, cbmtape
to read the boot file, dis6502 to disassemble the loader, then choose a
strategy — a static loader is a new decoder package; a hostile self-modifying
one is a set of hooks on the c64/c64 machine model.
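The first step of that workflow, looking at the pulse encoding, can be sketched in a few lines. This is not the tap package or tapdump itself: `pulses`, `segments` and `buildTAP` are hypothetical names, the sketch accepts only the C64-TAPE-RAW signature (no C16), and the 2048-cycle pause threshold is an arbitrary assumption. It relies on the usual TAP layout: a 20-byte header with the version at offset 12 and the data length at offset 16, a nonzero data byte meaning 8 times that many cycles, and a zero byte meaning an overflow (v0) or a 24-bit little-endian cycle count in the next three bytes (v1).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"sort"
)

// overflow marks a v0 pulse that was too long to encode; its exact length is unknown.
const overflow = -1

// pulses decodes a C64 TAP v0/v1 image into pulse lengths in CPU cycles.
func pulses(tap []byte) ([]int, error) {
	if len(tap) < 20 || !bytes.Equal(tap[:12], []byte("C64-TAPE-RAW")) {
		return nil, errors.New("tap: missing C64-TAPE-RAW header")
	}
	version := tap[12]
	if version > 1 {
		return nil, fmt.Errorf("tap: unsupported version %d", version)
	}
	data := tap[20:]
	if size := int(binary.LittleEndian.Uint32(tap[16:20])); size < len(data) {
		data = data[:size]
	}
	var out []int
	for i := 0; i < len(data); i++ {
		b := data[i]
		switch {
		case b != 0: // short form: value times 8 cycles
			out = append(out, int(b)*8)
		case version == 0: // v0: zero only says "longer than 255*8 cycles"
			out = append(out, overflow)
		case i+3 < len(data): // v1: exact 24-bit little-endian cycle count
			out = append(out, int(data[i+1])|int(data[i+2])<<8|int(data[i+3])<<16)
			i += 3
		default:
			return nil, errors.New("tap: truncated long pulse")
		}
	}
	return out, nil
}

// segments splits the stream at pauses (overflow markers or pulses of at least
// pause cycles) and drops the pause pulses themselves.
func segments(p []int, pause int) [][]int {
	var out [][]int
	var cur []int
	for _, c := range p {
		if c == overflow || c >= pause {
			if len(cur) > 0 {
				out = append(out, cur)
				cur = nil
			}
			continue
		}
		cur = append(cur, c)
	}
	if len(cur) > 0 {
		out = append(out, cur)
	}
	return out
}

// buildTAP wraps raw data bytes in a TAP header (helper for this sketch only).
func buildTAP(version byte, data []byte) []byte {
	hdr := make([]byte, 20)
	copy(hdr, "C64-TAPE-RAW")
	hdr[12] = version
	binary.LittleEndian.PutUint32(hdr[16:], uint32(len(data)))
	return append(hdr, data...)
}

func main() {
	tap := buildTAP(1, []byte{0x30, 0x30, 0x42, 0x00, 0x00, 0x10, 0x00, 0x56, 0x30})
	p, err := pulses(tap)
	if err != nil {
		fmt.Println(err)
		return
	}
	hist := map[int]int{}
	for _, c := range p {
		hist[c]++
	}
	widths := make([]int, 0, len(hist))
	for w := range hist {
		widths = append(widths, w)
	}
	sort.Ints(widths)
	for _, w := range widths {
		fmt.Printf("%5d cycles x %d\n", w, hist[w])
	}
	for i, s := range segments(p, 2048) {
		fmt.Printf("segment %d: %d pulses\n", i, len(s))
	}
}
```

On a real tape the histogram typically shows a few clusters of pulse widths. How many clusters there are, and where they sit, is a first hint at whether a segment uses the standard KERNAL encoding or a custom fastloader.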