Speed up intcode instruction decoding by ebblake · Pull Request #68 · maneatingape/advent-of-code-rust

ebblake · 2026-05-07T18:35:59Z

Description

Pre-compile a map of decimal values to a 4-byte struct that is easier to utilize. This avoids lots of runtime divisions by 10/100/1000/10000, with a noticeable speed increase. Most obvious was day 25, which jumped from 2.5ms to 1.8ms on my machine.
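
For readers following along, here is a minimal sketch of what such a precomputed table could look like. The `Decoded` struct, its field names, `TABLE_LEN`, `build_table`, and the use of 99 as the halt marker are illustrative assumptions, not the code in this PR. `decode_slow` is the division-based decoding that the table is meant to replace.

```rust
/// Hypothetical 4-byte decoded form (names are assumptions, not the PR's).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Decoded {
    /// Opcode 1..=9, 99 for halt, 0 for an entry that is not a valid instruction.
    pub op: u8,
    /// Parameter modes: 0 = position, 1 = immediate, 2 = relative.
    pub modes: [u8; 3],
}

/// The largest valid instruction is 22209 (opcode 9, all three modes relative).
pub const TABLE_LEN: usize = 22_210;

/// Build the table once so the interpreter loop can index instead of dividing.
pub fn build_table() -> Vec<Decoded> {
    let mut table = vec![Decoded::default(); TABLE_LEN];
    for m3 in 0..3usize {
        for m2 in 0..3usize {
            for m1 in 0..3usize {
                for op in 1..=9usize {
                    let index = op + 100 * m1 + 1000 * m2 + 10000 * m3;
                    table[index] = Decoded { op: op as u8, modes: [m1 as u8, m2 as u8, m3 as u8] };
                }
            }
        }
    }
    table[99] = Decoded { op: 99, modes: [0; 3] };
    table
}

/// Division-based reference decoder, for comparison with the table.
pub fn decode_slow(instr: usize) -> Decoded {
    Decoded {
        op: (instr % 100) as u8,
        modes: [
            ((instr / 100) % 10) as u8,
            ((instr / 1000) % 10) as u8,
            ((instr / 10000) % 10) as u8,
        ],
    }
}
```

With 22,210 entries of 4 bytes each, a table like this occupies roughly 88k.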

Type of change

Performance improvement
Bug fix
Other

Checklist

Pull request title and commit messages are clear and informative.
Documentation has been updated if necessary.
Code style matches the existing code. This one is somewhat subjective, but try to "fit in" by using the same naming conventions. Code should be portable, avoiding any architecture-specific intrinsics.
Tests pass cargo test
Code is formatted cargo fmt -- `find . -name "*.rs"`
Code is linted cargo clippy --all-targets --all-features

Formatting and linting can also be executed by running just (if installed) on the command line at the project root.

ebblake · 2026-05-07T18:36:50Z

Too bad Rust doesn't have C-style bitfields: packing this into 44k (and letting the compiler shift out the modes under the hood) would have been a bit cuter than needing 88k. On the other hand, bytewise access may still be faster than shifting out bitfields.
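
To illustrate the trade-off, here is a rough sketch of doing the bitfield packing by hand in a `u16`. The `Packed` type and its bit layout are assumptions made for this example, not something taken from the patch. Each accessor costs a shift and a mask, which is the overhead that plain byte fields avoid.

```rust
/// Hypothetical 2-byte packed instruction: 4 bits of opcode, then three 2-bit modes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Packed(u16);

impl Packed {
    /// Pack an opcode (0..=15) and three modes (0..=2) into 10 bits.
    pub const fn new(op: u8, m1: u8, m2: u8, m3: u8) -> Self {
        Packed(
            ((op as u16) & 0xF)
                | (((m1 as u16) & 3) << 4)
                | (((m2 as u16) & 3) << 6)
                | (((m3 as u16) & 3) << 8),
        )
    }

    /// Low 4 bits hold the opcode.
    pub const fn op(self) -> u8 {
        (self.0 & 0xF) as u8
    }

    /// Mode of parameter `n` (0, 1 or 2), extracted with a shift and a mask.
    pub const fn mode(self, n: u32) -> u8 {
        ((self.0 >> (4 + 2 * n)) & 3) as u8
    }
}
```

At 2 bytes per entry, a table covering instructions up to 22209 would need about 44k instead of 88k.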

ebblake · 2026-05-08T16:10:27Z

https://www.reddit.com/r/adventofcode/comments/1t675xe/comment/oknlj9z/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button is an alternative speedup using a single multiply and bit shifting rather than a lookup table.

maneatingape · 2026-05-08T17:24:13Z

I like the approach of using a LUT to decode the decimal representation into something more machine friendly.

I was noodling on the idea of going one step further. There are 3 x 3 x 3 x 9 = 243 combinations of opcode and addressing modes, plus 1 for halt (although some of the combinations where instructions only take 1 or 2 operands would be technically invalid).

We could map the decimal instructions to a contiguous range of 1 to 244 (reserving 0 for halt, like you've already done) fitting into a u8. Then with judicious use of macros, we could build a large match table. Due to the contiguous range, this should likely desugar to a fast jump table.

Askalski did something similar, except only using opcodes that are actually used by the intcode programs. This has fewer cases, but since they're non-contiguous it might still be faster to do the lookup followed by a contiguous match.

ebblake · 2026-05-10T00:34:44Z

> We could map the decimal instructions to a contiguous range of 1 to 244 (reserving 0 for halt, like you've already done) fitting into a u8. Then with judicious use of macros, we could build a large match table. Due to the contiguous range, this should likely desugar to a fast jump table.

I'll get to learn more about writing Rust macros, then. I like the idea, though: having decode[] be only 22k instead of 88k, and return a u8 that feeds a single match table instead of the current multi-function code, may give even more speedup. It would be easy to rewrite my earlier for loop to compute decode[i + j*100 + k*1000 + l*10000] = l + k*3 + j*9 + i*27;. The harder part is figuring out how to write a macro that expands to 27 consecutive arms of the match statement: either 9 uses of a single macro, or different macros for ternary set (1, 2, 7, 8), unary set (3, 9), unary get (4), and binary jump (5, 6).
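
A sketch of how that dense table might be filled, under some assumptions of mine. If `i` is the raw opcode 1..=9, then `i*27` plus the mode terms can reach 269, which no longer fits a u8, so this version uses `(op - 1) * 27` and adds 1 to land in 1..=243. The names `dense_code`, `build_dense_table`, `HALT` and `INVALID`, and the choice of 255 for invalid entries, are hypothetical and not taken from the patch.

```rust
/// The largest valid instruction is 22209, so 22210 one-byte entries (about 22k).
pub const TABLE_LEN: usize = 22_210;
/// Assumed markers: 0 for halt, 255 for anything that is not a valid instruction.
pub const HALT: u8 = 0;
pub const INVALID: u8 = 255;

/// Map an opcode (1..=9) and three modes (0..=2 each) to a code in 1..=243.
pub fn dense_code(op: usize, m1: usize, m2: usize, m3: usize) -> u8 {
    debug_assert!((1..=9).contains(&op) && m1 < 3 && m2 < 3 && m3 < 3);
    ((op - 1) * 27 + m1 * 9 + m2 * 3 + m3 + 1) as u8
}

/// Fill the table indexed by the decimal instruction value.
pub fn build_dense_table() -> Vec<u8> {
    let mut table = vec![INVALID; TABLE_LEN];
    for op in 1..=9 {
        for m1 in 0..3 {
            for m2 in 0..3 {
                for m3 in 0..3 {
                    let index = op + 100 * m1 + 1000 * m2 + 10000 * m3;
                    table[index] = dense_code(op, m1, m2, m3);
                }
            }
        }
    }
    table[99] = HALT;
    table
}
```

Keeping each opcode's 27 codes adjacent is what would let one macro invocation per opcode cover a consecutive run of match arms.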

ebblake · 2026-05-19T01:31:41Z

Attempt with a Rust macro added on top. Once again, giving credit where it is due: I had the overview of WHAT I wanted (pack DECODE into a dense u8, and have a macro take nine small blocks and expand each into 27 that can then inline the mode math), but it took several hours of back and forth with Google Gemini to pound the macro into something that would compile. Not being able to generate match arms in isolation is a sad limitation of the current state of the art with Rust macros, where the C preprocessor lets you get away with truly atrocious stuff; then again, Rust is trying to prove code safety even when boilerplate is hidden behind convenience.

My runtime timings are rather variable: without the second patch, day 25 ran fairly consistently at around 1.8ms; with the macro patch, I got some runs at 1.5ms, but also some outliers at 1.9ms. Still a definite win over omitting this series entirely, but it is harder to say whether the macro makes a performance difference that is not possible without it.

ebblake · 2026-05-19T02:06:03Z

Day 9 showed more of a difference: 1.4ms without the series, 1.0ms with just the first patch, and 0.8ms with both patches. So macro-izing the decode lookup to let the compiler see even more opportunities for constant propagation looks like a good move, even though writing a macro is such a pain.

Credit to AI: I relied on Google Gemini for the macro syntax, but now that it is written, I have validated that the full patch makes sense to me. The patch compresses the decode table from a 4-byte struct down to consecutive u8 values, then updates the execution loop to do a giant match statement on all 243+2 possible decode outputs. With the mode values decoded at macro time, this should give the inliner a bit more to work with in producing a dense jump table and inlined actions.
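
A much-reduced sketch of the general technique, not the macro from this patch. Since a `macro_rules!` macro cannot expand to a bare match arm, the macro here emits the whole `match`. Each arm passes its modes as const generic arguments, so the mode test inside `load` can be resolved at compile time. The names `load`, `add_arms!` and `add_operands`, and the code numbering 1..=9, are hypothetical; only the two input operands of an add are modelled.

```rust
/// Fetch one parameter; MODE is a compile-time constant, so the match can fold away.
#[inline(always)]
fn load<const MODE: u8>(mem: &[i64], param: i64, base: i64) -> i64 {
    match MODE {
        0 => mem[param as usize],          // position mode
        1 => param,                        // immediate mode
        _ => mem[(base + param) as usize], // relative mode
    }
}

/// Expand a list of `code => (mode1, mode2)` pairs into a complete match expression.
macro_rules! add_arms {
    ($code:expr, $mem:expr, $ip:expr, $base:expr,
     [$( $n:literal => ($m1:literal, $m2:literal) ),+ $(,)?]) => {
        match $code {
            $( $n => Some(
                load::<{ $m1 }>($mem, $mem[$ip + 1], $base)
                    + load::<{ $m2 }>($mem, $mem[$ip + 2], $base)
            ), )+
            _ => None,
        }
    };
}

/// Sum the two inputs of an add instruction for a dense code in 1..=9 (assumed numbering).
pub fn add_operands(code: u8, mem: &[i64], ip: usize, base: i64) -> Option<i64> {
    add_arms!(code, mem, ip, base, [
        1 => (0, 0), 2 => (0, 1), 3 => (0, 2),
        4 => (1, 0), 5 => (1, 1), 6 => (1, 2),
        7 => (2, 0), 8 => (2, 1), 9 => (2, 2),
    ])
}
```

Because the codes are consecutive literals, the compiler is free to lower the match to a jump table, though whether it does so depends on the optimizer.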

Speed up intcode instruction decoding

846834b

ebblake force-pushed the 2019-25 branch from 7d3a259 to c88d69a Compare May 19, 2026 01:27

ebblake force-pushed the 2019-25 branch from c88d69a to 3bb0690 Compare May 29, 2026 19:41

ebblake wants to merge 2 commits into maneatingape:main from ebblake:2019-25
