
Speed up intcode instruction decoding #68

Open
ebblake wants to merge 2 commits into maneatingape:main from ebblake:2019-25


ebblake commented May 7, 2026

Contributor

Description

Precompute a lookup table that maps each decimal instruction value to a 4-byte struct that is easier to work with. This avoids many runtime divisions by 10/100/1000/10000 and gives a noticeable speedup. The biggest improvement was on day 25, whose runtime dropped from 2.5ms to 1.8ms on my machine.
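A minimal sketch of the lookup-table idea, assuming the 4-byte struct holds the opcode plus the three parameter modes. The struct name `Decoded`, its field names, and `build_decode_table` are hypothetical and not taken from the PR. All of the digit arithmetic runs once while the table is built; the execution loop then needs only one indexed load per instruction.

```rust
/// Hypothetical decoded form of one intcode instruction word (4 bytes).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct Decoded {
    opcode: u8, // 1..=9 or 99; 0 marks a slot that is not a valid instruction
    mode1: u8,  // 0 = position, 1 = immediate, 2 = relative
    mode2: u8,
    mode3: u8,
}

/// Largest valid instruction word: all three modes relative (2), opcode 99.
const MAX_INSTRUCTION: usize = 22_299;

/// Build the table once; all divisions by 10/100/1000/10000 happen here,
/// never in the execution loop.
fn build_decode_table() -> Vec<Decoded> {
    let mut table = vec![Decoded::default(); MAX_INSTRUCTION + 1];
    for (n, slot) in table.iter_mut().enumerate() {
        let opcode = (n % 100) as u8;
        let mode1 = (n / 100 % 10) as u8;
        let mode2 = (n / 1_000 % 10) as u8;
        let mode3 = (n / 10_000 % 10) as u8;
        let known_op = (1..=9).contains(&opcode) || opcode == 99;
        if known_op && mode1 <= 2 && mode2 <= 2 && mode3 <= 2 {
            *slot = Decoded { opcode, mode1, mode2, mode3 };
        }
    }
    table
}
```

In an interpreter loop this would be used as `let d = table[memory[ip] as usize];`, after a range check, since intcode memory holds signed values.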

Type of change

  • Performance improvement
  • Bug fix
  • Other

Checklist

  • Pull request title and commit messages are clear and informative.
  • Documentation has been updated if necessary.
  • Code style matches the existing code. This one is somewhat subjective, but try to "fit in" by
    using the same naming conventions. Code should be portable, avoiding any
    architecture-specific intrinsics.
  • Tests pass: cargo test
  • Code is formatted: cargo fmt -- `find . -name "*.rs"`
  • Code is linted: cargo clippy --all-targets --all-features

Formatting and linting can also be run by invoking just (if installed)
from the command line at the project root.

ebblake commented May 7, 2026

Contributor Author

Too bad Rust doesn't have C bitfields: packing the table into 44k (and letting the compiler shift out the modes under the hood) would have been a bit cuter than needing 88k. On the other hand, bytewise access may still be faster than shifting out bitfields.
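Without bitfields, the same 2-byte layout can still be packed by hand. This sketch assumes a hypothetical `Packed` newtype in which the opcode takes the low 7 bits (enough for 99) and each mode takes 2 bits. Every accessor is a shift plus a mask, which is roughly what a C compiler generates for bitfield reads; whether that beats bytewise loads from a 4-byte struct would need measuring.

```rust
/// Hypothetical 2-byte packed instruction. Layout, low bits first:
/// bits 0..7 opcode (7 bits hold up to 127, enough for 99),
/// bits 7..9 mode1, bits 9..11 mode2, bits 11..13 mode3.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Packed(u16);

impl Packed {
    fn new(opcode: u8, mode1: u8, mode2: u8, mode3: u8) -> Self {
        debug_assert!(opcode < 128 && mode1 < 4 && mode2 < 4 && mode3 < 4);
        Packed(
            u16::from(opcode)
                | (u16::from(mode1) << 7)
                | (u16::from(mode2) << 9)
                | (u16::from(mode3) << 11),
        )
    }

    // Each accessor is a shift and a mask, the manual equivalent of a bitfield read.
    fn opcode(self) -> u8 {
        (self.0 & 0x7f) as u8
    }
    fn mode1(self) -> u8 {
        ((self.0 >> 7) & 0b11) as u8
    }
    fn mode2(self) -> u8 {
        ((self.0 >> 9) & 0b11) as u8
    }
    fn mode3(self) -> u8 {
        ((self.0 >> 11) & 0b11) as u8
    }
}
```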

Comment thread src/year2019/intcode.rs
Comment thread src/year2019/intcode.rs
ebblake commented May 8, 2026

Contributor Author

https://www.reddit.com/r/adventofcode/comments/1t675xe/comment/oknlj9z/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button describes an alternative speedup that uses a single multiply and bit shifts instead of a lookup table.
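The linked comment's exact formulation isn't reproduced here. As an illustration of the general principle only, division by a constant can be replaced by multiplying by a scaled reciprocal and shifting: 5243 / 2^19 ≈ 1/100 and 205 / 2^11 ≈ 1/10, and the ranges in the comments are where these give exact results. This version still uses several multiplies, so the single-multiply trick may be more compact; `div100`, `div10`, and `decode_arith` are hypothetical names.

```rust
/// n / 100 via one multiply and a shift; exact for n < 43_690,
/// which covers every intcode instruction word (at most 22_299).
fn div100(n: u32) -> u32 {
    (n * 5243) >> 19
}

/// n / 10 via one multiply and a shift; exact for n < 1_024.
fn div10(n: u32) -> u32 {
    (n * 205) >> 11
}

/// Split an instruction word into (opcode, mode1, mode2, mode3)
/// using only multiplies, shifts, and subtractions.
fn decode_arith(n: u32) -> (u32, u32, u32, u32) {
    let modes = div100(n); // e.g. 21107 -> 211
    let opcode = n - modes * 100; // 21107 -> 7
    let upper = div10(modes); // 211 -> 21
    let mode1 = modes - upper * 10; // 211 -> 1
    let mode3 = div10(upper); // 21 -> 2
    let mode2 = upper - mode3 * 10; // 21 -> 1
    (opcode, mode1, mode2, mode3)
}
```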

maneatingape commented May 8, 2026

Owner

I like the approach of using a LUT to decode the decimal representation into something more machine friendly.

I was noodling on the idea of going one step further. There are 3 x 3 x 3 x 9 = 243 combinations of addressing modes and opcodes, plus 1 for halt (although some combinations, where an instruction takes only 1 or 2 operands, would technically be invalid).

We could map the decimal instructions to a contiguous range of 1 to 244 (reserving 0 for halt, like you've already done), fitting into a u8. Then, with judicious use of macros, we could build a large match table. Because the range is contiguous, this would likely compile to a fast jump table.

Askalski did something similar, except only for the opcodes that actually appear in the intcode programs. That has fewer cases, but since they're non-contiguous, it might still be faster to do the lookup followed by a contiguous match.
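A sketch of the contiguous-numbering idea under assumed names (`dense_code`, `read`, `add`, and `step` are hypothetical). Halt maps to 0 and each opcode/mode combination maps to one value in 1..=243. The modes become const generic parameters, so each match arm calls its own specialized copy of the handler in which the mode branch can be folded away. Only a few `add` arms are written out here; a full version would cover every code, which is where macros come in.

```rust
/// Hypothetical dense numbering: halt is 0, and each (opcode 1..=9, mode1, mode2, mode3)
/// combination gets a unique value in 1..=243, so every code fits in a u8.
fn dense_code(opcode: u8, mode1: u8, mode2: u8, mode3: u8) -> u8 {
    if opcode == 99 {
        return 0;
    }
    1 + (opcode - 1) * 27 + mode1 * 9 + mode2 * 3 + mode3
}

/// Fetch the operand at `ip + offset`. MODE is a compile-time constant,
/// so each instantiation keeps only one branch of this match.
fn read<const MODE: u8>(mem: &[i64], ip: usize, offset: usize, base: i64) -> i64 {
    let raw = mem[ip + offset];
    match MODE {
        0 => mem[raw as usize],          // position
        1 => raw,                        // immediate
        _ => mem[(base + raw) as usize], // relative
    }
}

/// Opcode 1 (add) with both input modes fixed at compile time.
/// For brevity, the output operand is always treated as position mode.
fn add<const M1: u8, const M2: u8>(mem: &mut [i64], ip: usize, base: i64) -> usize {
    let value = read::<M1>(mem, ip, 1, base) + read::<M2>(mem, ip, 2, base);
    let dst = mem[ip + 3] as usize;
    mem[dst] = value;
    ip + 4
}

/// One step of a dense-code dispatcher: returns the next ip, or None on halt.
fn step(code: u8, mem: &mut [i64], ip: usize, base: i64) -> Option<usize> {
    match code {
        0 => None,
        1 => Some(add::<0, 0>(mem, ip, base)),  // add, modes (0, 0, 0)
        4 => Some(add::<0, 1>(mem, ip, base)),  // add, modes (0, 1, 0)
        10 => Some(add::<1, 0>(mem, ip, base)), // add, modes (1, 0, 0)
        13 => Some(add::<1, 1>(mem, ip, base)), // add, modes (1, 1, 0)
        _ => unimplemented!("the remaining codes are omitted from this sketch"),
    }
}
```

Whether the optimizer actually emits a jump table for the full match is up to the compiler; inspecting the generated assembly is the way to confirm it.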

ebblake commented May 10, 2026

Contributor Author

> We could map the decimal instructions to a contiguous range of 1 to 244 (reserving 0 for halt, like you've already done) fitting into a u8. Then with judicious use of macros, we could build a large match table. Due to the contiguous range, this should likely desugar to a fast jump table.

I'll get to learn more about writing Rust macros, then. I like the idea, though: shrinking decode[] to 22k instead of 88k, and returning a u8 that feeds a single match statement instead of the current multi-function code, may give even more speedup. Encoding my earlier for loop as decode[i + j*100 + k*1000 + l*10000] = l + k*3 + j*9 + i*27; would be easy. The harder part is figuring out how to write a macro that expands to 27 consecutive arms of the match statement: either 9 uses of a single macro, or different macros for ternary set (1, 2, 7, 8), unary set (3, 9), unary get (4), and binary jump (5, 6).
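One way to write that table-building loop, producing one byte per instruction word 0..=22_299 (about 22k in total). With the formula exactly as written, opcode 9 with all modes set to 2 would reach 9*27 + 26 = 269, which doesn't fit in a u8; this sketch therefore assumes the opcode is shifted down by one and the result offset by one, so valid combinations land on 1..=243 and 0 stays free for halt. The `INVALID` marker for unused slots and the function name are hypothetical choices.

```rust
/// Halt keeps the reserved value 0.
const HALT: u8 = 0;
/// Hypothetical marker for instruction words that are not valid intcode.
const INVALID: u8 = u8::MAX;

/// One byte per instruction word 0..=22_299.
fn build_dense_table() -> Vec<u8> {
    let mut decode = vec![INVALID; 22_300];
    for l in 0..3 {
        // l = mode of parameter 3 (ten-thousands digit)
        for k in 0..3 {
            // k = mode of parameter 2 (thousands digit)
            for j in 0..3 {
                // j = mode of parameter 1 (hundreds digit)
                for i in 1..=9 {
                    // i = opcode; using (i - 1) keeps the largest code at 243
                    decode[i + j * 100 + k * 1000 + l * 10_000] =
                        (1 + (i - 1) * 27 + j * 9 + k * 3 + l) as u8;
                }
            }
        }
    }
    decode[99] = HALT;
    decode
}
```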

ebblake commented May 19, 2026

Contributor Author

Attempt with a Rust macro added on top. Once again, credit where it is due: I had the overview of WHAT I wanted (pack DECODE into dense u8 values, and have a macro take nine small blocks and expand each into 27 arms that can then inline the mode math), but it took several hours of back and forth with Google Gemini to pound the macro into something that would compile. Not being able to generate match arms in isolation is a sad limitation of Rust macros as they stand today, whereas the C preprocessor lets you get away with truly atrocious stuff; then again, Rust is trying to prove code safety even when boilerplate is hidden behind convenience. My timings are rather variable: without the second patch, day 25 ran fairly consistently at around 1.8ms; with the macro patch, some runs came in at 1.5ms, but there were also outliers at 1.9ms. That is still a definite win over omitting this series entirely, but it is harder to say whether the macro itself delivers a speedup that isn't achievable without it.
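Because `macro_rules!` can't expand to a lone match arm, one workaround is to have each invocation emit a complete item that contains the whole match. This is a sketch of that pattern for two-operand instructions, not the PR's macro: the hypothetical `binary_family!` takes one small operation body and generates a function that maps a mode index (mode1 * 3 + mode2) to nine const-generic copies of that body, so each copy has its modes fixed at compile time. The write operand is left out for brevity.

```rust
/// Operand fetch with the addressing mode fixed at compile time.
fn read<const MODE: u8>(mem: &[i64], addr: usize, base: i64) -> i64 {
    let raw = mem[addr];
    match MODE {
        0 => mem[raw as usize],          // position
        1 => raw,                        // immediate
        _ => mem[(base + raw) as usize], // relative
    }
}

/// Hypothetical macro: expands one small operation body into a function whose
/// match covers all nine (mode1, mode2) pairs, indexed as mode1 * 3 + mode2.
macro_rules! binary_family {
    ($name:ident, |$a:ident, $b:ident| $body:expr) => {
        fn $name(modes: u8, mem: &[i64], ip: usize, base: i64) -> i64 {
            // One generic body; each arm below instantiates it with constant modes.
            fn op<const M1: u8, const M2: u8>(mem: &[i64], ip: usize, base: i64) -> i64 {
                let $a = read::<M1>(mem, ip + 1, base);
                let $b = read::<M2>(mem, ip + 2, base);
                $body
            }
            match modes {
                0 => op::<0, 0>(mem, ip, base),
                1 => op::<0, 1>(mem, ip, base),
                2 => op::<0, 2>(mem, ip, base),
                3 => op::<1, 0>(mem, ip, base),
                4 => op::<1, 1>(mem, ip, base),
                5 => op::<1, 2>(mem, ip, base),
                6 => op::<2, 0>(mem, ip, base),
                7 => op::<2, 1>(mem, ip, base),
                8 => op::<2, 2>(mem, ip, base),
                _ => unreachable!("mode index out of range"),
            }
        }
    };
}

// Each invocation plays the role of one "small block".
binary_family!(add_value, |a, b| a + b);
binary_family!(mul_value, |a, b| a * b);
binary_family!(less_than_value, |a, b| i64::from(a < b));
```

A top-level dispatcher would still need one match over the dense codes that calls these generated functions; that outer match is the part which, by the same limitation, has to be produced as a whole rather than arm by arm.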

ebblake commented May 19, 2026

Contributor Author

Day 9 showed more of a difference: 1.4ms without the series, 1.0ms with just the first patch, and 0.8ms with both patches. So macro-izing the decode lookup to let the compiler see even more opportunities for constant propagation looks like a good move, even though writing a macro is such a pain.

Credit to AI: I relied on Google Gemini for the macro syntax, but now
that it is written, I have verified that the full patch makes sense to me.

Compress the decode table from a 4-byte struct down to consecutive u8
values, and then update the execution loop to use a giant match
statement over all 243+2 possible decode outputs.  With the mode values
fixed at macro-expansion time, this should give the compiler more to
work with when producing a dense jump table and inlined actions.