Speed up intcode instruction decoding #68
Too bad Rust doesn't have C bitfields; packing this in 44k (and letting the compiler shift out modes under the hood) was a bit cuter than needing 88k. On the other hand, bytewise access may still be faster than shifting out bitfields.
https://www.reddit.com/r/adventofcode/comments/1t675xe/comment/oknlj9z/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button is an alternative speedup using a single multiply and bit shifting rather than a lookup table.
I like the approach of using a LUT to decode the decimal representation into something more machine friendly. I was noodling on the idea of going one step further. There are 3 x 3 x 3 x 9 = 243 combinations of addressing modes and opcodes, plus 1 for halt (although some of the combinations, where instructions only take 1 or 2 operands, would be technically invalid). We could map the decimal instructions to a contiguous range of 1 to 244 (reserving 0 for halt, like you've already done) fitting into a
Askalski did something similar, except only using opcodes that are actually used by the intcode programs. This has fewer cases, but since they're non-contiguous it might still be faster to do the lookup and then a contiguous match.
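To make the dense mapping concrete, here is a minimal sketch of one possible encoding. The names `dense_code`, `build_dense_table`, and `INVALID` are hypothetical, and the formula is an assumption for illustration, not the numbering used in the patch. It reserves 0 for halt and places the 243 mode/opcode combinations on 1 through 243.

```rust
/// Marker for instruction values that do not decode (hypothetical choice).
pub const INVALID: u8 = u8::MAX;

/// Map a decimal Intcode instruction to a dense code.
/// 0 is halt (instruction 99); every valid combination of three modes
/// (each 0..=2) and an opcode (1..=9) gets a unique code in 1..=243.
pub fn dense_code(instr: usize) -> Option<u8> {
    if instr >= 100_000 {
        return None;
    }
    let op = instr % 100;
    if op == 99 {
        // Halt takes no parameters, so only the bare value 99 is accepted.
        return if instr == 99 { Some(0) } else { None };
    }
    if !(1..=9).contains(&op) {
        return None;
    }
    let a = instr / 100 % 10; // mode of parameter 1
    let b = instr / 1_000 % 10; // mode of parameter 2
    let c = instr / 10_000 % 10; // mode of parameter 3
    if a > 2 || b > 2 || c > 2 {
        return None;
    }
    // Base-3 number built from the modes, scaled by the 9 opcodes.
    Some((((c * 3 + b) * 3 + a) * 9 + op) as u8)
}

/// One byte per instruction value up to 22299, so roughly 22k in total.
pub fn build_dense_table() -> Vec<u8> {
    (0..22_300)
        .map(|v| dense_code(v).unwrap_or(INVALID))
        .collect()
}
```

The divisions run only while the table is built; the interpreter loop would then index the table with the raw instruction and match on the resulting byte.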
I'll get to learn more about writing Rust macros, then. I like the idea, though - having decode[] be only 22k instead of 88k, and return a u8 that goes to a single match table instead of the current multi-function code, may give even more speedup. It would be easy to just encode my earlier for loop as computing
Attempt with a Rust macro added on top. Once again, giving credit where it is due: I had the overview of WHAT I wanted (pack DECODE into a dense u8, and have a macro take nine small blocks and expand each into 27 that can then inline the mode math), but it took several hours of back and forth with Google Gemini to pound the macro into something that would compile. Not being able to generate match arms in isolation is a sad limitation of the current state of the art with Rust macros, where the C preprocessor lets you get away with truly atrocious stuff; then again, Rust is trying to prove code safety even when boilerplate is hidden behind convenience. My runtime timings are rather variable: without the second patch, day 25 ran fairly consistently at around 1.8ms; with the macro patch, I got some runs at 1.5ms, but also some outliers at 1.9ms. Still a definite win over omitting this series entirely, but it is harder to say whether the macro itself made a performance difference that would not be possible without it.
Day 9 showed more of a difference: 1.4ms without the series, 1.0ms with just the first patch, and 0.8ms with both patches. So macro-izing the decode lookup to let the compiler see even more opportunities for constant propagation looks like a good move, even though writing a macro is such a pain.
Credit to AI: I relied on Google Gemini for the macro syntax, but now that it is written, I have validated that the full patch makes sense to me. Compress the decode from a 4-byte struct down to consecutive u8 values, and then update the execution loop to do a giant match statement on all 243+2 possible decode outputs. With the mode values decoded at macro time, this should give the inliner a bit more to work with in producing a dense jump table and inlined actions.
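The idea of baking the modes into each match arm can be sketched without the macro. The example below swaps in const generics for the macro expansion, covers only four arms of the add instruction, and assumes a hypothetical numbering `code = ((c*3 + b)*3 + a)*9 + opcode` with 0 for halt; the names `load`, `add`, and `step` are illustrative and not taken from the patch.

```rust
/// Fetch a parameter value; MODE is a compile-time constant, so after
/// inlining only one branch of this match should remain.
#[inline(always)]
fn load<const MODE: u8>(mem: &[i64], base: i64, raw: i64) -> i64 {
    match MODE {
        0 => mem[raw as usize],          // position mode
        1 => raw,                        // immediate mode
        _ => mem[(base + raw) as usize], // relative mode
    }
}

/// Add with both source modes fixed at compile time.
/// Assumption: the destination is in position mode.
#[inline(always)]
fn add<const A: u8, const B: u8>(mem: &mut [i64], ip: usize, base: i64) {
    let (ra, rb) = (mem[ip + 1], mem[ip + 2]);
    let x = load::<A>(mem, base, ra);
    let y = load::<B>(mem, base, rb);
    let dst = mem[ip + 3] as usize;
    mem[dst] = x + y;
}

/// Execute one instruction given its dense code.
/// Returns the next instruction pointer, or None on halt.
pub fn step(mem: &mut [i64], ip: usize, base: i64, code: u8) -> Option<usize> {
    match code {
        0 => None, // halt
        1 => {
            add::<0, 0>(mem, ip, base);
            Some(ip + 4)
        }
        10 => {
            add::<1, 0>(mem, ip, base);
            Some(ip + 4)
        }
        28 => {
            add::<0, 1>(mem, ip, base);
            Some(ip + 4)
        }
        37 => {
            add::<1, 1>(mem, ip, base);
            Some(ip + 4)
        }
        // A full interpreter would have an arm for every dense code.
        _ => panic!("dense code {code} is not covered by this sketch"),
    }
}
```

Writing all the arms by hand is the boilerplate that the macro in the patch generates; the sketch only shows why a dense `u8` plus constant modes gives the compiler a compact jump table to work with.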
Description
Pre-compile a map of decimal values to a 4-byte struct that is easier to utilize. This avoids lots of runtime divisions by 10/100/1000/10000, with a noticeable speed increase. Most obvious was day 25, which jumped from 2.5ms to 1.8ms on my machine.
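As a rough illustration of the lookup-table idea, here is a minimal sketch that assumes a layout of one opcode byte plus three mode bytes. The names `Decoded`, `TABLE_LEN`, and `build_decode_table` are hypothetical, and the struct in the actual patch may be laid out differently.

```rust
/// Decoded form of one Intcode instruction: opcode plus three parameter modes.
/// Four u8 fields, so the struct occupies 4 bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Decoded {
    pub op: u8,
    pub modes: [u8; 3],
}

/// The largest value of interest is 22299 (modes 2,2,2 with opcode 99),
/// so 22300 entries of 4 bytes each come to roughly 88k.
pub const TABLE_LEN: usize = 22_300;

/// Build the table once up front. All divisions by 10/100/1000/10000
/// happen here rather than in the interpreter's hot loop.
/// Note: this sketch does not reject invalid opcodes or modes.
pub fn build_decode_table() -> Vec<Decoded> {
    (0..TABLE_LEN)
        .map(|v| Decoded {
            op: (v % 100) as u8,
            modes: [
                (v / 100 % 10) as u8,
                (v / 1_000 % 10) as u8,
                (v / 10_000 % 10) as u8,
            ],
        })
        .collect()
}
```

With the table built, decoding an instruction such as 1002 becomes a single indexed load, `table[1002]`, which yields opcode 2 with modes `[0, 1, 0]`.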
Type of change
Checklist
using the same naming conventions. Code should be portable, avoiding any
architecture-specific intrinsics.
cargo test
cargo fmt -- `find . -name "*.rs"`
cargo clippy --all-targets --all-features
Formatting and linting can also be executed by running just (if installed) on the command line at the project root.