Use A* search for less overall work #74
8060dec to 91dec87
Here's an alternative O(n) heuristic. It made part1 take almost twice as long (the number of keys reachable from one bot is high enough that a lot of time is spent computing the max leg remaining), but made part2 noticeably faster (with four bots, the effort to find the max leg is spread out more evenly, and max leg steers better than sum of min legs, even though it costs more to compute). Maybe a hybrid is worthwhile (no heuristic for part1, O(n) heuristic for part2)? With this alternative, tracing shows:
Gemini AI suggests this approach for a hybrid that picks the best heuristic for each part:
cdfb4a0 to 8a0d745
Pre-filtering the list of keys to those in the same quadrant as the starting point with a single bitwise op is faster than iterating over all keys and then branching on whether the distance was u32::MAX, just to toss out 75% of the iterations.

Likewise, a given state can be queued with more than one distance as different paths percolate to the top of the priority queue. For my input, I traced that part1 pops a revisited state 6971 times (out of 22762 pops), and part2 2237 times (out of 31437). It is slightly faster to check the cached distance up front than to repeat next-neighbor checks that will not find any new neighbors (reducing the number of later cache accesses from 101494 to 70304 for part1, and from 120813 to 113051 for part2).

While touching this, `just` complained that:

    warning: field name starts with the struct's name
      --> src/year2019/day18.rs:80:5
       |
    80 |     maze: [[Door; 30]; 30],
       |     ^^^^^^^^^^^^^^^^^^^^^^
       |
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#struct_field_names
       = note: requested on the command line with `-W clippy::struct-field-names`

so I renamed that field to matrix.

On my laptop, part1 speeds up from 5.1ms to 4.8ms, and the part2 speedup is more dramatic, from 12.1ms to 7.4ms.
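The two optimizations in this commit message can be sketched roughly as follows. This is not the code from the patch: `candidate_keys`, `dijkstra_skip_stale`, the bitmask layout, and the adjacency-list graph are all hypothetical stand-ins chosen to keep the example small.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Hypothetical layout: bit i of `remaining` is set while key i is still needed,
/// and `quadrant_mask` has bit i set when key i lives in the robot's quadrant.
/// One AND discards every key in the other quadrants before any iteration.
fn candidate_keys(remaining: u32, quadrant_mask: u32) -> Vec<u32> {
    let mut bits = remaining & quadrant_mask;
    let mut keys = Vec::new();
    while bits != 0 {
        keys.push(bits.trailing_zeros());
        bits &= bits - 1; // clear the lowest set bit
    }
    keys
}

/// Dijkstra over a hypothetical adjacency list of (neighbor, cost) pairs.
/// Returns the distance to `goal` plus how many stale pops were skipped.
fn dijkstra_skip_stale(
    adj: &[Vec<(usize, u32)>],
    start: usize,
    goal: usize,
) -> Option<(u32, u32)> {
    let mut best: HashMap<usize, u32> = HashMap::new();
    let mut todo = BinaryHeap::new();
    let mut stale = 0u32;
    best.insert(start, 0);
    todo.push(Reverse((0u32, start)));

    while let Some(Reverse((dist, node))) = todo.pop() {
        // The same node can be queued with several distances. Check the cached
        // distance up front instead of re-walking neighbors that cannot improve.
        if dist > best[&node] {
            stale += 1;
            continue;
        }
        if node == goal {
            return Some((dist, stale));
        }
        for &(next, cost) in &adj[node] {
            let candidate = dist + cost;
            let entry = best.entry(next).or_insert(u32::MAX);
            if candidate < *entry {
                *entry = candidate;
                todo.push(Reverse((candidate, next)));
            }
        }
    }
    None
}
```

In the small graph used to check this sketch, node 1 is first queued at distance 10 and later at distance 2, so the distance-10 copy is popped once and skipped without touching its neighbors.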
The search can be steered towards the global minimum by pre-computing
a heuristic of the absolute minimum possible distance that each key
can contribute, then updating that heuristic with a subtraction per
key visited, making it a nice lightweight O(1) heuristic. By the time
we reach the goal of zero keys remaining, the heuristic also reaches
zero - while it often underestimates, it is consistent and never
overestimates.
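A minimal sketch of that O(1) heuristic, assuming a hypothetical distance table whose row 0 is the start position and row i+1 is key i (the names `min_contribution`, `State`, `start_state`, and `collect` are illustrative, not taken from the patch):

```rust
/// `dist[row][k]` is a hypothetical walking distance to key `k`, where row 0 is
/// the start position and row `i + 1` is key `i`. The cheapest way to arrive at
/// key `k` from anywhere else is the least that key can add to any route.
fn min_contribution(dist: &[Vec<u32>]) -> Vec<u32> {
    let keys = dist[0].len();
    (0..keys)
        .map(|k| {
            dist.iter()
                .enumerate()
                .filter(|&(row, _)| row != k + 1) // skip the key's own row
                .map(|(_, legs)| legs[k])
                .min()
                .unwrap_or(0)
        })
        .collect()
}

/// Search state carrying its own heuristic, so no per-node recomputation is needed.
struct State {
    remaining: u32, // bit k set while key k is still needed
    heuristic: u32, // sum of min_contribution over the remaining keys
}

fn start_state(min_in: &[u32]) -> State {
    State {
        remaining: (1u32 << min_in.len()) - 1,
        heuristic: min_in.iter().sum(),
    }
}

/// Collecting a key costs one subtraction. The real leg into `key` is never
/// shorter than `min_in[key]`, so the estimate drops by no more than the step cost.
fn collect(state: &State, key: usize, min_in: &[u32]) -> State {
    State {
        remaining: state.remaining & !(1u32 << key),
        heuristic: state.heuristic - min_in[key],
    }
}
```

Because the estimate is a plain sum over the remaining keys, it reaches zero exactly when `remaining` does.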
On my laptop with my input, and adding some analysis (although the
runtime numbers are a bit noisy), I can see that both parts benefit,
although part 2 picks up the best improvement.
                   cache.entry()  todo.push()  todo.pop()  runtime
part 1 pre-patch           70304        23195       22762    4.8ms
part 1 post-patch          65546        23019       21410    4.6ms
part 2 pre-patch          113051        33238       31437    6.2ms
part 2 post-patch          79301        26608       20196    4.8ms
An O(1) heuristic is always ideal, and for part 1, anything else that
I tried cost more in overhead than what it saved in nodes visited.
But for part 2, the exponential explosion caused by multiple robots
really did benefit from a more responsive O(n) heuristic that steers
the search towards advancing the robot that would unlock the most keys,
rather than the robot with the shortest distance.
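The commit message does not spell out the O(n) heuristic, so the following is only one plausible reading of a "max leg" estimate: each robot must at least walk to its farthest remaining key, and the per-robot maxima are summed. `Robot`, `bits`, and `max_leg_heuristic` are hypothetical names, and the distances are assumed to be door-free shortest paths from each robot's current position.

```rust
/// Hypothetical per-robot view: which keys sit in its quadrant, and the
/// shortest door-free distance from its current position to each key.
struct Robot {
    quadrant_mask: u32,
    dist_to_key: Vec<u32>,
}

/// Iterate over the indices of the set bits in `mask`, lowest first.
fn bits(mut mask: u32) -> impl Iterator<Item = usize> {
    std::iter::from_fn(move || {
        if mask == 0 {
            return None;
        }
        let bit = mask.trailing_zeros() as usize;
        mask &= mask - 1; // clear the lowest set bit
        Some(bit)
    })
}

/// O(n) in the number of remaining keys: every robot still has to reach its
/// farthest remaining key, so the sum of those legs never overestimates.
fn max_leg_heuristic(remaining: u32, robots: &[Robot]) -> u32 {
    robots
        .iter()
        .map(|robot| {
            bits(remaining & robot.quadrant_mask)
                .map(|key| robot.dist_to_key[key])
                .max()
                .unwrap_or(0) // robot has nothing left to collect
        })
        .sum()
}
```

Unlike the O(1) estimate, this one has to be recomputed whenever a robot moves, which matches the observation above that it only pays off when the work is spread across several robots.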
With this latest patch, my table of results now looks like:
                   cache.entry()  todo.push()  todo.pop()  runtime
part 1 pre-series         101494        23195       22762    5.2ms
part 1 pre-patch           65546        23019       21410    4.6ms
part 1 this patch          65546        23019       21410    4.6ms
part 2 pre-series         120813        33238       31437   12.1ms
part 2 pre-patch           79301        26608       20196    4.8ms
part 2 this patch          33577        13826        6689    2.3ms
I'm still investigating whether a dynamic programming approach similar to Held-Karp can outperform A*. One benefit of the dynamic approach is that you can solve both parts at once. With A*, the cache maps (keys,positions)->distance, which means two separate caches for the two parts. But with dynamic programming, the cache maps (keys,last key)->(distance,positions), which can cram both parts into the same cache for less traversal overhead. The dynamic approach is up to O(n^2) per set visited (for each reachable key in a set, find the minimum distance when appending that key to any of the other reachable keys of the subset; in general n is much smaller than 26 because of the non-viable keys with a nonzero need), but I don't yet have a feel for how many set visits can be pruned from the reachability front to avoid a full-blown 2^26 set visits.
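For reference, the basic Held-Karp recurrence looks roughly like this. It is a textbook sketch under simplifying assumptions, not the experiment described here: doors and reachability pruning are ignored, the cache stores only the distance (not positions), and `held_karp`, `start`, and `between` are hypothetical names.

```rust
use std::collections::HashMap;

/// `start[k]` is the distance from the start to key `k`, and `between[i][j]`
/// is the distance from key `i` to key `j` (both hypothetical inputs).
/// Returns the shortest walk that collects every key, ignoring doors.
fn held_karp(start: &[u32], between: &[Vec<u32>]) -> u32 {
    let n = start.len();
    // cache[(set, last)] = shortest walk that collects exactly `set`, ending on `last`
    let mut cache: HashMap<(u32, usize), u32> = HashMap::new();
    for k in 0..n {
        cache.insert((1u32 << k, k), start[k]);
    }
    // Ascending order guarantees every smaller subset is already solved.
    for set in 1u32..(1u32 << n) {
        for last in 0..n {
            if (set & (1u32 << last)) == 0 {
                continue;
            }
            let prev_set = set & !(1u32 << last);
            if prev_set == 0 {
                continue; // single-key sets were seeded above
            }
            // O(n) per (set, last), so O(n^2) per set: try every possible previous key.
            let best = (0..n)
                .filter(|&prev| (prev_set & (1u32 << prev)) != 0)
                .filter_map(|prev| {
                    cache.get(&(prev_set, prev)).map(|&d| d + between[prev][last])
                })
                .min();
            if let Some(d) = best {
                cache.insert((set, last), d);
            }
        }
    }
    let full = (1u32 << n) - 1;
    (0..n)
        .filter_map(|k| cache.get(&(full, k)).copied())
        .min()
        .unwrap_or(0)
}
```

This version visits all 2^n subsets, which is exactly the cost that pruning from the reachability front would need to avoid.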
Nope. Although I did get a dynamic search working, it visits more sets+positions (31272 on my input, which is still much better than a full 2^26 sets) than a directed A* search. At this point, I'm happy with the current state of the patch as the fastest solution I can come up with.
Description
The search can be steered towards the global minimum by pre-computing a heuristic of the absolute minimum possible distance that each key can contribute, then updating that heuristic with a subtraction per key visited, making it a nice lightweight O(1) heuristic. By the time we reach the goal of zero keys remaining, the heuristic also reaches zero, so it is consistent and we do not have to worry about revisiting a node with a lower distance from an alternative path. Then, using an O(n) heuristic for part 2 (but not part 1) got even more speed by pruning much more of the search space.
On my laptop, this improves runtimes dramatically, even if the numbers are a bit noisy. A more direct analysis, by instrumenting the number of cache accesses and items of work processed when using my input file, shows that part 2 sees the bigger benefit:
Type of change
Checklist
using the same naming conventions. Code should be portable, avoiding any
architecture-specific intrinsics.
cargo test
cargo fmt -- `find . -name "*.rs"`
cargo clippy --all-targets --all-features
Formatting and linting also can be executed by running just (if installed) on the command line at the project root.