
Use A* search for less overall work #74

Open

ebblake wants to merge 3 commits into maneatingape:main from ebblake:2019-18

ebblake (Contributor) commented May 21, 2026

Description

The search can be steered towards the global minimum by pre-computing a heuristic of the absolute minimum possible distance that each key can contribute (the shortest edge into that key), then updating that heuristic with a single subtraction per key visited, making it a nice lightweight O(1) heuristic. Walking to a key always costs at least that key's minimum edge, so the heuristic never drops by more than the step cost, and it reaches zero exactly when no keys remain. That makes it consistent, so the first time a state is popped its distance is already optimal and we do not have to worry about revisiting it with a lower distance from an alternative path. Then, using an O(n) heuristic for part 2 (but not part 1) gained even more speed by pruning much more of the search space.
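A minimal Rust sketch of the O(1) heuristic described above. It assumes a plain `Vec<Vec<u32>>` distance matrix that is symmetric, lists keys first and robot starts after them, and uses `u32::MAX` for unreachable pairs. The names `minimum_edges`, `initial_heuristic` and `next_heuristic` are hypothetical; day18.rs stores `Door` structs rather than bare distances.

```rust
/// Sentinel for "no path" between two nodes (e.g. keys in different quadrants).
const UNREACHABLE: u32 = u32::MAX;

/// `matrix[i][j]` is the shortest walk between nodes `i` and `j` (keys first, then any
/// robot starting points). For each of the first `keys` nodes, returns its shortest
/// edge to any other node: no step into that key can ever cost less.
fn minimum_edges(matrix: &[Vec<u32>], keys: usize) -> Vec<u32> {
    matrix[..keys]
        .iter()
        .map(|row| {
            row.iter()
                .copied()
                .filter(|&d| d > 0 && d != UNREACHABLE)
                .min()
                .unwrap_or(0)
        })
        .collect()
}

/// Heuristic for the start state: sum of the minimum edges of every remaining key.
fn initial_heuristic(minimum: &[u32], remaining: u32) -> u32 {
    (0..minimum.len())
        .filter(|&k| remaining & (1 << k) != 0)
        .map(|k| minimum[k])
        .sum()
}

/// O(1) update after collecting key `to`. The step walked at least `minimum[to]`
/// (the matrix is assumed symmetric), so the heuristic never drops by more than
/// the step cost: the consistency condition h(n) <= cost + h(n').
fn next_heuristic(heuristic: u32, minimum: &[u32], to: usize) -> u32 {
    heuristic - minimum[to]
}
```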

On my laptop, this improves runtimes dramatically, even if the numbers are a bit noisy. A more direct analysis, counting the calls to cache.entry(), todo.push() and todo.pop() on my input file, shows that part 2 sees the bigger benefit:

                        cache.entry()  todo.push()  todo.pop() runtime
    part 1 pre-series    101494         23195        22762      5.2ms
    part 1 post-series    65546         23019        21410      4.6ms
    part 2 pre-series    120813         33238        31437     12.1ms
    part 2 post-series    33577         13826         6689      2.3ms

Type of change

  • Performance improvement
  • Bug fix
  • Other

Checklist

  • Pull request title and commit messages are clear and informative.
  • Documentation has been updated if necessary.
  • Code style matches the existing code. This one is somewhat subjective, but try to "fit in" by
    using the same naming conventions. Code should be portable, avoiding any
    architecture-specific intrinsics.
  • Tests pass: cargo test
  • Code is formatted: cargo fmt -- `find . -name "*.rs"`
  • Code is linted: cargo clippy --all-targets --all-features

Formatting and linting can also be run with `just`
(if installed) on the command line at the project root.

ebblake (Contributor, Author) commented May 21, 2026

`just` gave me an odd clippy warning that seemed unrelated to my original patch, hence my s/maze/matrix/ rename to silence it:

cargo fmt -- `find . -name "*.rs"`
cargo clippy --all-targets --all-features
warning: field name starts with the struct's name
  --> src/year2019/day18.rs:80:5
   |
80 |     maze: [[Door; 30]; 30],
   |     ^^^^^^^^^^^^^^^^^^^^^^
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#struct_field_names
   = note: requested on the command line with `-W clippy::struct-field-names`

ebblake force-pushed the 2019-18 branch 3 times, most recently from 8060dec to 91dec87, May 21, 2026 15:25

ebblake (Contributor, Author) commented May 21, 2026

Here's an alternative O(n) heuristic. It made part1 almost twice as slow (the number of keys reachable from one bot is high enough that a lot of time is spent computing the max leg remaining), but part2 noticeably faster (with four bots, the effort to find the max leg is spread out more evenly, and the max leg, rather than the sum of min legs, steers better even though it takes more effort to compute). Maybe a hybrid is worthwhile? (No heuristic for part1, O(n) heuristic for part 2.)

From 84c924e844b0da501c57c7c3aafb41e46b05b25d Mon Sep 17 00:00:00 2001
From: Eric Blake <eblake@redhat.com>
Date: Thu, 21 May 2026 10:37:28 -0500
Subject: [PATCH] tmp
Content-type: text/plain

---
 src/year2019/day18.rs | 46 ++++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/src/year2019/day18.rs b/src/year2019/day18.rs
index 606c2f7..536b932 100644
--- a/src/year2019/day18.rs
+++ b/src/year2019/day18.rs
@@ -72,16 +72,16 @@ struct Door {
     needed: u32,
 }

+type Matrix = [[Door; 30]; 30];
+
 /// `initial` is the complete set of keys that we need to collect. Will always be binary
 /// `11111111111111111111111111` for the real input but fewer for sample data.
 ///
 /// `matrix` is the adjacency of distances and doors between each pair of keys and the robots
 /// starting locations.
-/// `minimum` is the smallest distance from a key to any of its neighbors, for the A* heuristic.
 struct Maze {
     initial: State,
-    matrix: [[Door; 30]; 30],
-    minimum: [u32; 26],
+    matrix: Matrix,
 }

 pub fn parse(input: &str) -> Grid<u8> {
@@ -181,25 +181,19 @@ fn parse_maze(width: usize, bytes: &[u8]) -> Maze {
         }
     }

-    let mut minimum = [0; 26];
-    for i in initial.remaining.biterator() {
-        minimum[i] =
-            matrix[i].iter().map(|d| d.distance).filter(|&dist| dist > 0).min().unwrap_or(0);
-    }
-
-    Maze { initial, matrix, minimum }
+    Maze { initial, matrix }
 }

 fn explore(width: usize, bytes: &[u8]) -> u32 {
     let mut todo = MinHeap::with_capacity(5_000);
     let mut cache = FastMap::with_capacity(5_000);

-    let Maze { initial, matrix, minimum } = parse_maze(width, bytes);
-    let heuristic = minimum.iter().sum();
-    todo.push(heuristic, (initial, heuristic));
+    let Maze { initial, matrix } = parse_maze(width, bytes);
+    let heur = heuristic(initial, &matrix);
+    todo.push(heur, (initial, heur));

-    while let Some((guess, (State { position, remaining }, heuristic))) = todo.pop() {
-        let total = guess - heuristic;
+    while let Some((guess, (State { position, remaining }, heur))) = todo.pop() {
+        let total = guess - heur;
         // Finish immediately if no keys left.
         // Since we're using A* with a consistent heuristic this will always be the optimal solution.
         if remaining == 0 {
@@ -220,14 +214,14 @@ fn explore(width: usize, bytes: &[u8]) -> u32 {
                         position: position ^ from_mask ^ to_mask,
                         remaining: remaining ^ to_mask,
                     };
-                    let next_heuristic = heuristic - minimum[to];
-                    let next_guess = total + distance + next_heuristic;
+                    let next_heur = heuristic(next_state, &matrix);
+                    let next_guess = total + distance + next_heur;

                     // Memoize previously seen states to eliminate suboptimal states right away.
                     let best = cache.entry(next_state).or_insert(u32::MAX);
                     if next_guess < *best {
                         *best = next_guess;
-                        todo.push(next_guess, (next_state, next_heuristic));
+                        todo.push(next_guess, (next_state, next_heur));
                     }
                 }
             }
@@ -245,3 +239,19 @@ fn is_key(b: u8) -> Option<usize> {
 fn is_door(b: u8) -> Option<usize> {
     b.is_ascii_uppercase().then(|| (b - b'A') as usize)
 }
+
+fn heuristic(state: State, matrix: &Matrix) -> u32 {
+    let mut heur = 0;
+
+    // For each robot, compute the worst-case distance it must travel to a remaining key.
+    for bot in state.position.biterator() {
+        let mut dist = 0;
+        for key in state.remaining.biterator() {
+            if matrix[bot][key].distance != u32::MAX {
+                dist = dist.max(matrix[bot][key].distance);
+            }
+        }
+        heur += dist;
+    }
+    heur
+}
-- 
2.54.0

With this alternative, tracing shows:

                    cache.entry()  todo.push()  todo.pop()  runtime
part 1 pre-patch    101494         23195        22762        5.3ms
part 1 O(1)-patch    96755         23054        21528        5.0ms
part 1 O(n)-patch   110269         25376        24334       10.1ms
part 2 pre-patch    120813         33238        31437       12.3ms
part 2 O(1)-patch    99325         30438        24419       10.7ms
part 2 O(n)-patch    33625         13826         6689        8.0ms

ebblake (Contributor, Author) commented May 21, 2026

Gemini AI suggests this hybrid approach, which picks the best heuristic for each part:


// The const generic flag tells the compiler to specialize this function
fn explore<const IS_PART_2: bool>(width: usize, bytes: &[u8]) -> u32 {
    let mut todo = MinHeap::with_capacity(5_000);
    let mut cache = FastMap::with_capacity(5_000);
    let Maze { initial, matrix, minimum } = parse_maze(width, bytes);

    // Compute initial heuristic based on the compile-time flag
    let heur = if IS_PART_2 {
        heuristic_part2(initial, &matrix)
    } else {
        minimum.iter().sum()
    };

    todo.push(heur, (initial, heur));

    while let Some((guess, (state, heur))) = todo.pop() {
        // ... (hot loop logic) ...

        let next_heur = if IS_PART_2 {
            heuristic_part2(next_state, &matrix)
        } else {
            heur - minimum[to] // Fast O(1) step subtraction
        };
    }
    // ...
}
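For reference, a self-contained toy showing the const-generic pattern the snippet relies on. `Toy`, `cheap`, `max_leg` and `next_heur` are hypothetical names; the max-leg rule follows the O(n) patch above (each robot must at least reach its farthest reachable remaining key). Because `IS_PART_2` is a constant in each monomorphized copy, an optimized build can normally drop the unused branch, so the hot loop carries no runtime flag check.

```rust
/// Hypothetical toy data: shortest walks between nodes (keys first, `u32::MAX` when
/// unreachable) and the cheapest edge into each key.
struct Toy<'a> {
    matrix: &'a [Vec<u32>],
    minimum: &'a [u32],
}

impl Toy<'_> {
    /// Part 1 style O(1) update: subtract the minimum edge of the key just collected.
    fn cheap(&self, heur: u32, to: usize) -> u32 {
        heur - self.minimum[to]
    }

    /// Part 2 style O(robots * keys) recomputation: each robot must at least walk
    /// to its farthest reachable remaining key.
    fn max_leg(&self, robots: &[usize], remaining: u32) -> u32 {
        robots
            .iter()
            .map(|&bot| {
                (0..self.minimum.len())
                    .filter(|&k| remaining & (1 << k) != 0)
                    .map(|k| self.matrix[bot][k])
                    .filter(|&d| d != u32::MAX)
                    .max()
                    .unwrap_or(0)
            })
            .sum()
    }

    /// `IS_PART_2` is fixed at compile time, so each instantiation only needs
    /// the code for one strategy.
    fn next_heur<const IS_PART_2: bool>(
        &self,
        heur: u32,
        to: usize,
        robots: &[usize],
        remaining: u32,
    ) -> u32 {
        if IS_PART_2 {
            self.max_leg(robots, remaining)
        } else {
            self.cheap(heur, to)
        }
    }
}
```

Callers pick the strategy with a turbofish, e.g. `toy.next_heur::<true>(...)` for part 2.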

ebblake force-pushed the 2019-18 branch 3 times, most recently from cdfb4a0 to 8a0d745, May 22, 2026 01:41
ebblake added 3 commits May 22, 2026 13:37
Pre-filtering the list of keys in the same quadrant as the starting
point with a single bitwise op is faster than iterating over all keys
and then doing a branch on whether the distance was u32::MAX just to
toss out 75% of the iterations.
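A minimal sketch of that pre-filter, assuming each robot's quadrant mask is built once from a `Vec<Vec<u32>>` distance matrix that uses `u32::MAX` for unreachable pairs. `reachable_masks` and `candidate_keys` are hypothetical; `candidate_keys` stands in for the crate's own `biterator()` bit iteration.

```rust
/// For each robot starting node, the bitmask of keys it can ever reach. Built once
/// from a distance matrix that uses `u32::MAX` for "different quadrant".
fn reachable_masks(matrix: &[Vec<u32>], keys: usize, robots: &[usize]) -> Vec<u32> {
    robots
        .iter()
        .map(|&bot| {
            (0..keys)
                .filter(|&k| matrix[bot][k] != u32::MAX)
                .fold(0u32, |mask, k| mask | (1 << k))
        })
        .collect()
}

/// Keys worth trying for one robot: a single AND with its quadrant mask replaces a
/// `distance == u32::MAX` branch on every one of the keys.
fn candidate_keys(remaining: u32, reachable: u32) -> impl Iterator<Item = usize> {
    let mut bits = remaining & reachable;
    std::iter::from_fn(move || {
        if bits == 0 {
            return None;
        }
        let key = bits.trailing_zeros() as usize;
        bits &= bits - 1; // clear the lowest set bit
        Some(key)
    })
}
```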

Likewise, a given state can be queued with more than one distance as
different paths percolate to the top of the priority queue.  For my
input, I traced that part1 pops a revisited state 6971 times (out of
22762 pops), and part2 2237 times (out of 31437). It is slightly
faster to check the cached distance up front than to repeat
next-neighbor checks that will not find any new neighbors (reducing
the number of later cache accesses from 101494 to 70304 for part1, and
120813 to 113051 for part 2).
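The up-front check is the usual lazy-deletion test from Dijkstra and A*. This sketch shows it on a plain Dijkstra over a small adjacency list, a hypothetical stand-in for the key/robot state graph, and also counts the stale pops it skips. In the patch the cache stores the A* guess rather than the raw distance; since the heuristic depends only on the state, comparing either gives the same answer.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Dijkstra with lazy deletion. Returns the goal distance and how many stale heap
/// entries were skipped, or `None` if the goal is unreachable.
fn shortest(adj: &[Vec<(usize, u32)>], start: usize, goal: usize) -> Option<(u32, usize)> {
    let mut best: HashMap<usize, u32> = HashMap::new();
    let mut heap = BinaryHeap::new();
    let mut skipped = 0;
    best.insert(start, 0);
    heap.push(Reverse((0, start)));

    while let Some(Reverse((cost, node))) = heap.pop() {
        // A cheaper copy of this state was already popped and expanded, so
        // scanning its neighbors again cannot improve anything.
        if cost > best[&node] {
            skipped += 1;
            continue;
        }
        if node == goal {
            return Some((cost, skipped));
        }
        for &(next, step) in &adj[node] {
            let next_cost = cost + step;
            let entry = best.entry(next).or_insert(u32::MAX);
            if next_cost < *entry {
                *entry = next_cost;
                heap.push(Reverse((next_cost, next)));
            }
        }
    }
    None
}
```

On a graph with edges 0->1 (5), 0->2 (1), 2->1 (1) and 1->3 (10), the expensive 0->1 edge leaves a stale entry for node 1 that is skipped once, and `shortest(&adj, 0, 3)` returns `Some((12, 1))`.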

While touching this, `just` complained that:
warning: field name starts with the struct's name
  --> src/year2019/day18.rs:80:5
   |
80 |     maze: [[Door; 30]; 30],
   |     ^^^^^^^^^^^^^^^^^^^^^^
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#struct_field_names
   = note: requested on the command line with `-W clippy::struct-field-names`

so I renamed that field to matrix.

On my laptop, part1 speeds up from 5.1ms to 4.8ms, and part2 improves
more dramatically, from 12.1ms to 7.4ms.

The search can be steered towards the global minimum by pre-computing
a heuristic of the absolute minimum possible distance that each key
can contribute, then updating that heuristic with a subtraction per
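key visited, making it a nice lightweight O(1) heuristic.  Each step
walks at least the minimum edge into the key it collects, so the
heuristic never drops by more than the step cost; by the time we reach
the goal of zero keys remaining, the heuristic also reaches zero.  So
while it often underestimates, it is consistent and never
overestimates.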
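To make the consistency claim concrete, here is a small checker for the condition h(u) <= cost(u, v) + h(v) on every edge, plus h(goal) = 0. `is_consistent` is a hypothetical helper taking an explicit edge list, which only works for toy state graphs, not the full key-state space.

```rust
/// Checks h(u) <= cost(u, v) + h(v) on every edge (u, v, cost) and h(goal) == 0.
/// Together these imply the heuristic is admissible (never overestimates).
fn is_consistent(edges: &[(usize, usize, u32)], h: &[u32], goals: &[usize]) -> bool {
    let goals_ok = goals.iter().all(|&g| h[g] == 0);
    let edges_ok = edges.iter().all(|&(u, v, cost)| h[u] <= cost + h[v]);
    goals_ok && edges_ok
}
```

As a worked case, take one robot at S with keys 0 and 1, where d(S,0)=2, d(S,1)=5 and d(0,1)=4, so the minimum edges are 2 and 4. The states (S,{0,1}), (0,{1}), (1,{0}) and the two goals get h = 6, 4, 2, 0, 0, with edges of cost 2, 5, 4 and 4; every edge passes, for example 6 <= 2 + 4 on the first step.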

On my laptop with my input, and with some added instrumentation (the
runtime numbers are a bit noisy), I can see that both parts benefit,
with part 2 picking up the bigger improvement.

                  cache.entry()  todo.push()  todo.pop() runtime
part 1 pre-patch     70304         23195        22762     4.8ms
part 1 post-patch    65546         23019        21410     4.6ms
part 2 pre-patch    113051         33238        31437     6.2ms
part 2 post-patch    79301         26608        20196     4.8ms

A cheap O(1) heuristic is preferable whenever it prunes well enough,
and for part 1, anything else that I tried cost more in overhead than
it saved in nodes visited.  But for part 2, the exponential explosion
caused by multiple robots really did benefit from a more responsive
O(n) heuristic that focuses the search on advancing the robot that
would unlock the most keys, rather than the robot with the shortest
distance.

With this latest patch, my table of results now looks like:

                    cache.entry()  todo.push()  todo.pop() runtime
part 1 pre-series    101494         23195        22762      5.2ms
part 1 pre-patch      65546         23019        21410      4.6ms
part 1 this patch     65546         23019        21410      4.6ms
part 2 pre-series    120813         33238        31437     12.1ms
part 2 pre-patch      79301         26608        20196      4.8ms
part 2 this patch     33577         13826         6689      2.3ms
ebblake (Contributor, Author) commented May 23, 2026

I'm still investigating if a dynamic programming approach similar to Held-Karp can outperform A*. One benefit of the dynamic approach is that it can solve both parts at once. With A*, the cache maps (keys, positions) -> distance, which means two separate caches for the two parts. With dynamic programming, the cache maps (keys, last key) -> (distance, positions), which can cram both parts into the same cache for less traversal overhead. The dynamic approach costs up to O(n^2) per set visited (for each reachable key in a set, find the minimum distance when appending that key after any of the other reachable keys of the subset; in general n is much smaller than 26, because keys whose doors are still locked, i.e. with a nonzero need, are not yet viable), but I don't yet have a feel for how many sets can be pruned from the reachability front to avoid visiting all 2^26 sets.
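For comparison, a minimal Held-Karp sketch for a single robot. It assumes a hypothetical `needed[i][j]` bitmask of the keys whose doors lie on the walk from node i to node j, and it ignores keys picked up in passing; the multi-robot (keys, last key) cache described above is not modeled. It fills a table over all 2^n subsets, so it is only practical for small n.

```rust
/// Held-Karp over key subsets for one robot. Nodes 0..n are keys and node n is the
/// start. `dist[i][j]` is the walk length (`u32::MAX` if none) and `needed[i][j]` is
/// the bitmask of keys whose doors lie on that walk. Returns the shortest walk that
/// collects every key, or `None` if the doors make that impossible.
fn held_karp(dist: &[Vec<u32>], needed: &[Vec<u32>], n: usize) -> Option<u32> {
    if n == 0 {
        return Some(0);
    }
    let start = n;
    let size = 1usize << n;
    // dp[mask][last]: shortest walk collecting exactly `mask`, ending at key `last`.
    let mut dp = vec![vec![u32::MAX; n]; size];

    for k in 0..n {
        if needed[start][k] == 0 && dist[start][k] != u32::MAX {
            dp[1 << k][k] = dist[start][k];
        }
    }

    // Every transition adds a bit, so ascending order finishes a subset before any
    // of its supersets is read.
    for mask in 1..size {
        let held = mask as u32;
        for last in 0..n {
            let here = dp[mask][last];
            if here == u32::MAX {
                continue;
            }
            for next in 0..n {
                let bit = 1u32 << next;
                let locked = needed[last][next] & !held != 0;
                if held & bit != 0 || locked || dist[last][next] == u32::MAX {
                    continue;
                }
                let slot = &mut dp[mask | bit as usize][next];
                *slot = (*slot).min(here + dist[last][next]);
            }
        }
    }
    dp[size - 1].iter().copied().filter(|&d| d != u32::MAX).min()
}
```

With the start S and keys 0 and 1 at d(S,0)=5, d(S,1)=2 and d(0,1)=4, the unrestricted answer is 6 (S, 1, 0); if the door on the S to 1 walk needs key 0, the DP is forced into S, 0, 1 and returns 9.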

ebblake (Contributor, Author) commented May 27, 2026

> I'm still investigating if a dynamic programming approach similar to Held-Karp can outperform A*.

Nope: although I did get a dynamic search working, it visits more sets+positions than a directed A* search (31272 on my input, though still far fewer than the full 2^26 sets). At this point, I'm happy with the current state of the patch as the fastest solution I can come up with.

