Use Held-Karp dynamic programming rather than permutations #70
If you like this, 2015 day 13 and 2016 day 24 could use the same treatment, at which point there would no longer be any clients of half_permutations in crate::util::slice; see also #69, which gets rid of permutations from 2019 day 7.
//! This is a variant of the classic NP-hard [Travelling Salesman Problem].
//!
//! There are 8 locations, so naively it would require checking 8! = 40,320 permutations. We can
//! reduce this to 7!/2 = 2,520 permutations by arbitrarily choosing one of the locations as the
//! start, and skipping lexically reversed permutations (since the path a->b->c has the same
//! length as c->b->a).
//! length as c->b->a). Computing the shortest and longest path is then done by completing the
//! cycle for each permutation, then discarding the longest or shortest edge seen along the way.
//! Skipping lexically reversed permutations is possible with [Steinhaus-Johnson-Trotter's algorithm].
Not sure I got the markdown correct for forward-references of a hyperlink in the body vs. the URL later in the document, compared to the older inline URL.
0589718 to
e4784c1
Switch to a different algorithm for finding the shortest path, using O(n^2*2^n) rather than O(n!) work. For a given part, this results in slightly more calls to .max or .min (3584 instead of 2520), but less overhead than what permutations required for shuffling data around. I would have loved to iterate on 3..127 instead of 3..255; that produces only 1344 calls to .max, but only works on inputs where dropping the shortest edge from the longest cycle still produces the longest path, regardless of the point chosen as anchor. On my laptop, performance improves from about 40us to 12us. I also tried separating the two tables (doing Held-Karp in part1() and part2() rather than joint iteration in parse()), but that had more overhead with duplicated set traversal.
Switch to a different algorithm for finding the shortest path, using O(n^2*2^n) rather than O(n!) work. For a given part, this results in slightly more calls to .max or .min (3584 instead of 2520), but less overhead than what permutations required for shuffling data around. Doing both parts at once requires a bit more work than day 9; part 2 is a longest path that requires 255 iterations from an implicit ninth node with distance zero, at which point the part one details can still be read off the table to manually bypass the ninth node if we can determine what the first node was. But that requires storing a bit more information in the table. On my laptop, performance improves from about 33us to 18us.
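For readers unfamiliar with the O(n^2*2^n) recurrence mentioned in these commit messages, here is a minimal textbook-style sketch of Held-Karp for the shortest path that visits every node once with free endpoints. It is not the code from this patch series: the function name `shortest_hamiltonian_path`, the `Vec`-based table, and the `u32` distances are all assumptions made for illustration, and it assumes a complete, symmetric distance matrix with small values.

```rust
/// Shortest path visiting every node exactly once, with free endpoints.
/// `dist` is assumed to be a complete, symmetric n x n matrix with small n,
/// since the table holds n * 2^n entries.
fn shortest_hamiltonian_path(dist: &[Vec<u32>]) -> u32 {
    let n = dist.len();
    if n == 0 {
        return 0;
    }
    let full = (1usize << n) - 1;
    // best[set][m]: cheapest path that visits exactly the nodes in `set` and ends at `m`.
    let mut best = vec![vec![u32::MAX; n]; 1 << n];
    for m in 0..n {
        best[1 << m][m] = 0; // a singleton path costs nothing
    }
    for set in 1..=full {
        for m in 0..n {
            if set & (1 << m) == 0 {
                continue; // `m` must be a member of `set`
            }
            let prev = set ^ (1 << m);
            if prev == 0 {
                continue; // singleton, already initialized
            }
            // Extend the best path over `prev` ending at each `j` by the edge j -> m.
            // `prev < set`, so every best[prev][j] has already been filled in.
            let mut cost = u32::MAX;
            for j in 0..n {
                if prev & (1 << j) != 0 {
                    cost = cost.min(best[prev][j] + dist[j][m]);
                }
            }
            best[set][m] = cost;
        }
    }
    *best[full].iter().min().unwrap()
}
```

Swapping `min` for `max` (and `u32::MAX` for `0`) would give the longest path under the same assumptions.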
// Initialize a table for each part: 2ⁿ⁻¹ sets with n-1 distances per set. Default each g({k},k)
// singleton to distance[0][k+1] (since bit 0 maps to node 1), while the initial value of other
// sets does not matter.
let mut table = [[0_u16; 7]; 1 << 7];
Day 9 and 13 used vec for the distance grid because the tests had fewer nodes, so I did likewise for the table. Day 24 uses an array for the distance grid; the test failure comes from having fewer nodes without being prepared for it.
I like the 2D array a bit better (it's nice for table[set][m]); oversizing the table to 2^11 entries when the test only needs 5*2^5 entries is not too bad, so the fix may just be clamping the iterations to the actual node count even when the table is oversized, rather than converting to a 1D vec.
Test failure fixed in the latest iteration, but I'm still torn on whether the 2D array (exact size for the puzzle but oversized for test cases) or the 1D vector (always exact size) is better, and whether the three puzzles should be consistent in using the same style.
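To make the "oversized 2D array, clamped iteration" option concrete, here is a rough sketch for a path pinned to node 0. It is only an illustration of the idea discussed in this thread, not the code that landed: the constant `MAX`, the function `shortest_from_zero`, and the explicit `n` parameter are hypothetical names, and the table layout follows the quoted comment (bit k stands for node k+1, singletons start from distance[0][k+1]).

```rust
/// Hypothetical upper bound on the node count (the real puzzle size).
const MAX: usize = 8;

/// Shortest path that starts at node 0 and visits all `n` nodes, for 1 <= n <= MAX.
/// The table is always sized for MAX nodes; only the loops are clamped to `n`.
fn shortest_from_zero(dist: &[[u16; MAX]; MAX], n: usize) -> u16 {
    assert!((1..=MAX).contains(&n));
    if n == 1 {
        return 0;
    }
    let k = n - 1; // nodes 1..n map to bits 0..k
    // Oversized for small inputs: 2^(MAX-1) sets with MAX-1 end nodes each.
    let mut table = [[0_u16; MAX - 1]; 1 << (MAX - 1)];
    for set in 1..(1usize << k) {
        for m in 0..k {
            if set & (1 << m) == 0 {
                continue; // `m` must be a member of `set`
            }
            let prev = set ^ (1 << m);
            let value = if prev == 0 {
                // Singleton: the path is just node 0 -> node m+1.
                dist[0][m + 1]
            } else {
                (0..k)
                    .filter(|&j| prev & (1 << j) != 0)
                    .map(|j| table[prev][j] + dist[j + 1][m + 1])
                    .min()
                    .unwrap()
            };
            table[set][m] = value;
        }
    }
    let full = (1usize << k) - 1;
    (0..k).map(|m| table[full][m]).min().unwrap()
}
```

With this shape a test input with fewer nodes touches only the low 2^(n-1) rows of the table, so the indexing stays `table[set][m]` and no conversion to a 1D vec is needed.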
68b8463 to
b9ad35f
Reddit writeup on the speedups gained by this patch series: https://www.reddit.com/r/adventofcode/comments/1te7ayd/2015_day_9rust_solving_tsp_faster_than/
Switch to a different algorithm for finding the shortest path, using O(n^2*2^n) rather than O(n!) work. 2016 day 24 is nicer than 2015 days 9 and 13, in that the problem at hand really is asking about a path or cycle pinned to node 0, so we can drop to n=7 for a mere 7*6/2*128/2 = 1344 comparisons, outright beating the 2520 comparisons of the prior 7!/2 approach. On my laptop, performance improves from about 260us to 240us.
Description
Experiment with a different algorithm for finding the shortest path, using O(n^2*2^n) rather than O(n!) work. For a given part, the old code had 2520 callbacks from half_permutations with 8 calls to .max per callback; the new code has only 3584 total calls to .max, and with less overhead than what permutations required for shuffling data around.
I would have loved to have iterated on 3..127 instead of 3..255; that produces only 1344 calls to .max, but only works on inputs where dropping the shortest edge from the longest cycle still produces the longest path.
On my laptop, performance improves from about 40us to 12us. I also tried separating the two tables (doing Held-Karp in part1() and part2() rather than joint iteration in parse()), but that had more overhead with duplicated set traversal.
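One way to sanity-check the counts quoted above (3584, 1344 and 2520) is to enumerate them directly. The sketch below assumes one comparison per (set, end node, predecessor) triple, which may not match exactly how the patch structures its loops; both helper names are hypothetical.

```rust
/// Count Held-Karp inner-loop comparisons over `bits` set members, assuming one
/// comparison per (set, end node m in set, predecessor j in set with j != m).
fn held_karp_comparisons(bits: u32) -> u64 {
    let mut count = 0;
    for set in 1u32..(1 << bits) {
        let k = u64::from(set.count_ones());
        count += k * (k - 1); // k choices of m, k-1 choices of j
    }
    count
}

/// (n-1)!/2: permutations left after pinning one node and skipping reversals.
fn half_permutations_count(n: u64) -> u64 {
    (1..n).product::<u64>() / 2
}
```

Under that assumption the total has the closed form n*(n-1)*2^(n-2), which gives 8*7*64 = 3584 for eight set members and 7*6*32 = 1344 for seven, against 7!/2 = 2520 half-permutations of eight locations.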
Type of change
Checklist
using the same naming conventions. Code should be portable, avoiding any
architecture-specific intrinsics.
cargo test
cargo fmt -- `find . -name "*.rs"`
cargo clippy --all-targets --all-features
Formatting and linting also can be executed by running just (if installed) on the command line at the project root.