
Use Held-Karp dynamic programming rather than permutations #70

Open: ebblake wants to merge 3 commits into maneatingape:main from ebblake:2015-09.2


ebblake (Contributor) commented May 15, 2026

Description

Experiment with a different algorithm for finding the shortest path, using O(n^2*2^n) rather than O(n!) work. For a given part, the old code made 2520 callbacks from half_permutations with 8 calls to .max per callback; the new code makes only 3584 calls to .max in total, with less overhead than the data shuffling that permutations required.
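To make the recurrence concrete, here is a minimal sketch of Held-Karp for the open-path variant (any start, any end), assuming a symmetric distance matrix and a small node count. `held_karp` is a hypothetical helper, not the PR's code: it fills a shortest and a longest table in the same pass over the sets and counts one inner step per (set, endpoint, predecessor) triple, which is the quantity the 3584 figure refers to.

```rust
/// Hypothetical helper (not the PR's code): shortest and longest Hamiltonian
/// path over all start/end nodes, via Held-Karp. Returns (shortest, longest, steps),
/// where `steps` counts inner-loop relaxations (one min and one max each).
fn held_karp(dist: &[Vec<u32>]) -> (u32, u32, usize) {
    let n = dist.len();
    let full = (1usize << n) - 1;
    // shortest[set][m] / longest[set][m]: best path visiting exactly `set`, ending at node m.
    // Singletons {m} are paths of length 0, so the zero initialization covers them.
    let mut shortest = vec![vec![0u32; n]; 1 << n];
    let mut longest = vec![vec![0u32; n]; 1 << n];
    let mut steps = 0;
    // 3 (0b11) is the first set with two members; smaller sets need no work.
    for set in 3..=full {
        if set.count_ones() < 2 {
            continue;
        }
        for m in (0..n).filter(|&m| set & (1 << m) != 0) {
            // `prev` is numerically smaller than `set`, so its row is already filled.
            let prev = set ^ (1 << m);
            let (mut lo, mut hi) = (u32::MAX, 0);
            for k in (0..n).filter(|&k| prev & (1 << k) != 0) {
                steps += 1;
                lo = lo.min(shortest[prev][k] + dist[k][m]);
                hi = hi.max(longest[prev][k] + dist[k][m]);
            }
            shortest[set][m] = lo;
            longest[set][m] = hi;
        }
    }
    let shortest_path = *shortest[full].iter().min().unwrap();
    let longest_path = *longest[full].iter().max().unwrap();
    (shortest_path, longest_path, steps)
}
```

On the puzzle's three-city example (London, Dublin, Belfast) this returns (605, 982, 12), and for any 8-node matrix the step count is 3584.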

I would have loved to iterate on 3..127 instead of 3..255; that produces only 1344 calls to .max, but it only works on inputs where dropping the shortest edge from the longest cycle still produces the longest path.
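As a sanity check on the 3584 and 1344 figures: in a textbook Held-Karp pass, each innermost step chooses a set S, an endpoint m ∈ S, and a predecessor k ∈ S∖{m}, so summing over every subset of n nodes gives

$$
\sum_{S \subseteq \{1,\dots,n\}} |S|\,(|S|-1) \;=\; \sum_{s=0}^{n} \binom{n}{s}\, s(s-1) \;=\; n(n-1)\,2^{n-2}
$$

With n = 8 (sets 3..255) this is 8·7·2⁶ = 3584, and with n = 7 (sets 3..127) it is 7·6·2⁵ = 1344. This assumes that loop shape; the PR's exact bookkeeping may differ.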

On my laptop, performance improves from about 40us to 12us. I also tried separating the two tables (doing Held-Karp in part1() and part2() rather than iterating jointly in parse()), but that had more overhead from the duplicated set traversal.

Type of change

  • Performance improvement
  • Bug fix
  • Other

Checklist

  • Pull request title and commit messages are clear and informative.
  • Documentation has been updated if necessary.
  • Code style matches the existing code. This one is somewhat subjective, but try to "fit in" by
    using the same naming conventions. Code should be portable, avoiding any
    architecture-specific intrinsics.
  • Tests pass: `cargo test`
  • Code is formatted: `` cargo fmt -- `find . -name "*.rs"` ``
  • Code is linted: `cargo clippy --all-targets --all-features`

Formatting and linting can also be run with `just` (if installed) on the command line at the project root.

ebblake (Contributor, Author) commented May 15, 2026

If you like this, 2015 day 13 and 2016 day 24 could get the same treatment, at which point half_permutations in crate::util::slice would no longer have any callers; see also #69, which removes permutations from 2019 day 7.

Comment thread on src/year2015/day09.rs (outdated), lines +3 to +10:
//! This is a variant of the classic NP-hard [Travelling Salesman Problem].
//!
//! There are 8 locations, so naively it would require checking 8! = 40,320 permutations. We can
//! reduce this to 7!/2 = 2,520 permutations by arbitrarily choosing one of the locations as the
//! start, and skipping lexically reversed permutations (since the path a->b->c has the same
//! length as c->b->a).
//! length as c->b->a). Computing the shortest and longest path is then done by completing the
//! cycle for each permutation, then discarding the longest or shortest edge seen along the way.
//! Skipping lexically reversed permutations is possible with [Steinhaus-Johnson-Trotter's algorithm].

ebblake (Contributor, Author) commented:

Not sure I got the Markdown right for reference-style links (link text in the body, URL defined later in the document), as opposed to the older inline URL.
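For comparison, CommonMark (which rustdoc follows) accepts a shortcut reference link: the bracketed text in the body plus a matching `[label]: URL` definition elsewhere in the same doc block. A definition cannot interrupt a paragraph, so it needs a blank `//!` line before it. A minimal sketch; the module layout, the constant, and the Wikipedia URLs are illustrative assumptions, not the PR's actual links.

```rust
mod day09 {
    //! This is a variant of the classic NP-hard [Travelling Salesman Problem].
    //!
    //! Skipping lexically reversed permutations is possible with
    //! [Steinhaus-Johnson-Trotter's algorithm].
    //!
    //! [Travelling Salesman Problem]: https://en.wikipedia.org/wiki/Travelling_salesman_problem
    //! [Steinhaus-Johnson-Trotter's algorithm]: https://en.wikipedia.org/wiki/Steinhaus%E2%80%93Johnson%E2%80%93Trotter_algorithm

    /// Illustrative item only (the puzzle's 8 locations), so the module is not empty.
    pub const LOCATIONS: usize = 8;
}
```

`cargo doc` should render both labels as links; a missing or misspelled definition would leave the brackets as plain text.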

Two more comment threads on src/year2015/day09.rs (outdated).
ebblake force-pushed the 2015-09.2 branch 4 times, most recently from 0589718 to e4784c1 (May 15, 2026 20:08)
Switch to a different algorithm for finding the shortest path, using
O(n^2*2^n) rather than O(n!) work.  For a given part, this results in
slightly more calls to .max or .min (3584 instead of 2520), but less
overhead than what permutations required for shuffling data around.

I would have loved to iterate on 3..127 instead of 3..255; that
produces only 1344 calls to .max, but only works on inputs where
dropping the shortest edge from the longest cycle still produces the
longest path, regardless of the point chosen as anchor.

On my laptop, performance improves from about 40us to 12us. I also
tried separating the two tables (doing Held-Karp in part1() and
part2() rather than joint iteration in parse()), but that had more
overhead with duplicated set traversal.

Switch to a different algorithm for finding the shortest path, using
O(n^2*2^n) rather than O(n!) work.  For a given part, this results in
slightly more calls to .max or .min (3584 instead of 2520), but less
overhead than what permutations required for shuffling data around.

Doing both parts at once requires a bit more work than day 9: part 2
is a longest path, which takes 255 iterations from an implicit ninth
node at distance zero. The part 1 answer can then still be read off
the table by manually bypassing the ninth node, provided we can
determine which node came first, but that requires storing a bit
more information in the table.

On my laptop, performance improves from about 33us to 18us.
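The ninth-node idea can be sketched with a Held-Karp pass anchored at node 0, assuming a symmetric matrix of pairwise happiness sums (each entry already adds both people's gains). `max_cycle` and `with_zero_node` are hypothetical helpers; unlike the commit, this recomputes part 1 separately instead of reading it off the shared table. The zero node is placed at index 0 so it becomes the anchor, leaving the sets over the real people, and a cycle through a zero-weight node is just the best open path.

```rust
/// Hypothetical helper: maximum total weight of a Hamiltonian cycle, via Held-Karp
/// anchored at node 0. Bit k of a set stands for node k + 1. Requires at least 2 nodes.
fn max_cycle(w: &[Vec<i32>]) -> i32 {
    let m = w.len() - 1;
    let full = (1usize << m) - 1;
    let mut table = vec![vec![0; m]; 1 << m];
    // Singleton {k}: the single edge from the anchor to node k + 1.
    for k in 0..m {
        table[1 << k][k] = w[0][k + 1];
    }
    for set in 1..=full {
        if set.count_ones() < 2 {
            continue;
        }
        for j in (0..m).filter(|&j| set & (1 << j) != 0) {
            let prev = set ^ (1 << j);
            let best = (0..m)
                .filter(|&k| prev & (1 << k) != 0)
                .map(|k| table[prev][k] + w[k + 1][j + 1])
                .max()
                .unwrap();
            table[set][j] = best;
        }
    }
    // Close the cycle by returning to the anchor.
    (0..m).map(|j| table[full][j] + w[j + 1][0]).max().unwrap()
}

/// Hypothetical helper: prepend a node (index 0) whose weight to everyone is 0.
fn with_zero_node(w: &[Vec<i32>]) -> Vec<Vec<i32>> {
    let n = w.len();
    let mut out = vec![vec![0; n + 1]; n + 1];
    for (row, src) in out[1..].iter_mut().zip(w) {
        row[1..].copy_from_slice(src);
    }
    out
}
```

With pairwise sums taken from the puzzle's four-person example, `max_cycle` gives 330 for part 1, and adding the zero node gives 286, the best open seating path.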
Comment thread on src/year2016/day24.rs:

```rust
// Initialize a table for each part: 2ⁿ⁻¹ sets with n-1 distances per set. Default each g({k},k)
// singleton to distance[0][k+1] (since bit 0 maps to node 1), while the initial value of other
// sets does not matter.
let mut table = [[0_u16; 7]; 1 << 7];
```

ebblake (Contributor, Author) commented:

Days 9 and 13 used a Vec for the distance grid because the tests had fewer nodes, so I did likewise for the table. Day 24 uses an array for the distance grid; the test failure comes from the test input having fewer nodes than the code was prepared for.

ebblake (Contributor, Author) commented:

I like the 2D array a bit better (it's nice for table[set][m]); oversizing the table to 2^11 entries when the test only needs 5*2^5 entries is not too bad, so the fix may just be to clamp the iterations to the actual node count, even when the table is oversized, rather than converting to a 1D Vec.

ebblake (Contributor, Author) commented:

The test failure is fixed in the latest iteration, but I'm still torn between the 2D array (exact size for the puzzle but oversized for the test cases) and the 1D Vec (always exact size), and on whether the three puzzles should consistently use the same style.
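A minimal sketch of the clamping approach suggested in this thread: keep the fixed-size 2D array sized for the puzzle, but loop only over the sets and columns the actual node count needs. `solve` and `MAX` are hypothetical names; the layout follows the table comment above (bit k stands for node k + 1, singletons start at distance[0][k+1]), with part 1 an open path from node 0 and part 2 a cycle back to it.

```rust
const MAX: usize = 7; // hypothetical cap: up to 8 locations, node 0 is the anchor

/// Returns (part1, part2): the shortest path from node 0 visiting every node, and the
/// shortest cycle that also returns to node 0. `n` is the real node count (2..=8).
fn solve(dist: &[[u16; MAX + 1]; MAX + 1], n: usize) -> (u16, u16) {
    let m = n - 1; // non-anchor nodes, encoded as bits 0..m (bit k is node k + 1)
    let full = (1usize << m) - 1;
    // Oversized table sized for the puzzle; only the first 1 << m rows and m columns are used.
    let mut table = [[0_u16; MAX]; 1 << MAX];
    for k in 0..m {
        table[1 << k][k] = dist[0][k + 1];
    }
    for set in 1..=full {
        if set.count_ones() < 2 {
            continue;
        }
        for j in (0..m).filter(|&j| set & (1 << j) != 0) {
            let prev = set ^ (1 << j);
            let best = (0..m)
                .filter(|&k| prev & (1 << k) != 0)
                .map(|k| table[prev][k] + dist[k + 1][j + 1])
                .min()
                .unwrap();
            table[set][j] = best;
        }
    }
    let part1 = (0..m).map(|j| table[full][j]).min().unwrap();
    let part2 = (0..m).map(|j| table[full][j] + dist[j + 1][0]).min().unwrap();
    (part1, part2)
}
```

For the puzzle's five-location example, the BFS distances padded into the 8x8 grid give (14, 20); the unused rows and columns of both arrays are simply never touched.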

ebblake force-pushed the 2015-09.2 branch 2 times, most recently from 68b8463 to b9ad35f (May 16, 2026 20:14)
ebblake (Contributor, Author) commented May 18, 2026
Reddit writeup on the speedups gained by this patch series: https://www.reddit.com/r/adventofcode/comments/1te7ayd/2015_day_9rust_solving_tsp_faster_than/

Switch to a different algorithm for finding the shortest path, using
O(n^2*2^n) rather than O(n!) work.  2016 day 24 is nicer than 2015
days 9 and 13, in that the problem at hand really is asking about a
path or cycle pinned to node 0, so we can drop to n=7 for a mere
7*6/2*128/2 = 1344 comparisons, outright beating the 2520 comparisons
of the prior 7!/2 approach.

On my laptop, performance improves from about 260us to 240us.
