Use Chinese Remainder Theorem rather than sieving #82

Closed
ebblake wants to merge 1 commit into maneatingape:main from ebblake:2020-13

@ebblake ebblake commented Jun 5, 2026

Contributor

Description

Sieving takes, on average, O(sum(factors)) iterations. For 2016 day 15, where there are only 7 primes, all less than 100 (and their LCM still fits in 32 bits), this is fast. But for 2020 day 13, there are 9 primes, two of them over 400 (with an LCM of around 50 bits), so sieving takes around 1000 divisions. It is better to use the CRT directly: implement the extended Euclidean algorithm to compute the coefficients, and modular multiplication with u128 intermediates to avoid overflow. The answer can then be obtained with fewer hardware divisions.

On my laptop, this speeds up the part 2 runtime from 3.6 µs to 0.4 µs.
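
For readers comparing the two approaches, here is a minimal sketch of both. It is not the diff from this PR; the function names (`sieve`, `crt`, `extended_gcd`, `mod_inv`, `mul_mod`) and the `(remainder, modulus)` input shape are assumptions made for illustration, and the moduli are assumed to be pairwise coprime with a product that fits in a u64.

```rust
/// Sieve: step through candidates, locking in one congruence at a time.
/// Each modulus `m` needs at most `m` steps, hence O(sum of moduli) overall.
fn sieve(congruences: &[(u64, u64)]) -> u64 {
    let mut t = 0;
    let mut step = 1;
    for &(r, m) in congruences {
        while t % m != r % m {
            t += step;
        }
        step *= m;
    }
    t
}

/// Extended Euclidean algorithm: returns (g, x, y) with a*x + b*y == g == gcd(a, b).
fn extended_gcd(a: i128, b: i128) -> (i128, i128, i128) {
    if b == 0 {
        (a, 1, 0)
    } else {
        let (g, x, y) = extended_gcd(b, a % b);
        (g, y, x - (a / b) * y)
    }
}

/// Modular inverse of `a` modulo `m`, assuming gcd(a, m) == 1.
fn mod_inv(a: u64, m: u64) -> u64 {
    let (_, x, _) = extended_gcd(a as i128, m as i128);
    x.rem_euclid(m as i128) as u64
}

/// (a * b) % m, using a u128 intermediate so the product cannot overflow.
fn mul_mod(a: u64, b: u64, m: u64) -> u64 {
    ((a as u128 * b as u128) % m as u128) as u64
}

/// Direct CRT: smallest non-negative t with t ≡ r (mod m) for every (r, m).
fn crt(congruences: &[(u64, u64)]) -> u64 {
    let product: u64 = congruences.iter().map(|&(_, m)| m).product();
    let mut total = 0u64;
    for &(r, m) in congruences {
        let partial = product / m;
        let inv = mod_inv(partial % m, m);
        // term = r * inv * partial (mod product)
        let term = mul_mod(mul_mod(r % m, inv, product), partial, product);
        total = ((total as u128 + term as u128) % product as u128) as u64;
    }
    total
}
```

For 2020 day 13, a bus with id `m` at list offset `i` gives the congruence `t ≡ -i (mod m)`, so the sample `17,x,13,19` becomes `[(0, 17), (11, 13), (16, 19)]`, for which both functions return 3417. The CRT version does one inverse and a handful of modular multiplications per modulus instead of up to `m` trial steps.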

Type of change

  • Performance improvement
  • Bug fix
  • Other

Checklist

  • Pull request title and commit messages are clear and informative.
  • Documentation has been updated if necessary.
  • Code style matches the existing code. This one is somewhat subjective, but try to "fit in" by
    using the same naming conventions. Code should be portable, avoiding any
    architecture-specific intrinsics.
  • Tests pass (cargo test)
  • Code is formatted (cargo fmt -- `find . -name "*.rs"`)
  • Code is linted (cargo clippy --all-targets --all-features)

Formatting and linting also can be executed by running just
(if installed) on the command line at the project root.

@maneatingape maneatingape left a comment

Owner

Solution is already < 1µs, so the extra complexity of the CRT is not worth the tradeoff.
