Quadratic sieve tuning #2712
Merged
Summary:
- We add a tuning program for the quadratic sieve (`qsieve/tune/tune-qsieve`). This takes a specific bit size as input, starts with the current tuning parameters for this bit size, and does a local search for improvements. The tuning program was coded with Claude.
- To simplify the tuning implementation, we add the function `qsieve_factor_with_tune`, which takes tuning parameters as input (`qsieve_factor` is just a wrapper around this which looks up default parameters).
- Update the default tuning table up to 160 bits with parameters found with `tune-qsieve`. The old tuning values were highly suboptimal; the new ones speed up `fmpz_factor` by roughly a factor of two below 128 bits.
- Another minor speedup to `qsieve_factor` for small factorisations: avoiding an unnecessary `memset`.

Some notes about the tuning process:
- In practice, it seems that there are many local optima, and the tuning program will typically arrive at very different final parameters when started from very different initial values. Finding global optima could be quite expensive as there are five independent parameters. Fortunately, different local optima seem to be pretty close to each other in performance. You may note that the new tuning table has a sharp discontinuity in values at the crossover point between the old and new tuning values (160 bits), but either set of parameters yields essentially the same performance (within 10%) at the crossover point.
- I didn't continue tuning for larger bit sizes since this starts to take a lot of time. Also, from this size on further tuning should probably distinguish between single-threaded and multi-threaded use (the current values seem OK for either).
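
To illustrate the local-optima behaviour described in the notes, here is a minimal Python sketch of a greedy coordinate-wise local search over five integer parameters. It is not the code of `tune-qsieve`: the names `local_search` and `toy_cost`, the step sizes, and the two-basin cost function are all hypothetical, and the real cost would be a (noisy) measured factoring time rather than a closed-form expression.

```python
from typing import Callable, Sequence, Tuple

Params = Tuple[int, ...]


def toy_cost(params: Params) -> int:
    """Hypothetical stand-in for a timing measurement.

    Each parameter has two basins: one with its minimum at 10 and a
    slightly worse one with its minimum at 30. The constant 100 plays
    the role of a baseline running time.
    """
    return 100 + sum(min((x - 10) ** 2, (x - 30) ** 2 + 1) for x in params)


def local_search(
    cost: Callable[[Params], int],
    start: Sequence[int],
    steps: Sequence[int],
    max_rounds: int = 1000,
) -> Tuple[Params, int]:
    """Greedy coordinate search.

    In each round, try moving every parameter by -step and +step and
    keep any move that lowers the cost. Stop when a full round yields
    no improvement, i.e. at a local optimum for these step sizes.
    """
    best = tuple(start)
    best_cost = cost(best)
    for _ in range(max_rounds):
        improved = False
        for i, step in enumerate(steps):
            for delta in (-step, step):
                cand = best[:i] + (best[i] + delta,) + best[i + 1:]
                c = cost(cand)
                if c < best_cost:
                    best, best_cost = cand, c
                    improved = True
        if not improved:
            break
    return best, best_cost


if __name__ == "__main__":
    # Two very different starting points for the five parameters.
    for start in [(0,) * 5, (40,) * 5]:
        params, value = local_search(toy_cost, start, steps=(1,) * 5)
        print(start, "->", params, "cost", value)
```

In this toy setup the two starting points end in different parameter sets whose costs differ by only a few percent, which mirrors the observation that distinct local optima can perform almost identically.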

Performance benchmark: time to factor N random semiprimes of the given bit size with `fmpz_factor`: