[QF_S Benchmark] seq vs nseq on 299 QF_S benchmarks (ostrich.zip), c3 branch #9793
Closed
Replies: 1 comment
-
|
This discussion has been marked as outdated by QF_S String Solver Benchmark. A newer discussion is available at Discussion #9802. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Date: 2026-06-09
Branch: c3 · Commit:
3c39fc4Workflow Run: #27208521350
Files benchmarked: 299 (from
tests/ostrich.zip, 5 s timeout, Z3 internal timeout 4 s)Summary
Both solvers agree on every file where both return a definite result.
seqsolves 10 more files overall;nseqhas 17 more timeouts but returnsunknownon 0 files (vs. 7 forseq).Performance Comparison
seq-fast / nseq-slow — 23 files (seq < 2 s, nseq timed out)
Also:
cyclic-xy,failedProp,failedProp2,indexof_const_index_sat/unsat,indexof_var_sat/unsat,norn-benchmark-9f,str-leq11/12,str-lt,substr_var_sat,substr_const_len_unsat.nseq-fast / seq-slow — 9 files (nseq < 2 s, seq timed out)
Correctness
Disagreements (sat vs unsat): 0 ✅
nseq is strictly more complete on 5 files where seq returns
unknown:Both timeout (9 files):
artur-unsat-we,artur-unsat,nikolai-unsat,noodles-unsat,noodles-unsat2,noodles-unsat5,noodles-unsat6,noodles-unsat9,substring-bug2.seq Trace Analysis
The
seqsolver uses iterative deepening: it alternatesincrease-depth(unfolding string axioms) andincrease-length(growing bounds on string variables) until solved or refuted.var_9length 2 to find a model.xto length 4 at depth 3 to prove unsat.The
nseq(ZIPT-based) solver uses a fundamentally different algorithm; its failures on the above cases suggest incomplete handling ofstr.to_int,str.from_int, ordering predicates (str.<=,str.<), Parikh constraints, and certain concat patterns.To reproduce: build Z3 from
c3and runz3 smt.string_solver=seq|nseq -T:10 <file.smt2>. Benchmarks fromtests/ostrich.zip(299 files).Beta Was this translation helpful? Give feedback.
All reactions