Fix deadlock between concurrent query worker and main thread#9436
Fix deadlock between concurrent query worker and main thread#9436khanaffan wants to merge 10 commits into
Conversation
- Updated performance test for ECSqlReader to include new query options. - Adjusted expected prepare time in ConcurrentQuery tests for accuracy. - Added new performance test configuration for ECSqlReader with multiple classes. - Created new performance results documentation for ECSqlReader concurrent queries.
| "perftest:schemaloader": "npm run -s perftest:pre && mocha --timeout=999999999 \"./lib/cjs/perftest/SchemaLoader.test.js\"", | ||
| "perftest:ecSqlRowPerformance": "npm run -s perftest:pre && mocha --timeout=999999999 \"./lib/cjs/perftest/ECSqlRow.test.js\"", | ||
| "perftest:readQueryPerformance": "npm run -s perftest:pre && mocha --timeout=999999999 \"./lib/cjs/perftest/ReadQueryPerf.test.js\"", | ||
| "perftest:ecSqlReaderConcurrent": "npm run -s perftest:pre && mocha --timeout=999999999 \"./lib/cjs/perftest/ECSqlReaderConcurrentPerf.test.js\"", |
There was a problem hiding this comment.
Do we want to run them as part of some pipeline as well? Also, do we want to generate an artifact to record the performance numbers so that we can see how performance number change in future?
There was a problem hiding this comment.
Yes. It be great to track performance.
|
The PR description is stale and still describes the old approach, you should update it. The PR bundles several independent things, could have been split, but I approved, with respect to the still pending comments from others. |
| const median = runs[Math.floor(runs.length / 2)]; | ||
| const minQps = runs[0].throughputQps; | ||
| const maxQps = runs[runs.length - 1].throughputQps; | ||
| totalErrors.push(...median.errorSamples); |
There was a problem hiding this comment.
This only asserts errors from the median run after sorting by throughput. If a non-median repeat has query failures, the test can still pass as long as the median repeat is clean, which makes the harness look more reliable than it is.
Can we aggregate errors from every timed repeat separately from choosing the median result for perf reporting?
— Nambot 🤖 (powered by GPT-5.5)
There was a problem hiding this comment.
This is a real bug. totalErrors.push(...median.errorSamples) collects error samples from only the median repeat after the runs.sort(). If any other repeat has query errors, runWorkload captures them in its errorSamples, but they're never pushed to totalErrors, so the assert.strictEqual(totalErrors.length, 0) at line 422 can pass despite errors in non-median runs. The assert's message ("all generated ECSQL statements should execute without error") is then misleading.
Suggest aggregating from all runs:
for (const r of runs)
totalErrors.push(...r.errorSamples);This is a perf harness so it won't block merge, but worth a fix before someone trusts the zero-error pass.
|
|
||
| // A single seed iModel populated with every configured class so cross-class, polymorphic and | ||
| // join queries all return data. | ||
| const fileName = `ECSqlReaderConcurrentPerf_seed_${config.elementsPerClass}x${config.classNames.length}.bim`; |
There was a problem hiding this comment.
This seed cache key is too weak. It only includes element count and class count, so changing classNames, schema assumptions, or seed-building logic with the same counts will silently reuse an old .bim via the early return below.
Can we include the configured class names/config identity in the seed filename, or validate the existing seed before reusing it?
— Nambot 🤖 (powered by GPT-5.5)
There was a problem hiding this comment.
Valid point — changing classNames while keeping the same count would silently reuse a stale seed. A simple fix is to include the names in the filename: classNames.join("-") or a short hash of the config. Low risk for a manually-run tool where the developer controls the output directory, but worth noting as a footgun if the config evolves. Can be a follow-up.
aruniverse
left a comment
There was a problem hiding this comment.
Reviewed the TypeScript-side changes. All good to merge once the native package from imodel-native#1463 lands.
ECSqlStatement.test.ts — Moving PRAGMA validate_ecsql_writes=true from createQueryReader to withWriteStatement is the right fix. The pragma must run on the primary write connection; via CCQ it silently no-ops on a throwaway worker connection. The added comment explains this clearly (see inline).
ECDb.test.ts — Trailing comma only. The runDbListPragmaCCQ behavior (pragmas executing on the schema-source/data-source connection) is validated on the native side in ECSqlPragmasTests.cpp.
ConcurrentQuery.test.ts — closeTo(0, 4) tolerance is fine.
Issue #9451 — Properly scoped. The fix here (using the right API) is correct; making CCQ actively reject write-affecting PRAGMAs with an error is a separate infrastructure improvement. Good call filing it explicitly.
Changelog — "type": "none" is accurate; the behavior fix lives in the native PR and this PR contains only test corrections.
Perf test in CI (naveedkhan8067's question) — worth a follow-up ticket to wire perftest:ecSqlReaderConcurrent into a pipeline and capture baseline artifacts, but not a blocker here.
…adlocks and improve connection handling
The deadlock fix prepares the first query against a cold shared schema-source connection, so prepareTime is higher and more variable on CI (observed 7ms against a 0 +/- 4ms tolerance). Assert only that prepareTime is negligible relative to the ~1000ms execution instead of pinning a tight tolerance. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
aruniverse
left a comment
There was a problem hiding this comment.
Reviewed commits 41626a7bab and f844e0b39f.
doNotUsePrimaryConnToPrepare deprecation (41626a7bab) — well done. The @deprecated in 5.11.0 JSDoc in ConcurrentQuery.ts is clear, the core-common.api.md snapshot is updated, both core-backend and core-common have changenotes, and the flag is removed from the test config. ✅
prepareTime bound (f844e0b39f) — lessThan(100) is lenient but the comment explains exactly why (cold schema-source connection, observed 7ms against a 0 +/- 4 bound). The assertion still has value as a sanity guard against catastrophically slow prepares, and the narrative explanation in the comment is honest. ✅
One issue on the perf harness (see inline on line 382): totalErrors.push(...median.errorSamples) only collects errors from the median repeat — errors in other repeats are silently dropped before the assert.strictEqual(totalErrors.length, 0) check. Worth a one-liner fix before relying on that assertion for correctness gating.
imodel-native: iTwin/imodel-native#1463
Fix uses a dedicated connection that shared with all workers to prepare ECSQL statement. Previously we used primary connection that caused the dead lock. Performance test were added to see if we degraded or improved performance of concurrent query. From below it is clear that 4 thread which is default we improved by 1.1% (queries/sec)
ECSqlReader concurrent-query performance: A vs B (final)
A = original addon (
9cd010a6) · B = new addon with serialization fix (4742fee6).Throughput in queries/sec (higher = better), mean of 2 runs/phase; range spans maxConc 16 and 64.
All runs: 0 errors,
rows=847828.