Fix deadlock between concurrent query worker and main thread by khanaffan · Pull Request #9436 · iTwin/itwinjs-core

khanaffan · 2026-06-23T15:23:20Z

The fix prepares ECSQL statements on a dedicated connection that is shared by all workers. Previously we used the primary connection, which caused the deadlock. Performance tests were added to check whether concurrent-query performance degraded or improved. The results below show that with 4 threads, which is the default, throughput improved by up to 1.1% (queries/sec).

ECSqlReader concurrent-query performance: A vs B (final)

A = original addon (9cd010a6) · B = new addon with serialization fix (4742fee6).
Throughput in queries/sec (higher = better), mean of 2 runs/phase; range spans maxConc 16 and 64.
All runs: 0 errors, rows=847828.

workers	A	B (fixed)	Δ
1	182–183	182–184	~0%
2	307–313	306–309	≤1.4%
4	410–427	415–426	≤1.1%
8	228–239	230–238	≤1.1%

- Updated performance test for ECSqlReader to include new query options.
- Adjusted expected prepare time in ConcurrentQuery tests for accuracy.
- Added new performance test configuration for ECSqlReader with multiple classes.
- Created new performance results documentation for ECSqlReader concurrent queries.

naveedkhan8067 · 2026-06-24T10:15:27Z

    "perftest:schemaloader": "npm run -s perftest:pre && mocha --timeout=999999999 \"./lib/cjs/perftest/SchemaLoader.test.js\"",
    "perftest:ecSqlRowPerformance": "npm run -s perftest:pre && mocha --timeout=999999999 \"./lib/cjs/perftest/ECSqlRow.test.js\"",
    "perftest:readQueryPerformance": "npm run -s perftest:pre && mocha --timeout=999999999 \"./lib/cjs/perftest/ReadQueryPerf.test.js\"",
+    "perftest:ecSqlReaderConcurrent": "npm run -s perftest:pre && mocha --timeout=999999999 \"./lib/cjs/perftest/ECSqlReaderConcurrentPerf.test.js\"",


Do we want to run them as part of some pipeline as well? Also, do we want to generate an artifact to record the performance numbers so that we can see how they change in the future?

Yes. It would be great to track performance.

rschili · 2026-06-25T10:36:53Z

The PR description is stale and still describes the old approach; you should update it.

The PR bundles several independent changes that could have been split, but I approved, subject to the still-pending comments from others.

hl662 · 2026-06-29T18:01:21Z

+            const median = runs[Math.floor(runs.length / 2)];
+            const minQps = runs[0].throughputQps;
+            const maxQps = runs[runs.length - 1].throughputQps;
+            totalErrors.push(...median.errorSamples);


This only asserts errors from the median run after sorting by throughput. If a non-median repeat has query failures, the test can still pass as long as the median repeat is clean, which makes the harness look more reliable than it is.

Can we aggregate errors from every timed repeat separately from choosing the median result for perf reporting?

— Nambot 🤖 (powered by GPT-5.5)

This is a real bug. totalErrors.push(...median.errorSamples) collects error samples from only the median repeat after the runs.sort(). If any other repeat has query errors, runWorkload captures them in its errorSamples, but they're never pushed to totalErrors, so the assert.strictEqual(totalErrors.length, 0) at line 422 can pass despite errors in non-median runs. The assert's message ("all generated ECSQL statements should execute without error") is then misleading.

Suggest aggregating from all runs:

for (const r of runs) totalErrors.push(...r.errorSamples);

This is a perf harness so it won't block merge, but worth a fix before someone trusts the zero-error pass.
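A minimal sketch of that suggestion: gather errors from every timed repeat, and use the median only for throughput reporting. The `RunResult` and `RunSummary` shapes and the `summarizeRuns` helper are assumptions made for illustration; only `throughputQps` and `errorSamples` appear in the diff.

```typescript
// Hypothetical shape; only throughputQps and errorSamples appear in the PR diff.
interface RunResult {
  throughputQps: number;
  errorSamples: string[];
}

// Hypothetical summary type for one workload configuration.
interface RunSummary {
  median: RunResult;
  minQps: number;
  maxQps: number;
  allErrors: string[];
}

// Hypothetical helper: the median is used for perf reporting only, while
// errors are gathered from every repeat so a failing non-median run is not hidden.
function summarizeRuns(results: RunResult[]): RunSummary {
  if (results.length === 0)
    throw new Error("at least one run is required");

  // Sort a copy by throughput so the caller's array is left untouched.
  const runs = [...results].sort((a, b) => a.throughputQps - b.throughputQps);

  const allErrors: string[] = [];
  for (const r of runs)
    allErrors.push(...r.errorSamples);

  return {
    median: runs[Math.floor(runs.length / 2)],
    minQps: runs[0].throughputQps,
    maxQps: runs[runs.length - 1].throughputQps,
    allErrors,
  };
}
```

With this split, the zero-error assertion would check `allErrors` across all repeats, and the reported throughput would still come from `median`.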

hl662 · 2026-06-29T18:01:21Z

+
+    // A single seed iModel populated with every configured class so cross-class, polymorphic and
+    // join queries all return data.
+    const fileName = `ECSqlReaderConcurrentPerf_seed_${config.elementsPerClass}x${config.classNames.length}.bim`;


This seed cache key is too weak. It only includes element count and class count, so changing classNames, schema assumptions, or seed-building logic with the same counts will silently reuse an old .bim via the early return below.

Can we include the configured class names/config identity in the seed filename, or validate the existing seed before reusing it?

— Nambot 🤖 (powered by GPT-5.5)

Valid point — changing classNames while keeping the same count would silently reuse a stale seed. A simple fix is to include the names in the filename: classNames.join("-") or a short hash of the config. Low risk for a manually-run tool where the developer controls the output directory, but worth noting as a footgun if the config evolves. Can be a follow-up.
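One way to make the cache key sensitive to the class list is sketched below. `SeedConfig`, `seedFileName`, and the FNV-1a hash are illustrative assumptions, not code from this PR; the file-name prefix and the `elementsPerClass` and `classNames` fields follow the diff above.

```typescript
// Hypothetical subset of the perf-test config; field names follow the PR diff.
interface SeedConfig {
  elementsPerClass: number;
  classNames: string[];
}

// 32-bit FNV-1a hash, returned as 8 hex characters. It is non-cryptographic;
// it only needs to change when the input text changes.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical helper: folds the class names (order-sensitive) into the file name,
// so a different class list with the same count yields a different seed file.
function seedFileName(config: SeedConfig): string {
  const identity = fnv1a(JSON.stringify(config.classNames));
  return `ECSqlReaderConcurrentPerf_seed_${config.elementsPerClass}x${config.classNames.length}_${identity}.bim`;
}
```

This only covers changes to the class list. Changes to schema assumptions or seed-building logic would still need something extra, for example a manually bumped version string added to the hashed input.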

aruniverse

Reviewed the TypeScript-side changes. All good to merge once the native package from imodel-native#1463 lands.

ECSqlStatement.test.ts — Moving PRAGMA validate_ecsql_writes=true from createQueryReader to withWriteStatement is the right fix. The pragma must run on the primary write connection; via CCQ it silently no-ops on a throwaway worker connection. The added comment explains this clearly (see inline).

ECDb.test.ts — Trailing comma only. The runDbListPragmaCCQ behavior (pragmas executing on the schema-source/data-source connection) is validated on the native side in ECSqlPragmasTests.cpp.

ConcurrentQuery.test.ts — closeTo(0, 4) tolerance is fine.

Issue #9451 — Properly scoped. The fix here (using the right API) is correct; making CCQ actively reject write-affecting PRAGMAs with an error is a separate infrastructure improvement. Good call filing it explicitly.

Changelog — "type": "none" is accurate; the behavior fix lives in the native PR and this PR contains only test corrections.

Perf test in CI (naveedkhan8067's question) — worth a follow-up ticket to wire perftest:ecSqlReaderConcurrent into a pipeline and capture baseline artifacts, but not a blocker here.

The deadlock fix prepares the first query against a cold shared schema-source connection, so prepareTime is higher and more variable on CI (observed 7ms against a 0 +/- 4ms tolerance). Assert only that prepareTime is negligible relative to the ~1000ms execution instead of pinning a tight tolerance.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

aruniverse

Reviewed commits 41626a7bab and f844e0b39f.

doNotUsePrimaryConnToPrepare deprecation (41626a7bab) — well done. The @deprecated in 5.11.0 JSDoc in ConcurrentQuery.ts is clear, the core-common.api.md snapshot is updated, both core-backend and core-common have changenotes, and the flag is removed from the test config. ✅

prepareTime bound (f844e0b39f) — lessThan(100) is lenient but the comment explains exactly why (cold schema-source connection, observed 7ms against a 0 +/- 4 bound). The assertion still has value as a sanity guard against catastrophically slow prepares, and the narrative explanation in the comment is honest. ✅

One issue on the perf harness (see inline on line 382): totalErrors.push(...median.errorSamples) only collects errors from the median repeat — errors in other repeats are silently dropped before the assert.strictEqual(totalErrors.length, 0) check. Worth a one-liner fix before relying on that assertion for correctness gating.
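The relaxed `prepareTime` check discussed in this review can be expressed as a plain predicate, sketched below. `isPrepareTimeNegligible` and its parameters are hypothetical; the 100 ms bound mirrors the `lessThan(100)` assertion and the 7 ms figure is the CI observation quoted above.

```typescript
// Hypothetical helper: true when the prepare time is a sane, small value.
// The default 100 ms bound mirrors the lessThan(100) assertion discussed above;
// it guards against catastrophically slow prepares rather than pinning a tight tolerance.
function isPrepareTimeNegligible(prepareTimeMs: number, boundMs: number = 100): boolean {
  return Number.isFinite(prepareTimeMs) && prepareTimeMs >= 0 && prepareTimeMs < boundMs;
}
```

Under these assumptions the observed 7 ms passes, while it would have failed the earlier 0 +/- 4 tolerance.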

khanaffan added 3 commits June 23, 2026 10:54

Add performance test for concurrent query in @itwin/core-backend

b71fdce

Remove ECSqlReader concurrent-query performance results file

a10bfa5

khanaffan requested review from a team as code owners June 23, 2026 15:23

khanaffan requested review from calebmshafer, dassaf4 and naveedkhan8067 June 23, 2026 15:23

khanaffan mentioned this pull request Jun 23, 2026

Fix deadlock between concurrent query worker and main thread iTwin/imodel-native#1463

Open

Merge branch 'master' into affank/os-deadlock

8600245

naveedkhan8067 reviewed Jun 24, 2026

MichaelSwigerAtBentley reviewed Jun 24, 2026

Comment thread core/backend/src/test/ecdb/ECDb.test.ts Outdated

Comment thread core/backend/src/test/ecdb/ConcurrentQuery.test.ts Outdated

rschili approved these changes Jun 25, 2026

khanaffan and others added 4 commits June 25, 2026 13:34

Refactor query reader usage in tests for improved clarity and performance

48e10c2

Enable ECSql write-value validation for detecting invalid relClassIds in tests

75dddd1

Merge branch 'master' into affank/os-deadlock

cd1e3a0

Merge branch 'master' into affank/os-deadlock

578a726

hl662 reviewed Jun 29, 2026

aruniverse added this to the iTwin.js 5.11 milestone Jun 29, 2026

aruniverse reviewed Jun 29, 2026

Comment thread core/backend/src/test/ecdb/ECSqlStatement.test.ts

khanaffan and others added 2 commits June 29, 2026 15:28

Deprecate doNotUsePrimaryConnToPrepare in DbQueryConfig to avoid deadlocks and improve connection handling

41626a7

khanaffan requested a review from a team as a code owner June 29, 2026 19:59

aruniverse reviewed Jun 29, 2026

rschili approved these changes Jun 30, 2026

khanaffan wants to merge 10 commits into master from affank/os-deadlock

6 participants
