SchemaView performance improvements#9431

Draft
rschili wants to merge 16 commits into master from rschili/schema-view-fragment
rschili commented Jun 20, 2026

Contributor

imodel-native: iTwin/imodel-native#1479

Fixes: #9430

What

SchemaView was built and tuned against a very large iModel, balanced for the general mix of use cases.
Presentation has been testing it for adoption and the results are promising in almost every scenario, but
one critical scenario regresses on very large iModels - enough that they cannot adopt SchemaView as-is.

The scenario

  • Very large iModel (~30 GB, ~100 schemas, hundreds of thousands of properties).
  • Open the iModel and walk the model tree as fast as possible.

That path only needs BisCore, yet SchemaView paid to hydrate every schema in the file before returning. This PR makes SchemaView pay only for what the caller asks for, and removes a fixed cost that hurt every consumer.

Two changes, independent but shipped together.

Change 1: a cheaper schema-identity token (PRAGMA schema_token)

SchemaView uses a token to detect when an iModel's schemas have changed so it can drop its cached view.
It was obtaining that token from PRAGMA checksum(ecdb_schema), a SHA3 hash over the full contents of all schema tables (every class, property, and custom-attribute instance).

On the large iModel above, that checksum alone takes ~1.7 seconds, almost entirely CPU-bound on the hash - while loading BisCore plus its references via the binary blob is only ~30 ms. The token, meant to be a cheap "did anything change?" check, was dominating the whole operation, and every SchemaView consumer paid it regardless of how much schema they needed.

This PR adds PRAGMA schema_token, which hashes schema identity only - the name and version of each schema (one row per schema in ec_Schema, ordered by name). It is essentially free. The same cheap hash now also backs the schemaToken column returned by PRAGMA schema_view and the new PRAGMA schema_view_fragment, so a cached view and any later token check derive from the same value.
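
To make the identity-only idea concrete, here is a minimal sketch of a token derived from schema name and version alone. It is not the native implementation: the `SchemaIdentity` shape, the `schemaToken` helper, and the use of 32-bit FNV-1a are all assumptions for illustration, since this description does not say which hash the pragma uses.

```typescript
// Hypothetical shape of one ec_Schema row; only identity fields are hashed.
interface SchemaIdentity {
  schemaName: string;
  version: string; // illustrative format, e.g. "01.00.17"
}

// FNV-1a (32-bit) is a stand-in hash, chosen only to keep the sketch self-contained.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// One "name:version" entry per schema, ordered by name so that
// row order cannot influence the token.
function schemaToken(schemas: SchemaIdentity[]): string {
  const canonical = [...schemas]
    .sort((a, b) => (a.schemaName < b.schemaName ? -1 : a.schemaName > b.schemaName ? 1 : 0))
    .map((s) => `${s.schemaName}:${s.version}`)
    .join("\n");
  return fnv1a(canonical);
}

const tokenBefore = schemaToken([
  { schemaName: "Generic", version: "01.00.04" },
  { schemaName: "BisCore", version: "01.00.17" },
]);
// Same schemas in a different row order: same token.
const tokenReordered = schemaToken([
  { schemaName: "BisCore", version: "01.00.17" },
  { schemaName: "Generic", version: "01.00.04" },
]);
// A version bump on one schema: different token.
const tokenBumped = schemaToken([
  { schemaName: "BisCore", version: "01.00.18" },
  { schemaName: "Generic", version: "01.00.04" },
]);
```

Because schema contents are never an input, a content change without a version bump cannot alter the token, which is exactly the limitation noted further down.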

Measured on the ~30 GB / ~100-schema file: the token computation dropped from ~1744 ms to 1 ms, and the schema fragment pragma (see Change 2 below) from ~1759 ms to ~14 ms (cold ≈ warm, confirming the cost was CPU, not disk).

Known limitation (accepted): because the token hashes name + version only, it does not detect a schema whose contents change without a version bump. ECDb only allows in-place re-import for dynamic schemas, so that is the only case affected. We accept this for now and can strengthen the hash later (for example with a cached per-schema content checksum) without changing the pragma's contract.

Frontend IModelConnection.invalidateSchemaViewIfChanged now queries PRAGMA schema_token (reading .token) instead of PRAGMA checksum(ecdb_schema) (.sha3_256).
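
The check itself reduces to comparing a stored token with a freshly read one. The sketch below shows that comparison only; `CachedSchemaView` and its methods are hypothetical names, and the caller is assumed to have already read the `token` column from PRAGMA schema_token.

```typescript
// Hypothetical holder for a cached view plus the token it was built against.
class CachedSchemaView<TView> {
  private _view: TView | undefined = undefined;
  private _token: string | undefined = undefined;

  public get view(): TView | undefined {
    return this._view;
  }

  public store(view: TView, token: string): void {
    this._view = view;
    this._token = token;
  }

  // Drops the cached view when the token differs.
  // Returns true only if a view was actually dropped.
  public invalidateIfChanged(currentToken: string): boolean {
    if (this._view === undefined || currentToken === this._token) return false;
    this._view = undefined;
    this._token = undefined;
    return true;
  }
}

const viewCache = new CachedSchemaView<{ schemaCount: number }>();
viewCache.store({ schemaCount: 2 }, "token-a");
const droppedOnSameToken = viewCache.invalidateIfChanged("token-a");
const droppedOnNewToken = viewCache.invalidateIfChanged("token-b");
```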

Change 2: getSchemaView({ schemas }) - load only a subset

getSchemaView() gains an optional argument. It is purely additive:

// Unchanged: loads every schema in the iModel using an optimized binary blob, exactly as before.
const full = await iModel.getSchemaView();

// New: ensure only BisCore and its references are loaded.
const view = await iModel.getSchemaView({ schemas: ["BisCore"] });
view.findClass("BisCore.Subject");        // present
view.findClass("Generic.PhysicalObject"); // undefined - not loaded

Behavior:

  • The subset view is an accumulating instance. A later call with different schemas merges their closure into the same view, so schemas loaded earlier stay available.
  • If the requested schemas (or all schemas) are already loaded, the call is a synchronous no-op that returns the existing view.
  • Inside SchemaView, a schema that is not loaded looks identical to a schema the iModel does not contain: findClass and friends return undefined. Cross-schema walks (derivedClasses, etc.) are complete only over what is currently loaded.
  • Schema names the iModel does not contain are ignored.

How it works

Manifest (cheap reference graph). On first subset request, the backend reads the schema reference graph from ECDbMeta (meta.ECSchemaDef + meta.SchemaHasSchemaReferences) - just names, versions, ids, and reference edges, no schema data. This is exposed as the new SchemaManifest type.
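
As a rough illustration of what the manifest enables, the sketch below computes the reference closure of a requested schema list. The `ManifestEntry` shape and `referenceClosure` helper are assumptions, not the actual SchemaManifest API, and the reference edges in the sample data are made up for the example.

```typescript
// Hypothetical manifest entry: identity plus outgoing reference edges, no schema data.
interface ManifestEntry {
  schemaName: string;
  version: string;
  references: string[];
}

// Returns the requested schemas plus everything they reference, transitively.
// Referenced schemas come before the schemas that reference them.
// Names the manifest does not contain are ignored.
function referenceClosure(manifest: Map<string, ManifestEntry>, requested: string[]): string[] {
  const ordered: string[] = [];
  const visited = new Set<string>();
  const visit = (schemaName: string): void => {
    if (visited.has(schemaName)) return;
    const entry = manifest.get(schemaName);
    if (entry === undefined) return; // unknown schema: skip it
    visited.add(schemaName);
    for (const ref of entry.references) visit(ref);
    ordered.push(schemaName);
  };
  for (const schemaName of requested) visit(schemaName);
  return ordered;
}

// Illustrative graph only; not the real reference lists of these schemas.
const demoManifest = new Map<string, ManifestEntry>([
  ["CoreCustomAttributes", { schemaName: "CoreCustomAttributes", version: "01.00.04", references: [] }],
  ["ECDbMap", { schemaName: "ECDbMap", version: "02.00.04", references: ["CoreCustomAttributes"] }],
  ["BisCore", { schemaName: "BisCore", version: "01.00.17", references: ["CoreCustomAttributes", "ECDbMap"] }],
  ["Generic", { schemaName: "Generic", version: "01.00.04", references: ["BisCore"] }],
]);

const bisCoreClosure = referenceClosure(demoManifest, ["BisCore", "NoSuchSchema"]);
```

With this sample graph, requesting BisCore pulls in its two references but not Generic, and the unknown name is dropped silently.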

All of this is reached through a single entry point, getSchemaView, which takes one of three paths:

  • No filter is specified and nothing is loaded yet -> load the full blob.
  • A filter is specified, or some schemas are already loaded and no filter is given -> calculate which schemas are still missing and load them as a fragment.
  • No additional schemas need to be loaded -> return the existing view.
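
The three paths can be written as one small decision function. This is a sketch under assumptions: `LoadPlan`, `planLoad`, and the injected `closureOf` callback are hypothetical names, and the real code may order or combine the checks differently.

```typescript
// Hypothetical result of the decision: what, if anything, must be fetched.
type LoadPlan =
  | { kind: "full" }
  | { kind: "fragment"; schemas: string[] }
  | { kind: "none" };

function planLoad(
  allSchemas: string[], // every schema name known to the manifest
  loaded: ReadonlySet<string>, // schemas already merged into the view
  closureOf: (requested: string[]) => string[], // requested schemas plus their references
  filter?: string[],
): LoadPlan {
  // Path 1: no filter and nothing loaded yet, so take the full blob.
  if (filter === undefined && loaded.size === 0) return { kind: "full" };
  // Path 2 and 3: work out what is still missing.
  const wanted = filter === undefined ? allSchemas : closureOf(filter);
  const missing = wanted.filter((schemaName) => !loaded.has(schemaName));
  return missing.length === 0 ? { kind: "none" } : { kind: "fragment", schemas: missing };
}

// Stub closure for the example; a real one would walk the manifest.
const demoAllSchemas = ["CoreCustomAttributes", "BisCore", "Generic"];
const demoClosureOf = (requested: string[]): string[] =>
  requested.indexOf("BisCore") >= 0 ? ["CoreCustomAttributes", "BisCore"] : [];
const demoLoaded = new Set<string>(["CoreCustomAttributes", "BisCore"]);

const planFull = planLoad(demoAllSchemas, new Set<string>(), demoClosureOf);
const planSubset = planLoad(demoAllSchemas, new Set<string>(), demoClosureOf, ["BisCore"]);
const planRemainder = planLoad(demoAllSchemas, demoLoaded, demoClosureOf);
const planNothing = planLoad(demoAllSchemas, demoLoaded, demoClosureOf, ["BisCore"]);
```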

Serialized merges. Overlapping concurrent requests are serialized behind one in-flight promise and re-check the loaded-set inside the continuation, so two requests can never double-merge a schema.
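
One way to get this serialization is a promise chain whose continuation re-checks the loaded set. The sketch below is an assumption about the shape, not the actual implementation: `SerializedLoader`, `ensureLoaded`, and the injected fetch callback are hypothetical, and the synchronous fast path for already-loaded schemas is omitted.

```typescript
// Hypothetical loader: requests queue behind one in-flight promise.
class SerializedLoader {
  private readonly _loaded = new Set<string>();
  private readonly _fetch: (schemaNames: string[]) => Promise<void>;
  private _tail: Promise<void> = Promise.resolve();

  public constructor(fetchFragment: (schemaNames: string[]) => Promise<void>) {
    this._fetch = fetchFragment;
  }

  public ensureLoaded(schemaNames: string[]): Promise<void> {
    const run = this._tail.then(async () => {
      // Re-check inside the continuation: a request queued earlier
      // may already have merged some of these schemas.
      const missing = schemaNames.filter((n) => !this._loaded.has(n));
      if (missing.length === 0) return;
      await this._fetch(missing);
      for (const n of missing) this._loaded.add(n);
    });
    // Keep the chain usable even if this request fails.
    this._tail = run.catch(() => {});
    return run;
  }
}

// Two overlapping requests issued concurrently; returns what was actually fetched.
async function demoSerializedMerges(): Promise<string[][]> {
  const fetchLog: string[][] = [];
  const loader = new SerializedLoader(async (schemaNames) => {
    fetchLog.push(schemaNames);
  });
  await Promise.all([
    loader.ensureLoaded(["BisCore", "CoreCustomAttributes"]),
    loader.ensureLoaded(["BisCore", "Generic"]),
  ]);
  return fetchLog;
}
```

In the demo, the second request sees BisCore already loaded when its continuation runs, so it fetches only Generic.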

Fixed Primitive Type enum width

The binary blob incorrectly used 8 bits for the primitive type, but we need 16 bits for it. I decided to fix this format in-place. The API is still beta, and the addon + backend are coupled together, so both should always speak the same language.

A case could be made that this warrants a v2 binary blob format, and in the future such changes probably should get one. However, no real frontend consumer has picked this up yet and we will backport the fix to 5.10, so I believe it is safe to fix in place. It is a fix for the v1 format, not an evolution of it.
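
For readers unfamiliar with the failure mode, the sketch below shows what an 8-bit field does to a value that needs 16 bits. The value 0x0901 and the little-endian layout are assumptions picked for illustration; neither is taken from this PR or from the blob format documentation.

```typescript
// Illustrative enum value above 255 (an assumption, not taken from this PR).
const samplePrimitiveType = 0x0901;

const sampleBytes = new ArrayBuffer(4);
const sampleView = new DataView(sampleBytes);

// 8-bit field: only the low byte survives the write.
sampleView.setUint8(0, samplePrimitiveType);
const readFromNarrowField = sampleView.getUint8(0);

// 16-bit field (little-endian assumed here): the value round-trips intact.
sampleView.setUint16(2, samplePrimitiveType, true);
const readFromWideField = sampleView.getUint16(2, true);
```

Here `readFromNarrowField` is 0x01 while `readFromWideField` is 0x0901, so any two enum values sharing a low byte would be indistinguishable in the narrow encoding.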

rschili added 5 commits June 20, 2026 00:15
…older

Relocate SchemaView.ts, SchemaViewBinaryReader.ts and SchemaViewInterfaces.ts
into a dedicated src/SchemaView/ folder and fix the relative import paths in the
moved files and their importers (barrel, SchemaLocalization, test). Pure move,
no behavior change - prepares the package for the incremental schema-loading work.
- Introduced `PRAGMA schema_view_fragment` to return a subset of schemas as a binary blob, enabling incremental loading.
- Updated documentation for `Pragmas.md`, `SchemaView.md`, and `SchemaViewBinaryFormat.md` to reflect the new pragma and its usage.
- Enhanced `getSchemaView` method to support loading only specified schemas and their dependencies, improving performance for large iModels.
- Added tests for schema view fragment loading, ensuring correct behavior when loading subsets of schemas and handling dependencies.
- Implemented `SchemaManifest` to manage schema references and loading order, facilitating efficient schema management.
…pragma

- Added support for incremental schema loading in IModelDb using the new schema_view_fragment pragma.
- Updated the SchemaView class to handle schema tokens for cache invalidation.
- Modified the getSchemaView method to utilize the new incremental loading strategy.
- Enhanced documentation for schema_view and schema_token pragmas to clarify their usage and benefits.
- Updated tests to reflect changes in schema view lifecycle and cache invalidation logic.

grigasp commented Jun 29, 2026

Member

This PR adds PRAGMA schema_token, which hashes schema identity only - the name and version of each schema

Known limitation (accepted): because the token hashes name + version only, it does not detect a schema whose contents change without a version bump. ECDb only allows in-place re-import for dynamic schemas, so that is the only case affected. We accept this for now and can strengthen the hash later (for example with a cached per-schema content checksum) without changing the pragma's contract.

How does this work with schema editing? Would editing a schema automatically increase the schema version?

@aruniverse aruniverse added this to the iTwin.js 5.12 milestone Jun 29, 2026
@rschili rschili changed the title WIP SchemaView performance improvements SchemaView performance improvements Jun 30, 2026

rschili commented Jun 30, 2026

Contributor Author

@grigasp

How does this work with schema editing? Would editing a schema automatically increase the schema version?

Not automatically. When you import an edited schema, you have to increment the version yourself or ECDb will consider the schema unchanged. It's pretty safe to assume this works for all normal scenarios, with the only exception of dynamic schemas.

Our rules already state how callers should increment schema versions. Any change to a non-dynamic schema needs at least a minor version increment. Frankly, we loosened the rules for dynamic schemas a few years ago, and I now believe that may have been a mistake. Their authors were incrementing the minor version automatically very often and were worried it would eventually overflow.

If this is a problem, we can add a short-term fix that force-invalidates the SchemaView whenever dynamic schemas are involved, but that feels beyond the scope of this PR.

Development

Successfully merging this pull request may close these issues.

SchemaView performance improvements on large iModels

3 participants