SchemaView performance improvements #9431
…older Relocate SchemaView.ts, SchemaViewBinaryReader.ts and SchemaViewInterfaces.ts into a dedicated src/SchemaView/ folder and fix the relative import paths in the moved files and their importers (barrel, SchemaLocalization, test). Pure move, no behavior change - prepares the package for the incremental schema-loading work.
- Introduced `PRAGMA schema_view_fragment` to return a subset of schemas as a binary blob, enabling incremental loading.
- Updated documentation for `Pragmas.md`, `SchemaView.md`, and `SchemaViewBinaryFormat.md` to reflect the new pragma and its usage.
- Enhanced `getSchemaView` method to support loading only specified schemas and their dependencies, improving performance for large iModels.
- Added tests for schema view fragment loading, ensuring correct behavior when loading subsets of schemas and handling dependencies.
- Implemented `SchemaManifest` to manage schema references and loading order, facilitating efficient schema management.
…pragma
- Added support for incremental schema loading in IModelDb using the new schema_view_fragment pragma.
- Updated the SchemaView class to handle schema tokens for cache invalidation.
- Modified the getSchemaView method to utilize the new incremental loading strategy.
- Enhanced documentation for schema_view and schema_token pragmas to clarify their usage and benefits.
- Updated tests to reflect changes in schema view lifecycle and cache invalidation logic.
…chema_token) for accuracy
…clarify failure conditions
How does this work with schema editing? Would editing a schema automatically increase the schema version?
Not automatically. When you import an edited schema, you have to increment the version yourself or ECDb will consider the schema unchanged. It's pretty safe to assume this works for all normal scenarios, with the sole exception of dynamic schemas. Our rules already state how callers should increment schema versions. Any change in non-dynamic schemas needs at least a minor digit increment. Frankly, we loosened the rules for dynamic schemas a few years ago, and I now believe that may have been a mistake. They incremented the minor version automatically very often and were worried it would eventually overflow. If this is a problem, we can add a short-term fix to force-invalidate the SchemaView whenever dynamic schemas are involved, but that feels beyond the scope of this PR.
imodel-native: iTwin/imodel-native#1479
Fixes: #9430
What
`SchemaView` was built and tuned against a very large iModel, balanced for the general mix of use cases.
Presentation has been testing it for adoption and the results are promising in almost every scenario, but one critical scenario regresses on very large iModels - enough that they cannot adopt `SchemaView` as-is.
The scenario
That path only needs BisCore, yet `SchemaView` paid to hydrate every schema in the file before returning. This PR makes `SchemaView` pay only for what the caller asks for, and removes a fixed cost that hurt every consumer.
Two changes, independent but shipped together.
Change 1: a cheaper schema-identity token (`PRAGMA schema_token`)
`SchemaView` uses a token to detect when an iModel's schemas have changed so it can drop its cached view.
It was obtaining that token from `PRAGMA checksum(ecdb_schema)`, a SHA3 hash over the full contents of all schema tables (every class, property, and custom-attribute instance).
On the large iModel above, that checksum alone takes ~1.7 seconds, almost entirely CPU-bound on the hash - while loading BisCore plus its references via the binary blob is only ~30 ms. The token, meant to be a cheap "did anything change?" check, was dominating the whole operation, and every `SchemaView` consumer paid it regardless of how much schema they needed.
This PR adds `PRAGMA schema_token`, which hashes schema identity only - the name and version of each schema (one row per schema in `ec_Schema`, ordered by name). It is essentially free. The same cheap hash now also backs the `schemaToken` column returned by `PRAGMA schema_view` and the new `PRAGMA schema_view_fragment`, so a cached view and any later token check derive from the same value.
Measured on the ~30 GB / ~100-schema file: the token computation dropped from ~1744 ms to 1 ms, and the schema fragment pragma (see Change 2 below) from ~1759 ms to ~14 ms (cold ≈ warm, confirming the cost was CPU, not disk).
Known limitation (accepted): because the token hashes name + version only, it does not detect a schema whose contents change without a version bump. ECDb only allows in-place re-import for dynamic schemas, so that is the only case affected. We accept this for now and can strengthen the hash later (for example, with a cached per-schema content checksum) without changing the pragma's contract.
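To make the identity-only token and its accepted limitation concrete, here is a minimal sketch. It is not the native implementation: the `SchemaIdentity` shape, the `identityToken` and `CachedView` names, and the FNV-1a hash are all assumptions chosen for illustration (the description above does not say which hash `PRAGMA schema_token` uses).

```typescript
// Hypothetical row shape: identity fields only, one entry per schema.
interface SchemaIdentity {
  name: string;
  version: string; // e.g. "01.00.16"
}

// 32-bit FNV-1a, used purely as a small stand-in hash (an assumption).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Hash name + version of every schema, ordered by name so the result
// does not depend on row order. Schema contents are never read.
function identityToken(schemas: SchemaIdentity[]): string {
  const rows = [...schemas]
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((s) => `${s.name}:${s.version}`);
  return fnv1a(rows.join("\n"));
}

// Hypothetical cache holder: rebuild the view only when the token differs.
class CachedView<T> {
  private cached?: { token: string; view: T };

  public get(token: string, build: () => T): T {
    if (this.cached?.token !== token)
      this.cached = { token, view: build() };
    return this.cached.view;
  }
}
```

In this sketch a version bump always changes the token, while editing a schema's classes or properties without touching its version leaves the token unchanged, which is exactly the dynamic-schema case called out as the known limitation.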
Frontend
`IModelConnection.invalidateSchemaViewIfChanged` now queries `PRAGMA schema_token` (reading `.token`) instead of `PRAGMA checksum(ecdb_schema)` (`.sha3_256`).
Change 2: `getSchemaView({ schemas })` - load only a subset
`getSchemaView()` gains an optional argument. It is purely additive:
Behavior:
their closure into the same view, so schemas loaded earlier stay available.
`findClass` and friends return `undefined`. Cross-schema walks (`derivedClasses`, etc.) are complete only over what is currently loaded.
How it works
Manifest (cheap reference graph). On first subset request, the backend reads the schema reference graph from ECDbMeta (`meta.ECSchemaDef` + `meta.SchemaHasSchemaReferences`) - just names, versions, ids, and reference edges, no schema data. This is exposed as the new `SchemaManifest` type.
This has one incoming path via `getSchemaView`:
Serialized merges. Overlapping concurrent requests are serialized behind one in-flight promise and re-check the loaded-set inside the continuation, so two requests can never double-merge a schema.
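The manifest walk and the serialized merge described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `Manifest` shape, `closure`, `IncrementalLoader`, and the `fetchFragment` callback are hypothetical names standing in for the real `SchemaManifest` and fragment-loading path.

```typescript
// Hypothetical manifest shape: schema name -> directly referenced schema names.
type Manifest = Map<string, string[]>;

// Requested schemas plus everything they reference, transitively.
function closure(manifest: Manifest, requested: string[]): Set<string> {
  const result = new Set<string>();
  const stack = [...requested];
  while (stack.length > 0) {
    const name = stack.pop()!;
    if (result.has(name) || !manifest.has(name))
      continue; // already visited, or not present in this file
    result.add(name);
    stack.push(...(manifest.get(name) ?? []));
  }
  return result;
}

class IncrementalLoader {
  private readonly loaded = new Set<string>();
  private inFlight: Promise<void> = Promise.resolve();

  public constructor(
    private readonly manifest: Manifest,
    // Hypothetical stand-in for fetching a fragment and merging it into the view.
    private readonly fetchFragment: (names: string[]) => Promise<void>,
  ) {}

  public load(requested: string[]): Promise<void> {
    // Chain behind whatever is already running so merges never overlap.
    const next = this.inFlight.then(async () => {
      // Re-check inside the continuation: an earlier request may have
      // loaded some of these schemas while this one was queued.
      const missing = Array.from(closure(this.manifest, requested))
        .filter((name) => !this.loaded.has(name));
      if (missing.length === 0)
        return;
      await this.fetchFragment(missing);
      missing.forEach((name) => this.loaded.add(name));
    });
    // Keep the chain usable even if this request fails.
    this.inFlight = next.catch(() => {});
    return next;
  }

  public isLoaded(name: string): boolean {
    return this.loaded.has(name);
  }
}
```

Because the missing set is computed only after the previous request has settled, two overlapping calls such as `load(["BisCore"])` and `load(["Generic"])` fetch BisCore and its references once, and the second call fetches only what is still absent.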
Fixed Primitive Type enum width
The binary blob incorrectly used 8 bits for the primitive type, but we need 16 bits for it. I decided to fix this format in-place. The API is still beta, and the addon + backend are coupled together, so both should always speak the same language.
One could argue that this warrants a v2 binary blob format, and in the future such changes probably should. However, since no real consumer on the frontend has picked this up yet, and we will backport the fix to 5.10, I believe it's safe to fix this in place. It's a fix for the v1 format, not an evolution of it.
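The width problem fixed in this section can be reproduced in a few lines. The value `0x901` is only a sample assumed to exceed 255, and the little-endian byte order is likewise an assumption; neither is taken from the binary format documentation.

```typescript
// Illustrative value only: any enum value above 255 shows the same effect.
const samplePrimitiveType = 0x901;

const buffer = new ArrayBuffer(2);
const view = new DataView(buffer);

// An 8-bit slot silently keeps only the low byte of the value.
view.setUint8(0, samplePrimitiveType);
const readBack8 = view.getUint8(0); // 0x01

// A 16-bit slot preserves the full value (byte order assumed little-endian).
view.setUint16(0, samplePrimitiveType, true);
const readBack16 = view.getUint16(0, true); // 0x901
```

With an 8-bit field, distinct values that share a low byte become indistinguishable after a round trip, which is why widening the field is a correctness fix rather than a format evolution.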