[BugFix] Honor per-snapshot schema for Iceberg time travel reads by GavinMar · Pull Request #74711 · StarRocks/starrocks

GavinMar · 2026-06-12T01:40:42Z

Why I'm doing:

Iceberg time travel queries (VERSION AS OF / TIMESTAMP AS OF) resolved the target snapshot only to select data files, while column resolution, the result-set metadata, the descriptor sent to BE, and scan planning all used the latest table schema. After a schema change (e.g. a column rename), a time-travel read returned the current column names instead of the schema bound to the targeted snapshot:

CREATE TABLE t (id INT, c INT);            -- snapshot S1 committed with schema (id, c)
ALTER TABLE t RENAME COLUMN c TO c_renamed;
INSERT INTO t VALUES (...);                -- snapshot S2 with schema (id, c_renamed)

SELECT * FROM t VERSION AS OF <S1>;        -- returned column `c_renamed` instead of `c`
SELECT * FROM t VERSION AS OF <S1> WHERE c = 1;  -- failed: `c` could not be resolved

This diverges from Iceberg semantics. The Iceberg spec defines a snapshot's schema-id as "ID of the table's current schema when the snapshot was created", and both the Iceberg reference implementation (SnapshotUtil.schemaFor, DataScan#useSnapshotSchema) and other query engines read snapshot-id/timestamp/tag time travel with the snapshot's schema.
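To make the schema-id wording concrete, here is a minimal, self-contained sketch of looking up the schema a snapshot was committed with. It models schemas as maps from field ID to column name instead of using the Iceberg library. `SnapshotSchemaLookup`, its methods, and the sample IDs are hypothetical names chosen for illustration, not StarRocks or Iceberg APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model only (assumption): this is not the Iceberg API.
final class SnapshotSchemaLookup {
    // schema-id -> (field-id -> column name); insertion order keeps column order.
    private final Map<Integer, Map<Integer, String>> schemasById = new LinkedHashMap<>();
    // snapshot-id -> schema-id that was current when the snapshot was created.
    private final Map<Long, Integer> schemaIdBySnapshot = new LinkedHashMap<>();
    private int currentSchemaId = -1;

    // Registers a schema and makes it the table's current schema.
    void addSchema(int schemaId, Map<Integer, String> columnsByFieldId) {
        schemasById.put(schemaId, new LinkedHashMap<>(columnsByFieldId));
        currentSchemaId = schemaId;
    }

    // Records a snapshot together with the schema-id that is current right now.
    void commitSnapshot(long snapshotId) {
        schemaIdBySnapshot.put(snapshotId, currentSchemaId);
    }

    Map<Integer, String> currentSchema() {
        return schemasById.get(currentSchemaId);
    }

    // Time travel should bind names through this lookup, not currentSchema().
    Map<Integer, String> schemaFor(long snapshotId) {
        Integer schemaId = schemaIdBySnapshot.get(snapshotId);
        if (schemaId == null) {
            throw new IllegalArgumentException("Unknown snapshot: " + snapshotId);
        }
        return schemasById.get(schemaId);
    }

    // Builds the S1/S2 scenario from the description: field 2 is renamed c -> c_renamed.
    static SnapshotSchemaLookup example() {
        SnapshotSchemaLookup t = new SnapshotSchemaLookup();
        Map<Integer, String> s0 = new LinkedHashMap<>();
        s0.put(1, "id");
        s0.put(2, "c");
        t.addSchema(0, s0);
        t.commitSnapshot(1L); // S1
        Map<Integer, String> s1 = new LinkedHashMap<>();
        s1.put(1, "id");
        s1.put(2, "c_renamed");
        t.addSchema(1, s1);
        t.commitSnapshot(2L); // S2
        return t;
    }
}
```

With `SnapshotSchemaLookup.example()`, `schemaFor(1L)` maps field 2 to `c`, while `currentSchema()` maps the same field ID to `c_renamed`. The rename changes only the name attached to the field ID, which is why the older snapshot's data stays readable under its original column name.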

What I'm doing:

Resolve the query period once during analysis and bind the snapshot schema through the whole read path:

QueryAnalyzer: after table resolution, resolve the query period to a version range (via the new QueryPeriodResolver, extracted from RelationTransformer so both layers share one implementation), pin the resolved range on the TableRelation so the transformer does not resolve it again, and, when the targeted snapshot's schema differs from the current one, replace the table with an immutable per-query copy created by IcebergTable#withReadSchema.
IcebergTable: withReadSchema rebuilds the StarRocks-side full schema from the snapshot schema; getEffectiveIcebergSchema feeds partition/bucket column resolution and toThrift (columns, iceberg_schema, partition expressions and source column names) so FE and BE see one consistent schema. Queries whose current partition spec references a column missing from the snapshot schema are rejected with a clear error.
IcebergMetadata: convert pushdown predicates against the effective (snapshot) schema, matching the schema the scan binds expressions against.
StarRocksIcebergTableScan: the scan schema is the snapshot schema for time travel (DataScan#useSnapshotSchema), while table partition specs stay bound to the current schema. Rebind specs to the scan schema (scanSpecsById) for the residual/partition/metrics evaluators, DeleteFileIndex, ManifestGroup, and manifest filtering, so filter expressions bind consistently. A spec that cannot be rebound degrades to no pruning with the full filter kept as residual.
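The per-query copy and the spec rebinding described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the StarRocks implementation: `ReadSchemaTable` and `rebindPartitionSource` are hypothetical names, and schemas are reduced to maps from field ID to column name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative model only (assumption): names do not match the StarRocks classes.
final class ReadSchemaTable {
    private final Map<Integer, String> currentSchema; // field-id -> column name
    private final Map<Integer, String> readSchema;    // null when not time travelling

    ReadSchemaTable(Map<Integer, String> currentSchema) {
        this(currentSchema, null);
    }

    private ReadSchemaTable(Map<Integer, String> currentSchema,
                            Map<Integer, String> readSchema) {
        this.currentSchema = Collections.unmodifiableMap(new LinkedHashMap<>(currentSchema));
        this.readSchema = readSchema == null
                ? null
                : Collections.unmodifiableMap(new LinkedHashMap<>(readSchema));
    }

    // Returns a new per-query copy; the shared table instance is never mutated.
    ReadSchemaTable withReadSchema(Map<Integer, String> snapshotSchema) {
        return new ReadSchemaTable(currentSchema, snapshotSchema);
    }

    // Single source of truth for column resolution in this model.
    Map<Integer, String> effectiveSchema() {
        return readSchema != null ? readSchema : currentSchema;
    }

    // Rebinds a partition source field to the effective schema by field ID.
    // An empty result means the spec cannot be rebound; in this model the caller
    // would then skip pruning and keep the full filter as residual.
    Optional<String> rebindPartitionSource(int sourceFieldId) {
        return Optional.ofNullable(effectiveSchema().get(sourceFieldId));
    }
}
```

Returning a copy rather than mutating the cached table keeps concurrent queries against the latest snapshot unaffected. In this model, a partition field whose source ID is absent from the snapshot schema yields `Optional.empty()`, which corresponds to the "degrades to no pruning" case.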

Branch semantics: a branch reference is resolved to its head snapshot and read with that snapshot's schema, same as Trino. Spark instead reads branches with the current table schema (SnapshotUtil.schemaFor(table, ref)); snapshot-id, timestamp, and tag reads behave identically across all three engines.
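The engine comparison above can be summarised as a small decision function. This sketch only encodes the behaviour as stated in this description; it has not been checked against the engines' source, and `RefSchemaPolicy` and its enums are hypothetical names.

```java
// Illustrative summary only (assumption): encodes the claims made in this PR description.
final class RefSchemaPolicy {
    enum RefKind { SNAPSHOT_ID, TIMESTAMP, TAG, BRANCH }
    enum Engine { STARROCKS, TRINO, SPARK }
    enum SchemaChoice { SNAPSHOT_SCHEMA, CURRENT_TABLE_SCHEMA }

    // Which schema a time-travel read binds column names against.
    static SchemaChoice schemaFor(Engine engine, RefKind kind) {
        // Per the description, Spark is the only engine of the three that reads
        // branches with the current table schema.
        if (kind == RefKind.BRANCH && engine == Engine.SPARK) {
            return SchemaChoice.CURRENT_TABLE_SCHEMA;
        }
        // Snapshot-id, timestamp, and tag reads use the snapshot's schema everywhere.
        return SchemaChoice.SNAPSHOT_SCHEMA;
    }
}
```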

Verification:

New UTs cover schema rebinding, the analyzer path (including the pre-resolved external table path), the BE descriptor, and scan planning with predicates on renamed columns.
New SQL tests (test_timetravel_snapshot_schema) cover tag/branch reads, old/new column name resolution, predicates on renamed columns, and a partitioned table whose partition source column was renamed.
Verified end to end on a real cluster against a Hive-catalog Iceberg table: time travel to the pre-rename snapshot now returns the original column name and supports filtering on it, matching pyiceberg/Spark/Trino.

What type of PR is this:

Does this PR entail a change in behavior?

Yes, this PR will result in a change in behavior.
No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

Interface/UI changes: syntax, type conversion, expression evaluation, display information
Parameter changes: default values, similar parameters but with different default values
Policy changes: use new policy to replace old one, functionality automatically enabled
Feature removed
Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

I have added test cases for my bug fix or my new feature
This pr needs user documentation (for new or modified features or behaviors)
- I have added documentation for my new feature or new function
- This pr needs auto generate documentation
This is a backport pr

Bugfix cherry-pick branch check:

I have checked the version labels which the pr will be auto-backported to the target branch
- 4.1
- 4.0
- 3.5

Time travel queries (VERSION/TIMESTAMP AS OF) resolved the target snapshot only to select data files, while column resolution, the descriptor sent to BE, and scan planning all used the latest table schema. After a schema change (e.g. a column rename), such queries returned the current column names instead of the schema bound to the targeted snapshot, diverging from Iceberg semantics.

Resolve the query period once during analysis, pin the resolved version range on the table relation, and rebind the IcebergTable to the snapshot schema via an immutable per-query copy. Convert pushdown predicates against the same schema and rebind partition specs to the scan schema inside StarRocksIcebergTableScan so partition, metrics, residual, and manifest evaluators bind consistently.

Signed-off-by: GavinMar <yangguansuo@starrocks.com>

github-actions · 2026-06-12T01:42:40Z

[Java-Extensions Incremental Coverage Report]

✅ pass : 0 / 0 (0%)

github-actions · 2026-06-12T01:42:42Z

[FE Incremental Coverage Report]

❌ fail : 4 / 7 (57.14%)

file detail

	path	covered_line	new_line	coverage	not_covered_line_detail
🔵	com/starrocks/catalog/IcebergTable.java	0	3	00.00%	[200, 205, 225]
🔵	com/starrocks/sql/optimizer/transformer/RelationTransformer.java	1	1	100.00%	[]
🔵	com/starrocks/sql/analyzer/QueryAnalyzer.java	1	1	100.00%	[]
🔵	com/starrocks/connector/iceberg/IcebergApiConverter.java	2	2	100.00%	[]

github-actions · 2026-06-12T01:46:35Z

[BE Incremental Coverage Report]

✅ pass : 0 / 0 (0%)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e8c5c5528

chatgpt-codex-connector · 2026-06-12T01:47:14Z

+        for (PartitionField field : getNativeTable().spec().fields()) {
+            if (currentSchema.findColumnName(field.sourceId()) != null
+                    && schema.findColumnName(field.sourceId()) == null) {
+                throw new StarRocksConnectorException(


Don't reject older snapshots after partition evolution

Consider a table that was readable at an older snapshot before a later ALTER TABLE ... ADD COLUMN p and a partition-spec replacement on p. This check rejects VERSION AS OF that older snapshot solely because the current partition spec references a field absent from the snapshot schema. Iceberg snapshots are planned with the specs attached to their manifests, and this change already carries per-scan specs elsewhere, so the current spec should not make historical snapshots unusable. Users time-travelling to pre-evolution snapshots will now get a "Time travel is not supported" error instead of the old rows.
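The reviewer's scenario can be reproduced with a small model of the quoted check. This is a sketch under assumptions: schemas and specs are reduced to sets and lists of field IDs, and `PartitionSpecCheck.rejects` is a hypothetical helper that mirrors only the condition visible in the diff hunk, not the surrounding StarRocks code.

```java
import java.util.List;
import java.util.Set;

// Illustrative model only (assumption): mirrors the condition in the quoted hunk.
final class PartitionSpecCheck {
    // Returns true when any source field of the CURRENT partition spec exists in
    // the current schema but is missing from the snapshot schema.
    static boolean rejects(List<Integer> currentSpecSourceIds,
                           Set<Integer> currentSchemaFieldIds,
                           Set<Integer> snapshotSchemaFieldIds) {
        for (int sourceId : currentSpecSourceIds) {
            if (currentSchemaFieldIds.contains(sourceId)
                    && !snapshotSchemaFieldIds.contains(sourceId)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, if the snapshot schema has field IDs {1, 2}, the current schema has {1, 2, 3}, and the current spec partitions on field 3, then `rejects(List.of(3), Set.of(1, 2, 3), Set.of(1, 2))` returns true even though the older snapshot's own manifests never referenced field 3.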


github-actions Bot added 4.1 4.0 3.5 labels Jun 12, 2026

mergify Bot assigned GavinMar Jun 12, 2026

github-actions Bot removed the 3.5 label Jun 12, 2026

github-actions Bot requested review from Youngwb and stephen-shelby June 12, 2026 01:42

chatgpt-codex-connector Bot reviewed Jun 12, 2026

GavinMar wants to merge 1 commit into StarRocks:main from GavinMar:fix_ice_time_travel_schema
