
[Enhancement] Read DATE/DATETIME footer stats in tablet pre-split meta tier #74710

Open
xiangguangyxg wants to merge 5 commits into StarRocks:main from xiangguangyxg:meta-tier-date-datetime-window


@xiangguangyxg

Contributor

Why I'm doing:

The Sample-Based Tablet Pre-Split meta tier reads Parquet/ORC footer column statistics to plan tablet split boundaries cheaply, avoiding a full data-tier SELECT sub-query. Until now it accepted only integer/boolean stats, so temporal range/partition sort keys, the dominant bulk-load case, always fell through to the data tier. This PR extends the meta tier to DATE/DATETIME so those loads get the cheap footer-based split.

What I'm doing:

Only the two footer readers change. The pipeline downstream of them (ParquetMetadataSampler, BoundaryPlanner, Variant/DateVariant, the providers, MetaTierFormat) is type-agnostic and left untouched.

  • Parquet (ParquetRowGroupStatisticsReader): INT32 + DATE → StarRocks DATE; INT64 + TIMESTAMP with isAdjustedToUTC=false (MILLIS/MICROS/NANOS) → DATETIME.
  • ORC (OrcStripeStatisticsReader): DATE category → StarRocks DATE.
  • All conversions are gated to the value window [1970-01-01, 9999-12-31] by the new MetaTierTemporalWindow. Anything outside that window, as well as UTC-adjusted or INT96 Parquet timestamps, ORC TIMESTAMP/TIMESTAMP_INSTANT, DECIMAL, and FLOAT/DOUBLE, makes the reader throw MetaTierUnavailableException, and planning falls back to the data tier (never a wrong boundary, never a load failure).
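To make the rules above concrete, here is a minimal Java sketch of the conversion and gating logic. It assumes Parquet DATE stats are epoch-day counts and TIMESTAMP stats are epoch ticks in the declared unit (as in the Parquet logical-type spec). The class, method, and enum names (TemporalStatSketch, toDate, toDateTime, TimestampUnit) and the stand-in MetaTierUnavailableException are hypothetical and do not reflect the PR's actual code.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

/**
 * Hypothetical sketch of footer-stat conversion plus the safe-window gate.
 * Names are illustrative only; they are not the PR's real API.
 */
public class TemporalStatSketch {

    /** Stand-in for the PR's MetaTierUnavailableException (real signature not shown). */
    static class MetaTierUnavailableException extends Exception {
        MetaTierUnavailableException(String message) {
            super(message);
        }
    }

    /** Parquet TIMESTAMP units accepted by the meta tier. */
    enum TimestampUnit { MILLIS, MICROS, NANOS }

    static final LocalDate WINDOW_MIN = LocalDate.of(1970, 1, 1);
    static final LocalDate WINDOW_MAX = LocalDate.of(9999, 12, 31);

    /** Safe-window gate: anything outside [1970-01-01, 9999-12-31] falls back. */
    static void checkWindow(LocalDate date) throws MetaTierUnavailableException {
        if (date.isBefore(WINDOW_MIN) || date.isAfter(WINDOW_MAX)) {
            throw new MetaTierUnavailableException("outside meta-tier window: " + date);
        }
    }

    /** INT32 + DATE: the stat is a day count since 1970-01-01. */
    static LocalDate toDate(int epochDays) throws MetaTierUnavailableException {
        LocalDate date = LocalDate.ofEpochDay(epochDays);
        checkWindow(date);
        return date;
    }

    /** Normalize to microseconds; NANOS floor-divides (plain truncation for non-negative ticks). */
    static long toEpochMicros(long value, TimestampUnit unit) {
        switch (unit) {
            case MILLIS:
                return Math.multiplyExact(value, 1_000L); // ArithmeticException on overflow
            case MICROS:
                return value;
            case NANOS:
                return Math.floorDiv(value, 1_000L);
            default:
                throw new IllegalArgumentException("unsupported unit: " + unit);
        }
    }

    /** INT64 + TIMESTAMP(isAdjustedToUTC=false) -> wall-clock DATETIME. */
    static LocalDateTime toDateTime(long value, TimestampUnit unit, boolean isAdjustedToUTC)
            throws MetaTierUnavailableException {
        if (isAdjustedToUTC) {
            // UTC-adjusted instants are deferred to the data tier.
            throw new MetaTierUnavailableException("UTC-adjusted timestamp");
        }
        long micros;
        try {
            micros = toEpochMicros(value, unit);
        } catch (ArithmeticException e) {
            throw new MetaTierUnavailableException("timestamp overflow");
        }
        long seconds = Math.floorDiv(micros, 1_000_000L);
        int nanos = (int) (Math.floorMod(micros, 1_000_000L) * 1_000L);
        // Non-UTC-adjusted ticks are wall-clock values, so read them with a zero offset.
        LocalDateTime dateTime = LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
        checkWindow(dateTime.toLocalDate());
        return dateTime;
    }
}
```

Every rejection path surfaces as the same checked exception, so a caller needs only one catch to route to the data tier. Values are gated after conversion to a calendar date rather than by comparing raw ticks, which keeps the gate independent of the source format.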

Why the window: for non-negative ticks, the FE's Math.floorDiv/floorMod conversion is bit-identical to the BE load's truncating signed division (Int64ToDateTimeConverter), because floor and truncation differ only for negative operands. The FE-computed boundary therefore equals the value the load stores. The window also sidesteps the yyyy-formatter year-0 mis-render and the question of whether the proleptic Gregorian and hybrid Julian/Gregorian calendars agree before 1582. Boundary strings round-trip through DateVariant.getStringValue (canonical yyyy-MM-dd / yyyy-MM-dd HH:mm:ss[.ffffff]).
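The parity argument can be checked directly. The hypothetical FloorDivParitySketch below splits a microsecond tick into seconds and micro-of-second two ways: with Math.floorDiv/floorMod, as the FE does per the description, and with Java's / and %, which truncate toward zero like C++ signed division and so stand in for the BE path. The canonical formatter is illustrative only; emitting the fraction only when it is non-zero is an assumption about DateVariant.getStringValue.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/**
 * Hypothetical sketch: floor vs. truncating division of microsecond ticks,
 * plus an illustrative "yyyy-MM-dd HH:mm:ss[.ffffff]" formatter.
 */
public class FloorDivParitySketch {

    static final long MICROS_PER_SECOND = 1_000_000L;

    /** FE-style split: floor division; the remainder is always in [0, 1_000_000). */
    static long[] floorSplit(long micros) {
        return new long[] {
            Math.floorDiv(micros, MICROS_PER_SECOND),
            Math.floorMod(micros, MICROS_PER_SECOND)
        };
    }

    /** Truncating split: '/' and '%' round toward zero, as C++ signed division does. */
    static long[] truncSplit(long micros) {
        return new long[] {micros / MICROS_PER_SECOND, micros % MICROS_PER_SECOND};
    }

    /** Illustrative canonical form; the fraction is appended only when non-zero (assumption). */
    static String canonical(LocalDateTime dateTime) {
        String base = dateTime.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ROOT));
        int micros = dateTime.getNano() / 1_000;
        return micros == 0 ? base : base + String.format(Locale.ROOT, ".%06d", micros);
    }
}
```

For non-negative ticks both splits agree; for a tick of -1 micros the floor split gives (-1 s, 999999 micros) while the truncating split gives (0 s, -1 micros), which is why negative ticks are kept out of the meta tier. Separately, `yyyy` in DateTimeFormatter is year-of-era, so LocalDate.of(0, 1, 1) formats as 0001-01-01 (proleptic year 0 is 1 BC); this is presumably the year-0 mis-render the window avoids.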

Tests: per-reader DATE/DATETIME success cases (including window edges and NANOS→micros truncation); UTC-adjusted, out-of-window, type-mismatch, and all-null fallbacks; ORC TIMESTAMP deferral; and a ParquetMetadataSampler date-quantile integration test that pins the type-agnostic-pipeline claim.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This PR needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This PR needs auto-generated documentation
  • This is a backport PR

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5

🤖 Generated with Claude Code

xiangguangyxg and others added 5 commits June 11, 2026 16:17
…indow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…/DATETIME

- Extract the format-agnostic [1970-01-01, 9999-12-31] safe-window gate into
  MetaTierTemporalWindow (called by both footer readers), removing per-reader duplication.
- Replace the duplicated 4-way integer-type checks in the Parquet INT32/INT64 arms with
  PrimitiveType.isIntegerType(), matching the ORC reader.
- Name the ORC stripe-conversion helpers as verbs (convertIntegerStripe/convertDateStripe)
  and expand stripeInfo -> stripeInformation.
- Wrap convertDateStripe's Variant.of inside the RuntimeException -> MetaTierUnavailableException
  guard so an unrepresentable date value falls back to data tier instead of escaping read().
- Fix the stale ORC type-window javadoc, note the intentional checked-exception bypass in the
  Parquet convertBlock catch, move the duplicated statusOf test helper into PresplitTestSupport,
  and drop a redundant DATE->BIGINT fallback test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
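The commit above wraps convertDateStripe's Variant.of inside the RuntimeException → MetaTierUnavailableException guard. A minimal sketch of that pattern, with a caller that routes the fallback to the data tier, might look like the following. StripeGuardSketch, guarded, and planTier are hypothetical names, and LocalDate.ofEpochDay stands in for the real Variant.of.

```java
import java.time.LocalDate;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the guard pattern: a RuntimeException thrown while building
 * a stat value becomes a fallback signal instead of escaping read().
 */
public class StripeGuardSketch {

    /** Stand-in for the PR's MetaTierUnavailableException. */
    static class MetaTierUnavailableException extends Exception {
        MetaTierUnavailableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Runs a conversion and rethrows any RuntimeException as MetaTierUnavailableException. */
    static <T> T guarded(Supplier<T> conversion) throws MetaTierUnavailableException {
        try {
            return conversion.get();
        } catch (RuntimeException e) {
            throw new MetaTierUnavailableException("unrepresentable stat value", e);
        }
    }

    /** Meta tier first; a MetaTierUnavailableException routes planning to the data tier. */
    static String planTier(long epochDay) {
        try {
            // LocalDate.ofEpochDay throws DateTimeException (a RuntimeException) when out of range.
            LocalDate date = guarded(() -> LocalDate.ofEpochDay(epochDay));
            return "meta:" + date;
        } catch (MetaTierUnavailableException e) {
            return "data";
        }
    }
}
```

Catching RuntimeException at the conversion boundary means a value the target type cannot represent degrades to the slower data tier instead of failing the load.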
@CelerData-Reviewer


@codex review

github-actions (bot) requested review from meegoo and wyb June 12, 2026 01:23
@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Bravo.

Reviewed commit: aabe9acbef


@github-actions

Contributor

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

Contributor

[FE Incremental Coverage Report]

pass : 60 / 64 (93.75%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/alter/reshard/presplit/OrcStripeStatisticsReader.java 19 22 86.36% [219, 220, 222]
🔵 com/starrocks/alter/reshard/presplit/ParquetRowGroupStatisticsReader.java 36 37 97.30% [270]
🔵 com/starrocks/alter/reshard/presplit/MetaTierTemporalWindow.java 5 5 100.00% []

@github-actions

Contributor

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@xiangguangyxg xiangguangyxg requested a review from kevincai June 12, 2026 04:09