Skip to content

CBO-aware routing between native scans and JDBC catalogs #74613

@SankalpBaliarsingh

Description

@SankalpBaliarsingh

Feature request

Is your feature request related to a problem? Please describe.

  1. Most OLAP tables are fed from an OLTP source (e.g. via CDC).

  2. OLTP is optimized for point queries on indexed columns; OLAP is optimized for scans.

  3. This proposal compares the cost of fetching rows via an OLAP scan vs. OLTP point lookups, and picks the cheaper path.

  4. It helps two patterns:

    Pointed queries — small-result lookups on indexed columns, served from the OLTP source instead of paying OLAP scan/shuffle cost.

    Time-windowed joins with a long tail — e.g.

        SELECT o.*, p.amount, p.created_on
        FROM orders o
        LEFT JOIN payment p
                ON p.order_id = o.order_id
                AND p.created_on BETWEEN o.created_on AND o.created_on + INTERVAL N DAY
        WHERE o.created_on BETWEEN '2026-06-01' AND '2026-07-01';
    • In practice a payment settles within minutes of its order, so ~99% of matches fall in a narrow window; ~1% scatter beyond it (retries, manual settlement, month-boundary stragglers).
    • To avoid missing the tail, users widen the right-side window (e.g. a 1-month buffer). This scans extra partitions on every query and ignores that 99% sit in the narrow window.
    • It also inflates join state: with no knowledge of the relationship, the join holds the full window in memory until matching completes, rather than evicting once each key is matched.

Describe the solution you'd like

Extend the JDBC external catalog with cost-aware routing, in stages.

Scope: initial target is 1:1 relationships (e.g. order ↔ payment), where in-window completeness is decidable — an outer row with zero in-window matches is the enrichment candidate (one match means done).

Precondition: the right table is time-partitioned on the join's range column; otherwise there are no extra partitions to avoid and the temporal optimization doesn't apply.

  1. Per-OLAP-table mapping spec — declare the OLTP source table and a per-column transformation:

    • Identity-mapped — same column both sides (order_id).
    • Derived — OLAP column = expression over OLTP columns (e.g. created_date = date(created_on AT TIME ZONE 'IST'));
    • OLAP-only — no OLTP source (operation_type, flink_processing_time, ingestion timestamps).

    Routing to OLTP is legal only when every referenced column — select list, filters, and join keys — is mappable.

  2. JDBC-side cost model — estimates OLTP fetch cost and compares it to the OLAP scan cost.

    • Index check — whether the OLTP source has a usable index on the lookup columns.
    • Rows to enrich — how many rows / keys the OLTP fetch would touch (the tail size).
    • OLAP-side cost — extra partitions the native scan would read (reuses existing StarRocks costing).

    Decision: route to OLTP when cost(OLTP fetch) < cost(extra OLAP scan).

  3. Point-query routing — CBO compares native scan vs OLTP point lookup for queries on indexed columns and routes accordingly.

  4. Tiered join with tail recovery (1:1)

    • 1:1 cardinality lets the join evict each build-side key after its single match — saves memory.
    • Query supplies the narrow window to join; BE/CN runs the narrow-window join.
    • Outer rows with zero in-window match are buffered and resolved via keyed-IN point lookup to the OLTP source.

    The window is user-supplied; the routing decision (scan more partitions vs. OLTP recovery) is the CBO's, via the cost model — not a blind user directive.

Describe alternatives you've considered

  • Only using Hint
    • This works but unsafe without a cost model behind it.
    • Eg: An analyst tagging 50% of rows for OLTP fallback would overwhelm production PG.
    • SREs have no control when a query goes to PG.
  • Wider time window — Current scenario. It increases OLAP scan cost.

Additional context

  • I want to know if this CBO-driven routing idea will be allowed in the project?
  • Let me know any prior discussion or related work I should reference.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions