Phase 5: Distributed Optimization

Overview

Phase 5 introduced high-level optimizations to reduce network latency and enable complex multi-shard query patterns.

Key Components

1. Shard Pruning (`distributed/distributed_executor.cpp`)

Optimized query routing based on partitioning keys.

Predicate Analysis: Detects filters on sharding keys (e.g., WHERE id = 100).
Targeted Dispatch: Routes fragments only to the specific node owning the shard, avoiding cluster-wide broadcasts.

2. Aggregation Merging

Implemented coordination for distributed analytics.

Partial Aggregation: Data nodes compute local counts and sums.
Global Merge: The coordinator identifies aggregate functions in the SELECT list and merges partial results from all shards into a final result set.

3. Broadcast Join Orchestration

Developed a prototype for cross-shard JOINs.

Table Fetching: Coordinator retrieves full data from a smaller table across all shards.
Broadcasting: Pushes the gathered data to the ShuffleBuffer of every node in the cluster.
Local Execution: Rewrites the query so each node joins its local shard with the broadcasted buffer data.

4. Shuffle Infrastructure

Enabled inter-node data movement.

BufferScanOperator: A physical operator that reads from in-memory shuffle buffers instead of heap files.
ClusterManager Buffering: Thread-safe staging area for data received via PushData RPCs.

Lessons Learned

Broadcast joins are highly effective for small-to-large table joins but require careful consideration of coordinator memory limits.
Merging aggregates at the coordinator is a bottleneck for very large clusters; future work could explore tree-based merging.

Status: 100% Test Pass

All scenarios, including distributed transactions (2PC) and join orchestration, have been verified with automated integration tests.

Phase 5 Completion: QueryExecutor Integration

The vectorized execution engine was wired into QueryExecutor via set_parallel(true) mode, enabling SELECT queries to optionally use the vectorized batch path:

QueryExecutor::set_parallel(true) — enables vectorized batch execution
QueryExecutor::set_storage_manager() — provides StorageManager for ColumnarTable lookups
build_vectorized_plan() — constructs operator tree (Scan → Filter → HashJoin → GroupBy → Project)
execute_select() — branches on use_vectorized flag between Volcano (tuple) and vectorized (batch) paths
Join type support: VectorizedHashJoinOperator supports INNER, LEFT, RIGHT, and FULL outer joins via JoinType enum. RIGHT and FULL outer joins use right_matched_ bitmap and emit_unmatched_right_rows() to emit unmatched right rows at end of probe.
Constraint: Sort/Limit queries fall back to Volcano path since SortOperator/LimitOperator don't inherit from VectorizedOperator

Phase 5 Extension: Cost-Based Volcano/Vectorized Chooser (PR #75)

The flag-based chooser (parallel_ && storage_manager_ && !has_sort_or_limit) was replaced with a cost-based heuristic using per-table statistics:

ANALYZE TABLE — single-pass scan collects min/max/NDV/null_count stats stored in ColumnInfo (catalog)
RowEstimator (optimizer/row_estimator.cpp) — row count estimation from column statistics
kVectorizedRowThreshold = 10000 — heuristic: Vectorized batch execution outperforms Volcano above ~10k rows
Chooser guard — checks ExprType::Column to skip estimation for JOINs, subqueries, and aliased tables (Volcano fallback handles these correctly)
Text NDV — 64-char prefix truncation limits memory; note that long shared prefixes are underestimated

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Phase 5: Distributed Optimization

Overview

Key Components

1. Shard Pruning (`distributed/distributed_executor.cpp`)

2. Aggregation Merging

3. Broadcast Join Orchestration

4. Shuffle Infrastructure

Lessons Learned

Status: 100% Test Pass

Phase 5 Completion: QueryExecutor Integration

Phase 5 Extension: Cost-Based Volcano/Vectorized Chooser (PR #75)

Status: 100% Test Pass

FilesExpand file tree

PHASE_5_OPTIMIZATION.md

Latest commit

History

PHASE_5_OPTIMIZATION.md

File metadata and controls

Phase 5: Distributed Optimization

Overview

Key Components

1. Shard Pruning (distributed/distributed_executor.cpp)

2. Aggregation Merging

3. Broadcast Join Orchestration

4. Shuffle Infrastructure

Lessons Learned

Status: 100% Test Pass

Phase 5 Completion: QueryExecutor Integration

Phase 5 Extension: Cost-Based Volcano/Vectorized Chooser (PR #75)

Status: 100% Test Pass

1. Shard Pruning (`distributed/distributed_executor.cpp`)