feat(starrocks): translate FE plan fragments over BRPC PInternalService by mbrobbel · Pull Request #941 · sirius-db/sirius

mbrobbel · 2026-06-15T13:13:24Z

Summary

Adds the StarRocks compute-node BRPC PInternalService path so a Sirius CN can receive plan fragments dispatched by a StarRocks FE and translate them. Part of #826.

- Implement exec_plan_fragment and exec_batch_plan_fragments: deserialize the binary-thrift TExecPlanFragmentParams attachment and run it through a thrift→Substrait PlanTranslator.
- Add a Baidu PRPC frame transport (a raw-TCP PRPC envelope; FE→backend RPC is not gRPC/HTTP2) and a Tower-based BRPC server with tonic-like graceful shutdown.
- Generate BRPC and StarRocks protobuf bindings in build.rs; vendor apache/brpc as a submodule for the upstream proto definitions.

Notes

Translation currently validates the fragment and logs the resulting plan; fragment execution is out of scope for this PR.
All changes are under experimental/starrocks. CN registration and FE heartbeat were already wired; this adds the fragment-handling RPC surface on top.

Address review findings on the compute-node BRPC PInternalService path:

- Confine per-connection read/decode/write errors to the connection and keep the server running, instead of letting one bad frame propagate out of serve_* and crash the whole compute node.
- Spawn a task per accepted connection (tonic-style) so a slow or long-lived peer cannot block the accept loop; reap finished tasks in the accept loop and drain in-flight tasks on shutdown.
- Generate the BRPC service facade with quote/prettyplease instead of string concatenation, emit Send futures behind a Send+Sync service trait, and name method constants via heck.
- PRPC transport: return the distinct ENOMETHOD code for unknown methods, drop the dead attachment_size assignment, and grow the body buffer in bounded chunks rather than pre-allocating the declared size.
- Document that an OK status from exec_plan_fragment means accepted and translated, not executed.

Add a regression test that a malformed frame closes only its connection while the server keeps serving.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
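The bounded-chunk body read from the PRPC transport item in this commit can be sketched roughly as follows. This is an illustration using blocking `std::io::Read`, not the PR's async implementation; the function name `read_body_bounded` and the `chunk` parameter are hypothetical. The point is that a peer declaring a huge body but sending only a few bytes costs at most one chunk of memory before the read fails.

```rust
use std::io::{self, Read};

/// Read exactly `declared` bytes, growing the buffer by at most `chunk`
/// bytes at a time instead of allocating `declared` bytes up front.
/// Hypothetical helper; names and signature are assumptions.
pub fn read_body_bounded<R: Read>(
    reader: &mut R,
    declared: usize,
    chunk: usize,
) -> io::Result<Vec<u8>> {
    // Guard against a zero chunk size, which would never make progress.
    let chunk = chunk.max(1);
    let mut body = Vec::new();
    while body.len() < declared {
        let start = body.len();
        let step = chunk.min(declared - start);
        // Grow only by the next step, then fill it from the stream.
        body.resize(start + step, 0);
        // A short stream surfaces here as `UnexpectedEof`.
        reader.read_exact(&mut body[start..])?;
    }
    Ok(body)
}
```

With this shape, the error from a truncated or lying frame stays local to the connection that produced it, which fits the first item in the list: the caller can log it, close that socket, and keep accepting.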

The starrocks CN build script compiles proto definitions from the apache/brpc submodule, so CI must check it out alongside the starrocks submodule or fmt/clippy/test fail before the build script runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Replace the hand-rolled startup registration retry loop with backon's exponential backoff (min 1s, capped at 30s, bounded by the configured attempt count).
- Type registration_max_attempts as NonZeroU32 so clap rejects 0 and the runtime guard is no longer needed.
- Instrument the registration, reporting, BRPC connection, and plan fragment paths with tracing::instrument and emit span close events (FmtSpan::CLOSE) so per-operation timings are visible.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
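To make the retry parameters in this commit concrete, here is a std-only sketch of the delay schedule they imply. It deliberately does not use backon's API; `backoff_delays` is a hypothetical helper, and the doubling factor with no jitter is an assumption rather than something the commit states. It also shows why typing the attempt count as `NonZeroU32` removes the need for a runtime zero check.

```rust
use std::num::NonZeroU32;
use std::time::Duration;

/// Delays to sleep between consecutive attempts: start at `min`,
/// double each time, and never exceed `max`.
/// Hypothetical helper; not backon's API.
pub fn backoff_delays(max_attempts: NonZeroU32, min: Duration, max: Duration) -> Vec<Duration> {
    // N attempts have N - 1 gaps between them. `get()` is at least 1,
    // so the subtraction cannot underflow.
    let retries = max_attempts.get() - 1;
    let mut delays = Vec::with_capacity(retries as usize);
    let mut next = min;
    for _ in 0..retries {
        delays.push(next.min(max));
        // Saturate so very large attempt counts cannot overflow.
        next = next.saturating_mul(2).min(max);
    }
    delays
}
```

For seven attempts with a 1s minimum and a 30s cap, this yields gaps of 1s, 2s, 4s, 8s, 16s, and 30s. `NonZeroU32::new(0)` returns `None`, so a zero attempt count cannot be constructed at all and is rejected at the parsing boundary.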

- Drive the heartbeat/backend/BRPC server joins through a JoinSet so wait_until_shutdown observes the first exit, stops everything once, and drains the rest in one loop, removing the per-branch repetition.
- Add Frame::into_request to move the request body/attachment into the service request instead of cloning them, plus a correlation_id getter and a correlation-id-based response_frame builder.
- Derive Copy on FrameSizes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
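The move-instead-of-clone change to Frame can be illustrated with a small sketch. The struct layouts, field names, and exact signatures below are assumptions; only the method names `into_request`, `correlation_id`, and `response_frame` come from the commit message. The idea is to read the correlation id first, then consume the frame so its buffers are handed to the request without copying.

```rust
/// Hypothetical decoded frame; the real type's fields may differ.
#[derive(Debug)]
pub struct Frame {
    pub correlation_id: u64,
    pub body: Vec<u8>,
    pub attachment: Vec<u8>,
}

/// Hypothetical service-level request built from a frame.
#[derive(Debug)]
pub struct Request {
    pub body: Vec<u8>,
    pub attachment: Vec<u8>,
}

impl Frame {
    /// Cheap getter so the id can be saved before the frame is consumed.
    pub fn correlation_id(&self) -> u64 {
        self.correlation_id
    }

    /// Consume the frame and move its buffers into the request.
    /// Taking `self` by value means no byte is cloned.
    pub fn into_request(self) -> Request {
        Request {
            body: self.body,
            attachment: self.attachment,
        }
    }

    /// Build a response that echoes the caller's correlation id.
    pub fn response_frame(correlation_id: u64, body: Vec<u8>) -> Frame {
        Frame {
            correlation_id,
            body,
            attachment: Vec::new(),
        }
    }
}
```

Because moving a `Vec` transfers ownership of its heap allocation, the request's body points at the same memory the frame decoder filled, which matters when the attachment is a large serialized TExecPlanFragmentParams.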

Remove the serve/serve_with_shutdown/serve_with_listener variants from both BrpcServer and the inner BrpcServiceServer; only new, bind, and serve_with_listener_shutdown are used. Also drop a stale comment reference to PR notes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mike-wendt

Approved for changes in Ops CODEOWNERS files

9prady9 · 2026-06-16T09:45:39Z

+        let shutdown = CancellationToken::new();
+        let server_shutdown = shutdown.clone();
+        let join = tokio::task::spawn_blocking(move || {
+            let runtime = tokio::runtime::Builder::new_current_thread()


Is this only a single-threaded runtime here? Do we have plans to switch to anything else later?

Correct. We should revisit this later.

9prady9 · 2026-06-16T09:45:39Z

+    ) -> std::result::Result<(), String> {
+        let translated = self
+            .translator
+            .translate_fragment(params)


translate_fragment is synchronous, right? If so, and it runs on that single BRPC worker, is it expected to stay cheap, or is spawn_blocking / a multi-thread runtime planned once fragment execution lands?

Yes - this will be updated once fragment execution is added.

9prady9 · 2026-06-16T09:45:40Z

+        peer: SocketAddr,
+        shutdown: CancellationToken,
+    ) {
+        loop {


Is my understanding correct that requests on one connection are handled serially, even though BRPC allows multiplexing concurrent requests? And does the FE actually pipeline fragments over one connection?

Yes, correct. We can add per-request concurrency in a follow-up.

9prady9

/merge

9prady9 · 2026-06-16T14:05:10Z

my bad, for a second I thought it was quent 😄 and hence made that merge comment

mbrobbel added 10 commits June 10, 2026 18:36

Add StarRocks CN BRPC plan translation

d100e12

Document BRPC service layers

88dc648

Make StarRocks BRPC service trait async

90ad0d7

Use upstream BRPC proto definitions

915e1b8

Clean up StarRocks CN shutdown logging

19ec666

Use heck for BRPC method variants

bfefebf

Make BRPC server join consume handle

7728da6

Document StarRocks CN fields and localize helpers

18d0ee4

Make BRPC server shutdown tonic-like

90cd347

Rename BRPC service to compute node service

6aae8ef

mbrobbel added the enhancement and starrocks labels Jun 15, 2026

mbrobbel marked this pull request as draft June 15, 2026 13:14

mbrobbel and others added 6 commits June 15, 2026 13:29

Merge remote-tracking branch 'upstream/dev' into starrocks-translate

e969624

mbrobbel marked this pull request as ready for review June 15, 2026 18:40

mbrobbel requested a review from mike-wendt as a code owner June 15, 2026 18:40

mike-wendt approved these changes Jun 15, 2026

dhruv9vats self-requested a review June 16, 2026 10:06

9prady9 reviewed Jun 16, 2026

9prady9 approved these changes Jun 16, 2026

mbrobbel added this pull request to the merge queue Jun 16, 2026

Merged via the queue into sirius-db:dev with commit f5b0388 Jun 16, 2026
12 checks passed

mbrobbel deleted the starrocks-translate branch June 16, 2026 14:05

mbrobbel merged 16 commits into sirius-db:dev from mbrobbel:starrocks-translate