feat(starrocks): translate FE plan fragments over BRPC PInternalService#941
Conversation
Address review findings on the compute-node BRPC PInternalService path: - Confine per-connection read/decode/write errors to the connection and keep the server running, instead of letting one bad frame propagate out of serve_* and crash the whole compute node. - Spawn a task per accepted connection (tonic-style) so a slow or long-lived peer cannot block the accept loop; reap finished tasks in the accept loop and drain in-flight tasks on shutdown. - Generate the BRPC service facade with quote/prettyplease instead of string concatenation, emit Send futures behind a Send+Sync service trait, and name method constants via heck. - PRPC transport: return the distinct ENOMETHOD code for unknown methods, drop the dead attachment_size assignment, and grow the body buffer in bounded chunks rather than pre-allocating the declared size. - Document that an OK status from exec_plan_fragment means accepted and translated, not executed. Add a regression test that a malformed frame closes only its connection while the server keeps serving. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The starrocks CN build script compiles proto definitions from the apache/brpc submodule, so CI must check it out alongside the starrocks submodule or fmt/clippy/test fail before the build script runs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Replace the hand-rolled startup registration retry loop with backon's exponential backoff (min 1s, capped at 30s, bounded by the configured attempt count). - Type registration_max_attempts as NonZeroU32 so clap rejects 0 and the runtime guard is no longer needed. - Instrument the registration, reporting, BRPC connection, and plan fragment paths with tracing::instrument and emit span close events (FmtSpan::CLOSE) so per-operation timings are visible. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Drive the heartbeat/backend/BRPC server joins through a JoinSet so wait_until_shutdown observes the first exit, stops everything once, and drains the rest in one loop, removing the per-branch repetition. - Add Frame::into_request to move the request body/attachment into the service request instead of cloning them, plus a correlation_id getter and a correlation-id-based response_frame builder. - Derive Copy on FrameSizes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove the serve/serve_with_shutdown/serve_with_listener variants from both BrpcServer and the inner BrpcServiceServer; only new, bind, and serve_with_listener_shutdown are used. Also drop a stale comment reference to PR notes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mike-wendt
left a comment
There was a problem hiding this comment.
Approved for changes in Ops CODEOWNERS files
| let shutdown = CancellationToken::new(); | ||
| let server_shutdown = shutdown.clone(); | ||
| let join = tokio::task::spawn_blocking(move || { | ||
| let runtime = tokio::runtime::Builder::new_current_thread() |
There was a problem hiding this comment.
Is it only a single-thread runtime here ? do we have plans to switch to anything else later ?
There was a problem hiding this comment.
Correct. We should revisit this later.
| ) -> std::result::Result<(), String> { | ||
| let translated = self | ||
| .translator | ||
| .translate_fragment(params) |
There was a problem hiding this comment.
translate_fragment is synchronous right ? If yes, and runs on that single BRPC worker, is it expected to stay cheap, or is spawn_blocking / a multi-thread runtime planned once fragment execution lands?
There was a problem hiding this comment.
Yes - this will be updated once fragment execution is added.
| peer: SocketAddr, | ||
| shutdown: CancellationToken, | ||
| ) { | ||
| loop { |
There was a problem hiding this comment.
Is my following understanding correct: Requests on one connection are handled serially even though BRPC allows multiplexing concurrent requests, and does the FE actually pipeline fragments over one connection ?
There was a problem hiding this comment.
Yes, correct. We can add per-request concurrency in a follow-up.
|
my bad, for a second I thought it was quent 😄 and hence made that merge comment |
Summary
Adds the StarRocks compute-node BRPC
PInternalServicepath so a Sirius CN can receive plan fragments dispatched by a StarRocks FE and translate them. Part of #826.exec_plan_fragmentandexec_batch_plan_fragments: deserialize the binary-thriftTExecPlanFragmentParamsattachment and run it through a thrift→SubstraitPlanTranslator.PRPCenvelope — FE→backend RPC is not gRPC/HTTP2) and a Tower-based BRPC server with tonic-like graceful shutdown.build.rs; vendorapache/brpcas a submodule for the upstream proto definitions.Notes
experimental/starrocks. CN registration and FE heartbeat were already wired; this adds the fragment-handling RPC surface on top.