feat(starrocks): infer FILES() parquet schema in the compute node #962
Open
mbrobbel wants to merge 1 commit into
Implement the PInternalService get_file_schema BRPC handler so the FE can
resolve the schema of SELECT * FROM FILES('...parquet') against the Rust
compute node, instead of failing because no backend answered the schema
proxy request.
The handler decodes the binary-thrift TGetFileSchemaRequest attachment,
reads the parquet footer asynchronously, and maps its top-level columns to
StarRocks slot descriptors. Type mapping mirrors the native scanner so the
inferred schema is one the FILE_SCAN path can actually read:
- integers map by physical width (INT32 -> INT, INT64 -> BIGINT)
- DECIMAL32/64/128 by precision; wider precision falls back to VARCHAR
- raw BYTE_ARRAY -> VARBINARY, FIXED_LEN_BYTE_ARRAY -> VARCHAR
- DATE/TIME/TIMESTAMP, JSON, BSON handled via logical and legacy converted types
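The width- and precision-based rules above can be sketched in a few lines. This is a minimal illustration, not the PR's code: the `Physical`, `Annotation`, and `SrType` enums and the `map_type` function are hypothetical stand-ins for the parquet and StarRocks types, and the decimal cut-offs (9, 18, and 38 digits for DECIMAL32/64/128) are assumed from the usual StarRocks precision limits. Date/time, JSON, and BSON handling is left out.

```rust
/// Hypothetical, simplified view of a parquet column's physical type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Physical {
    Int32,
    Int64,
    ByteArray,
    FixedLenByteArray,
}

/// Hypothetical, simplified logical annotation (only what this sketch needs).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Annotation {
    None,
    Decimal { precision: u32, scale: u32 },
}

/// Hypothetical stand-in for the StarRocks type placed in a slot descriptor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SrType {
    Int,
    BigInt,
    Decimal32 { precision: u32, scale: u32 },
    Decimal64 { precision: u32, scale: u32 },
    Decimal128 { precision: u32, scale: u32 },
    Varbinary,
    Varchar,
}

/// Map one top-level column to a StarRocks type.
fn map_type(physical: Physical, annotation: Annotation) -> SrType {
    match annotation {
        // Decimals are chosen by precision, regardless of physical storage.
        Annotation::Decimal { precision, scale } => match precision {
            1..=9 => SrType::Decimal32 { precision, scale },
            10..=18 => SrType::Decimal64 { precision, scale },
            19..=38 => SrType::Decimal128 { precision, scale },
            // Wider precision falls back to VARCHAR. A precision of 0 is
            // invalid; this sketch lumps it in with the fallback.
            _ => SrType::Varchar,
        },
        // Unannotated columns map by physical type.
        Annotation::None => match physical {
            Physical::Int32 => SrType::Int,
            Physical::Int64 => SrType::BigInt,
            Physical::ByteArray => SrType::Varbinary,
            Physical::FixedLenByteArray => SrType::Varchar,
        },
    }
}
```

Under these assumptions, a `DECIMAL(20, 4)` column resolves to `Decimal128`, while a `DECIMAL(40, 2)` column resolves to `Varchar`.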
Inputs outside the supported surface are rejected with a clear error rather
than resolving a schema the scanner would later choke on: nested columns,
multi-file ranges, case-insensitive duplicate names, remote URI schemes, and
non-local file authorities.
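Two of these rejections, duplicate names and non-local URIs, can be illustrated with standard-library code only. This is a hedged sketch rather than the handler's actual logic: `check_unique_names` and `local_path` are hypothetical helpers, and treating an empty or `localhost` authority as local, and a string without `://` as a plain local path, are assumptions made for the example.

```rust
use std::collections::HashSet;

/// Reject column names that collide when compared case-insensitively.
fn check_unique_names(names: &[&str]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for name in names {
        // `insert` returns false when the lowercased name was already present.
        if !seen.insert(name.to_lowercase()) {
            return Err(format!("duplicate column name (case-insensitive): {name}"));
        }
    }
    Ok(())
}

/// Return the local filesystem path for a URI, or an error for anything
/// that is not a local file.
fn local_path(uri: &str) -> Result<&str, String> {
    // Assumption: input without "://" is already a plain local path.
    let (scheme, rest) = match uri.split_once("://") {
        Some(parts) => parts,
        None => return Ok(uri),
    };
    if !scheme.eq_ignore_ascii_case("file") {
        return Err(format!("unsupported URI scheme: {scheme}"));
    }
    // The authority is everything before the first '/' after "://".
    let (authority, path) = match rest.find('/') {
        Some(i) => rest.split_at(i),
        None => (rest, ""),
    };
    // Assumption: only an empty authority or "localhost" counts as local.
    if !(authority.is_empty() || authority.eq_ignore_ascii_case("localhost")) {
        return Err(format!("non-local file authority: {authority}"));
    }
    if path.is_empty() {
        return Err(format!("missing file path in URI: {uri}"));
    }
    Ok(path)
}
```

Failing early with a specific message at schema-resolution time keeps the error close to the user's `FILES()` call, instead of surfacing later from the scan.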
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Schema reading uses the parquet crate's native metadata reader (no arrow dependency).