[KYUUBI #7305] Route JDBC engine metadata calls through DatabaseMetaData #7442
Closed
wangzhigang1999 wants to merge 9 commits into
…eMetaData

The JDBC engine's metadata operations previously rendered hand-written INFORMATION_SCHEMA SQL per dialect. As a result:

- DBeaver / Tableau / Hive CLI saw column names and TABLE_CAT / TABLE_SCHEM layouts that did not match the JDBC spec, which broke catalog/schema discovery and forced the client into expensive fallback queries.
- Each dialect duplicated SQL builder code that was hard to keep in sync with the JDBC contract.

This commit routes getCatalogs / getSchemas / getTables / getColumns / getTypeInfo / getFunctions / getPrimaryKeys / getCrossReference through `DatabaseMetaData.getXxx(...)` on the backend connection and exposes the result set unchanged. The catalog-vs-schema convention is delegated to each dialect via the existing JdbcDialect SPI:

- The `DatabaseTermSupport` mixin captures the common pattern shared by ClickHouse and the MySQL family (MySQL / Doris / StarRocks), where the driver exposes the database as either the catalog or the schema, never both.
- `SchemaHelper.normalizeMetadataColumnLabel` lets dialects rewrite backend-driver labels that diverge from the JDBC spec (Apache Impala's HS2 reports `TABLE_MD` instead of `TABLE_SCHEM`). The hook lives on SchemaHelper but is only invoked on the metadata-only path, so user SELECT aliases are unaffected.
- `ExecuteMetaDataOperation`, `Set/GetCurrentCatalog`, and `Set/GetCurrentDatabase` resolve the dialect from the session's sessionConf and inject it into their closures, so the metadata call and downstream rowset / label normalisation use the same dialect.
- `JdbcSessionImpl` applies the URL database before running the per-session initialize SQL, so DDL in the init SQL executes against the user-requested database rather than the driver's default.
- Streaming metadata results go through a new `ResultSetFetchIterator` that prefetches one row to drive hasNext without consuming it, and closes the result set + statement idempotently.
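The `ResultSetFetchIterator` contract described above (prefetch one row so `hasNext` does not consume, close the result set and statement idempotently) can be sketched in plain Java against `java.sql`. This is a minimal illustration, not the PR's Scala implementation; `RowMapper` and `PrefetchingRowIterator` are hypothetical names.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical row mapper: converts the row the cursor currently points at.
interface RowMapper<T> {
  T map(ResultSet rs) throws SQLException;
}

// Sketch only: reads one row ahead so hasNext() never consumes data, and
// closes the ResultSet plus its owning Statement at most once.
final class PrefetchingRowIterator<T> implements Iterator<T>, AutoCloseable {
  private final ResultSet rs;
  private final Statement stmt; // may be null if there is no Statement to close
  private final RowMapper<T> mapper;
  private T buffered;
  private boolean hasBuffered;
  private boolean closed;

  PrefetchingRowIterator(ResultSet rs, Statement stmt, RowMapper<T> mapper) {
    this.rs = rs;
    this.stmt = stmt;
    this.mapper = mapper;
  }

  @Override
  public boolean hasNext() {
    if (hasBuffered) {
      return true; // repeated calls reuse the prefetched row
    }
    if (closed) {
      return false;
    }
    try {
      if (rs.next()) {
        buffered = mapper.map(rs);
        hasBuffered = true;
        return true;
      }
    } catch (SQLException e) {
      close();
      throw new IllegalStateException("failed to fetch next row", e);
    }
    close(); // cursor drained: release resources eagerly
    return false;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T row = buffered;
    buffered = null;
    hasBuffered = false;
    return row;
  }

  // Idempotent: calling close() again, or after exhaustion, is a no-op.
  @Override
  public void close() {
    if (closed) {
      return;
    }
    closed = true;
    buffered = null; // drop any prefetched row on early close
    hasBuffered = false;
    try {
      rs.close();
    } catch (SQLException ignored) {
      // best-effort cleanup
    }
    if (stmt != null) {
      try {
        stmt.close();
      } catch (SQLException ignored) {
        // best-effort cleanup
      }
    }
  }
}
```

The one-row buffer is what makes repeated `hasNext()` calls safe, and the `closed` flag is what makes an early `close()` after a partial fetch, followed by a later cleanup `close()`, harmless.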
Existing OperationSuite tests across all eight dialects (MySQL, PostgreSQL, Oracle, Phoenix, ClickHouse, Doris, StarRocks, Impala) are updated for the new column layouts the JDBC spec mandates. The WithExternalJdbcEngine / KYUUBI_TEST_*_URL escape hatch lets the same suites run against a long-running docker stack instead of testcontainers (if the env vars are unset, the suites fall back to testcontainers, so CI is unaffected).
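The escape hatch amounts to an env-var lookup with a testcontainers fallback. A minimal sketch, assuming the `*` in `KYUUBI_TEST_*_URL` expands to the upper-cased dialect name (for example `KYUUBI_TEST_MYSQL_URL`); `ExternalBackendResolver` is hypothetical and is not the `WithExternalJdbcEngine` trait itself.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical resolver for the "external engine" escape hatch. The naming
// scheme KYUUBI_TEST_<DIALECT>_URL is an assumption based on KYUUBI_TEST_*_URL.
final class ExternalBackendResolver {
  private ExternalBackendResolver() {}

  static String envVarFor(String dialect) {
    return "KYUUBI_TEST_" + dialect.toUpperCase(Locale.ROOT) + "_URL";
  }

  // A non-blank value means "connect to this long-running backend";
  // empty means "start a testcontainers backend" (the CI path).
  static Optional<String> externalUrl(String dialect, Map<String, String> env) {
    String value = env.get(envVarFor(dialect));
    if (value == null || value.trim().isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(value.trim());
  }
}
```

A suite would call something like `ExternalBackendResolver.externalUrl("mysql", System.getenv())` and only start a container when the result is empty, which keeps CI (where the variables are unset) on the testcontainers path.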
One <Dialect>MetadataContractSuite per backend (MySQL, PostgreSQL, Oracle, Phoenix, ClickHouse, Doris, StarRocks, Impala) exercises the metadata ops over Thrift via the Hive JDBC client and asserts the result-set schema and the key fields that downstream consumers (DBeaver / Tableau / Hive CLI) actually see. Assertions include strict filter-honoring on getTables / getColumns to catch "filter param silently dropped" regressions. Shared helpers live in MetadataTestHelpers (assertJdbcSpecColumnNames, assertColumnContract, assertTableRow). Per-driver gaps are documented inline where assertions are skipped (Phoenix PK / getTypeInfo / getFunctions, StarRocks getFunctions).
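A spec-column check in the spirit of `assertJdbcSpecColumnNames` can be built on `ResultSetMetaData` alone. The sketch below is hypothetical (the real helper's signature is not shown here); the column lists follow the `java.sql.DatabaseMetaData` javadoc, and tolerating extra trailing driver-specific columns is a design assumption.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical spec-column check in the spirit of assertJdbcSpecColumnNames.
final class SpecColumns {
  private SpecColumns() {}

  // Column names listed in the java.sql.DatabaseMetaData javadoc.
  static final List<String> GET_CATALOGS = Arrays.asList("TABLE_CAT");
  static final List<String> GET_SCHEMAS = Arrays.asList("TABLE_SCHEM", "TABLE_CATALOG");
  static final List<String> GET_TABLES = Arrays.asList(
      "TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME", "TABLE_TYPE", "REMARKS",
      "TYPE_CAT", "TYPE_SCHEM", "TYPE_NAME", "SELF_REFERENCING_COL_NAME", "REF_GENERATION");

  static List<String> labels(ResultSetMetaData md) throws SQLException {
    List<String> out = new ArrayList<>();
    for (int i = 1; i <= md.getColumnCount(); i++) { // JDBC columns are 1-based
      out.add(md.getColumnLabel(i));
    }
    return out;
  }

  // The spec columns must come first and in order; extra trailing
  // driver-specific columns are tolerated (a design assumption).
  static void assertSpecPrefix(List<String> expected, ResultSetMetaData md) {
    List<String> actual;
    try {
      actual = labels(md);
    } catch (SQLException e) {
      throw new IllegalStateException("cannot read result-set metadata", e);
    }
    if (actual.size() < expected.size()
        || !actual.subList(0, expected.size()).equals(expected)) {
      throw new AssertionError("expected leading columns " + expected + " but got " + actual);
    }
  }
}
```

An exact, case-sensitive comparison is deliberate in this sketch: it is what flags label drift such as Impala's `TABLE_MD` leaking through in place of `TABLE_SCHEM`.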
…readability

Pull the (JdbcDialect, Connection) => ... closures out of the addOperation(...) calls in JdbcOperationManager into named vals, and introduce an ExecuteMetaDataOperation.MetaDataCall type alias for the metadata variant. Rename the lambda parameter d -> dialect.
…tion

Address @pan3793's review on apache#7442: handle USE_CATALOG symmetrically with USE_DATABASE at session open, so the URL-level catalog is forwarded via JdbcDialect.setCatalog. Both branches share the "default" Hive-client stub filter.

Fix the PostgreSQL getSchemas assertion that still expected the old hand-written SQL path's null TABLE_CATALOG. With DatabaseMetaData passthrough, the PG driver returns the real database name ("postgres").
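Handling the URL catalog symmetrically with the URL database, behind one shared `"default"` filter, and applying both before the per-session init SQL could look roughly like this. The `NamespaceTarget` interface, the `use:catalog` / `use:database` key strings, and the exact filter rule are assumptions for illustration, not Kyuubi's actual code.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the dialect calls a session makes on open.
interface NamespaceTarget {
  void setCatalog(String catalog);
  void setDatabase(String database);
  void execute(String sql);
}

final class SessionOpenSketch {
  // Assumed session-conf keys for the URL-level catalog and database.
  static final String USE_CATALOG = "use:catalog";
  static final String USE_DATABASE = "use:database";

  private SessionOpenSketch() {}

  // Shared filter (assumed rule): the Hive client may send "default" as a
  // stub, so it and blank values are treated as "not specified".
  static boolean userSpecified(String value) {
    return value != null && !value.trim().isEmpty() && !"default".equals(value.trim());
  }

  // Apply catalog and database symmetrically, then run the init SQL so any
  // DDL in it executes inside the URL-requested namespace.
  static void open(Map<String, String> sessionConf, List<String> initSql, NamespaceTarget target) {
    String catalog = sessionConf.get(USE_CATALOG);
    if (userSpecified(catalog)) {
      target.setCatalog(catalog.trim());
    }
    String database = sessionConf.get(USE_DATABASE);
    if (userSpecified(database)) {
      target.setDatabase(database.trim());
    }
    for (String sql : initSql) {
      target.execute(sql);
    }
  }
}
```

Routing both branches through the same `userSpecified` check is what keeps them symmetric: a stub value is ignored for catalogs exactly as it is for databases.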
…dbc-it

OperationWithServerSuite's "phoenix - URL database is applied to backend connection" failed in CI with "No suitable driver" because the test's DriverManager.getConnection bypasses the Kyuubi server and hits the in-process embedded JDBC engine directly. The in-process engine resolves the driver from the test JVM classpath, but phoenix-queryserver-client was only copied to target/ via maven-dependency-plugin for the ENGINE_JDBC_EXTRA_CLASSPATH of the spawned subprocess engine, not added as a maven dependency. mysql-connector-j and clickhouse-jdbc are already declared as test deps; this aligns Phoenix with them.
pan3793 reviewed May 12, 2026

pan3793 (Member) left a comment:
Thanks for making this big change, it overall looks good. Have you tried it with DBeaver? Is the problem solved?
… dialects

Replace the DatabaseTermSupport trait mixin with hierarchical inheritance: move dual-write setSchema/setCatalog and read-with-fallback semantics directly into MySQLDialect (inherited by Doris/StarRocks) and ClickHouseDialect. Drop the SQLFeatureNotSupportedException fallback that was never exercised in practice.
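The dual-write / read-with-fallback semantics for drivers that expose the database as either the catalog or the schema can be sketched against `java.sql.Connection`. This is a hypothetical helper, not `MySQLDialect`; it assumes the driver silently ignores whichever of `setCatalog` / `setSchema` does not match its configured term, and the catalog-first read order is also an assumption.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper for drivers that expose the database as either the
// JDBC catalog or the JDBC schema, but not both.
final class DatabaseTermSketch {
  private DatabaseTermSketch() {}

  // Dual write: set both slots; the driver is assumed to honour the one that
  // matches its configured term and ignore the other.
  static void setDatabase(Connection conn, String database) throws SQLException {
    conn.setCatalog(database);
    conn.setSchema(database);
  }

  // Read with fallback: prefer the catalog, fall back to the schema when the
  // driver reports no catalog (read order is an assumption).
  static String currentDatabase(Connection conn) throws SQLException {
    String catalog = conn.getCatalog();
    return catalog != null ? catalog : conn.getSchema();
  }
}
```

Because both slots are written and either can be read back, the same code path works whether a MySQL-family or ClickHouse connection is configured to treat databases as catalogs or as schemas.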
…source link

Add a permalink (Impala 4.5.0 tag) to MetadataOp.java#L114 in the comment so reviewers can verify the upstream typo directly.
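The upstream typo is presumably Impala's HS2 reporting `TABLE_MD` where the JDBC spec expects `TABLE_SCHEM`, as described in the first commit message. A metadata-only label hook that repairs it might look like this; `MetadataLabelNormalizer` and `ImpalaLabelNormalizer` are hypothetical Java stand-ins for `SchemaHelper.normalizeMetadataColumnLabel` and the Impala dialect's override.

```java
// Hypothetical stand-in for a label hook. The default keeps the driver's
// label unchanged. Callers are expected to apply it only to metadata result
// sets, never to user query results, so SELECT aliases stay untouched.
interface MetadataLabelNormalizer {
  default String normalize(String label) {
    return label;
  }
}

// Impala-style override: map the non-spec TABLE_MD label to TABLE_SCHEM.
final class ImpalaLabelNormalizer implements MetadataLabelNormalizer {
  @Override
  public String normalize(String label) {
    return "TABLE_MD".equals(label) ? "TABLE_SCHEM" : label;
  }
}
```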
…ncrete classes

Convert the metadata and lifecycle operations from higher-order-function constructor parameters to concrete subclasses, matching the per-operation file style of the spark/hive engines. ExecuteMetaDataOperation becomes an abstract base whose subclasses (GetTypeInfo, GetCatalogs, GetSchemas, GetTables, GetTableTypes, GetColumns, GetFunctions, GetPrimaryKeys, GetCrossReference) each fix the metadata RPC; the four lifecycle classes (SetCurrentCatalog/Database, GetCurrentCatalog/Database) likewise own the catalog/database parameter directly and call the dialect from runInternal. JdbcOperationManager's factory methods collapse to one-liners.
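The "abstract base plus one subclass per metadata RPC" shape can be sketched in Java as a template method: the base class owns the entry point and each subclass pins a single `DatabaseMetaData` call. The names loosely echo the PR (`MetaDataOperation`, `fetchMetaData`), but these `*Sketch` classes are hypothetical and omit session, dialect, and row-set handling.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical template-method sketch: the base class owns the entry point,
// each subclass pins exactly one DatabaseMetaData call.
abstract class MetaDataOperationSketch {
  protected abstract ResultSet fetchMetaData(DatabaseMetaData meta) throws SQLException;

  // Shared entry point. Here it only delegates; common steps (e.g. wrapping
  // the ResultSet for streaming) would go here in a fuller version.
  final ResultSet run(DatabaseMetaData meta) throws SQLException {
    return fetchMetaData(meta);
  }
}

final class GetCatalogsSketch extends MetaDataOperationSketch {
  @Override
  protected ResultSet fetchMetaData(DatabaseMetaData meta) throws SQLException {
    return meta.getCatalogs();
  }
}

final class GetSchemasSketch extends MetaDataOperationSketch {
  private final String catalog;
  private final String schemaPattern;

  GetSchemasSketch(String catalog, String schemaPattern) {
    this.catalog = catalog;
    this.schemaPattern = schemaPattern;
  }

  @Override
  protected ResultSet fetchMetaData(DatabaseMetaData meta) throws SQLException {
    return meta.getSchemas(catalog, schemaPattern);
  }
}

final class GetTablesSketch extends MetaDataOperationSketch {
  private final String catalog;
  private final String schemaPattern;
  private final String tableNamePattern;
  private final String[] types; // null means "all table types"

  GetTablesSketch(String catalog, String schemaPattern, String tableNamePattern, String[] types) {
    this.catalog = catalog;
    this.schemaPattern = schemaPattern;
    this.tableNamePattern = tableNamePattern;
    this.types = types == null ? null : types.clone();
  }

  @Override
  protected ResultSet fetchMetaData(DatabaseMetaData meta) throws SQLException {
    return meta.getTables(catalog, schemaPattern, tableNamePattern, types);
  }
}
```

One upside of this shape over constructor-injected closures is that each RPC and its parameters live in a named class with plain fields, so a factory method only has to construct the right subclass.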
pan3793 approved these changes on May 12, 2026
- ImpalaDialect: simplify Statement/ResultSet lifecycle with JdbcUtils.withCloseable
- Rename ExecuteMetaDataOperation -> MetaDataOperation
- Rename runMetaDataCall -> fetchMetaData
Member comment: thanks, merged to master

Why are the changes needed?
The JDBC engine used to answer metadata requests with dialect-specific, hand-written `INFORMATION_SCHEMA` SQL for `getCatalogs`, `getSchemas`, `getTables`, `getColumns`, `getTypeInfo`, `getFunctions`, `getPrimaryKeys`, and `getCrossReference`. That path was hard to keep aligned with the JDBC contract: clients could see swapped or missing catalog/schema columns, backend-specific labels such as Impala's `TABLE_MD`, and inconsistent metadata filter handling across dialects.

This PR replaces those per-dialect SQL builders with the standard JDBC `DatabaseMetaData.getXxx(...)` APIs of the backend driver, while keeping catalog-vs-schema conventions and metadata-only label normalization in `JdbcDialect`.

Concretely, the cause of #7305 ("DBeaver returned all tables in all databases when connecting") was that the default `JdbcDialect` threw `featureNotSupported` for `getCatalogsOperation`/`getSchemasOperation`, so DBeaver could not obtain a database list and fell back to `getTables(null, null, ...)`, which the Hive client rewrites to `schemaPattern="%"`. Routing `getCatalogs`/`getSchemas` through `DatabaseMetaData` lets the backend driver answer them directly, so DBeaver gets a real catalog/schema list and the all-databases fallback no longer fires. This is the direction @pan3793 suggested on #7419, which this PR supersedes.

What changes?
- Replace the per-dialect metadata SQL with `DatabaseMetaData` calls.
- Add `ResultSetFetchIterator` for correct metadata streaming, partial fetch, and idempotent close behavior.

Compatibility notes
Metadata result sets now follow the JDBC spec more closely. Clients using spec column names should benefit; clients depending on the old non-spec labels or catalog/schema layout may need updates.
When the JDBC URL carries a database, `JdbcSessionImpl` now switches to it before executing the per-session initialize SQL, so init SQL runs in the URL-specified database rather than the driver's default.

How was this patch tested?
- `OperationSuite` coverage for MySQL, PostgreSQL, Oracle, Phoenix, ClickHouse, Doris, StarRocks, and Impala.
- `ResultSetFetchIteratorSuite` for prefetch, partial fetch, and close behavior.
- … `SELECT` against existing tables succeeds.
- … (`DROP IF EXISTS` -> `CREATE` -> `INSERT` -> `SELECT` -> `DROP`) verified on StarRocks, Doris, MySQL, PostgreSQL, and ClickHouse; Oracle / Phoenix / Impala excluded since their DDL grammar diverges.

Was this patch authored or co-authored using generative AI tooling?
Yes. Claude Opus 4.7 helped draft parts of the test cases and local test-environment setup. The implementation, test results, and final changes were reviewed and validated by the author.
Closes #7305. Supersedes #7419.