[KYUUBI #7305] Route JDBC engine metadata calls through DatabaseMetaData by wangzhigang1999 · Pull Request #7442 · apache/kyuubi

wangzhigang1999 · 2026-05-11T09:40:12Z

Why are the changes needed?

The JDBC engine used to answer metadata requests with dialect-specific, hand-written INFORMATION_SCHEMA SQL for getCatalogs, getSchemas, getTables, getColumns, getTypeInfo, getFunctions, getPrimaryKeys, and getCrossReference.

That path was hard to keep aligned with the JDBC contract. Clients could see swapped or missing catalog/schema columns, backend-specific labels such as Impala's TABLE_MD, and inconsistent metadata filter handling across dialects.

This PR replaces those per-dialect SQL builders with the standard JDBC DatabaseMetaData.getXxx(...) APIs from the backend driver, while keeping catalog-vs-schema conventions and metadata-only label normalization in JdbcDialect.

Concretely, this is the root cause of #7305 ("DBeaver returned all tables in all databases when connecting"): the default JdbcDialect threw featureNotSupported for getCatalogsOperation / getSchemasOperation, so DBeaver could not obtain a database list and fell back to getTables(null, null, ...), which the Hive client rewrites to schemaPattern="%". Routing getCatalogs / getSchemas through DatabaseMetaData lets the backend driver answer them directly, so DBeaver gets a real catalog/schema list and the all-databases fallback no longer fires. This is the direction @pan3793 suggested on #7419, which this PR supersedes.
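For illustration, the before/after behavior of the default `getCatalogs` path can be sketched in plain Java. `CatalogLookup`, `RefusingCatalogLookup` and `DriverCatalogLookup` are hypothetical names, not Kyuubi's `JdbcDialect` API; the sketch only assumes what the paragraph above states: the old default refused the call, and the new one lets the backend driver answer through `DatabaseMetaData`.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;

// Hypothetical stand-in for the dialect hook that answers getCatalogs.
interface CatalogLookup {
  ResultSet getCatalogs(Connection backend) throws SQLException;
}

// Old default (simplified): refuse the request. A client such as DBeaver then
// has no database list and falls back to a broad getTables(null, null, ...).
final class RefusingCatalogLookup implements CatalogLookup {
  @Override
  public ResultSet getCatalogs(Connection backend) throws SQLException {
    throw new SQLFeatureNotSupportedException("getCatalogs is not supported");
  }
}

// New default (simplified): let the backend driver answer via the standard JDBC API.
final class DriverCatalogLookup implements CatalogLookup {
  @Override
  public ResultSet getCatalogs(Connection backend) throws SQLException {
    return backend.getMetaData().getCatalogs();
  }
}
```

With the delegating version, catalog discovery succeeds whenever the backend driver implements `getCatalogs`, so the client has no reason to fall back to an unfiltered `getTables` call.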

What changes?

- Replace the hand-written metadata SQL path with backend DatabaseMetaData calls.
- Add dialect support for database-term handling and metadata column-label normalization.
- Pass the resolved session dialect consistently through metadata operations.
- Apply the URL-level database before running per-session initialize SQL.
- Add ResultSetFetchIterator for correct metadata streaming, partial fetch, and idempotent close behavior.
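The prefetch-and-close contract of ResultSetFetchIterator can be illustrated with a minimal Java sketch. `FetchIterator` and its `RowMapper` are hypothetical names and this is not the Kyuubi (Scala) implementation; it only assumes the behavior described in this PR: `hasNext()` advances the cursor at most once per row, a caller may stop early, and `close()` releases the result set and statement exactly once.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch of a one-row-prefetch iterator over a JDBC ResultSet.
final class FetchIterator<T> implements Iterator<T>, AutoCloseable {
  interface RowMapper<R> {
    R map(ResultSet rs) throws SQLException;
  }

  private final ResultSet rs;
  private final Statement stmt; // may be null, e.g. for DatabaseMetaData result sets
  private final RowMapper<T> mapper;
  private T pending;            // row read by hasNext() but not yet returned by next()
  private boolean hasPending;
  private boolean closed;

  FetchIterator(ResultSet rs, Statement stmt, RowMapper<T> mapper) {
    this.rs = rs;
    this.stmt = stmt;
    this.mapper = mapper;
  }

  @Override
  public boolean hasNext() {
    // Advance the cursor only when no row is buffered, so repeated calls do not consume rows.
    if (!hasPending && !closed) {
      try {
        if (rs.next()) {
          pending = mapper.map(rs);
          hasPending = true;
        } else {
          close(); // exhausted: release resources eagerly
        }
      } catch (SQLException e) {
        close();
        throw new IllegalStateException("failed to fetch next row", e);
      }
    }
    return hasPending;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T row = pending;
    pending = null;
    hasPending = false;
    return row;
  }

  // Idempotent: safe after partial consumption, after exhaustion, or when called twice.
  @Override
  public void close() {
    if (closed) {
      return;
    }
    closed = true;
    pending = null;
    hasPending = false;
    try {
      rs.close();
    } catch (SQLException ignored) {
      // best effort in this sketch
    }
    if (stmt != null) {
      try {
        stmt.close();
      } catch (SQLException ignored) {
        // best effort in this sketch
      }
    }
  }
}
```

Buffering exactly one mapped row keeps `hasNext()` free of side effects for callers that probe it repeatedly, while a client that fetches only part of a large metadata result can still release the backend cursor immediately.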

Compatibility notes

Metadata result sets now follow the JDBC spec more closely. Clients using spec column names should benefit; clients depending on the old non-spec labels or catalog/schema layout may need updates.

When the JDBC URL carries a database, JdbcSessionImpl now switches to it before executing the per-session initialize SQL, so init SQL runs in the URL-specified database rather than the driver's default.

How was this patch tested?

- Updated existing per-dialect OperationSuite coverage for MySQL, PostgreSQL, Oracle, Phoenix, ClickHouse, Doris, StarRocks, and Impala.
- Added backend metadata contract suites that verify the Hive JDBC client over Thrift sees JDBC-spec column names, key fields, and metadata filters.
- Added ResultSetFetchIteratorSuite for prefetch, partial fetch, and close behavior.
- Verified end-to-end via beeline against all eight dialects (MySQL, PostgreSQL, Oracle, Phoenix, ClickHouse, Doris, StarRocks, Impala): metadata operations return JDBC-spec column layouts and SELECT against existing tables succeeds.
- Verified full DDL/DML cycles (DROP IF EXISTS -> CREATE -> INSERT -> SELECT -> DROP) on StarRocks, Doris, MySQL, PostgreSQL, and ClickHouse; Oracle, Phoenix, and Impala were excluded because their DDL grammar diverges.

Was this patch authored or co-authored using generative AI tooling?

Yes. Claude Opus 4.7 helped draft parts of the test cases and local test-environment setup. The implementation, test results, and final changes were reviewed and validated by the author.

Closes #7305. Supersedes #7419.

…eMetaData

The JDBC engine's metadata operations previously rendered hand-written INFORMATION_SCHEMA SQL per dialect. As a result:

- DBeaver / Tableau / Hive CLI saw column names and TABLE_CAT / TABLE_SCHEM layouts that did not match the JDBC spec, which broke catalog/schema discovery and forced the client into expensive fallback queries.
- Each dialect duplicated SQL builder code that was hard to keep in sync with the JDBC contract.

This commit routes getCatalogs / getSchemas / getTables / getColumns / getTypeInfo / getFunctions / getPrimaryKeys / getCrossReference through `DatabaseMetaData.getXxx(...)` on the backend connection and exposes the result set unchanged. The catalog-vs-schema convention is delegated to each dialect via the existing JdbcDialect SPI:

- `DatabaseTermSupport` mixin captures the common pattern shared by ClickHouse and the MySQL family (MySQL / Doris / StarRocks), where the driver exposes the database as either catalog or schema, never both.
- `SchemaHelper.normalizeMetadataColumnLabel` lets dialects rewrite backend-driver labels that diverge from the JDBC spec (Apache Impala's HS2 reports `TABLE_MD` instead of `TABLE_SCHEM`). The hook lives on SchemaHelper but is only invoked on the metadata-only path, so user SELECT aliases are unaffected.
- `ExecuteMetaDataOperation`, `Set/GetCurrentCatalog`, and `Set/GetCurrentDatabase` resolve the dialect from the session's sessionConf and inject it into their closures, so the metadata call and the downstream rowset / label normalisation use the same dialect.
- `JdbcSessionImpl` applies the URL database before running the per-session initialize SQL, so DDL in the init SQL executes against the user-requested database rather than the driver's default.
- Streaming metadata results go through a new `ResultSetFetchIterator` that prefetches one row to drive hasNext without consuming it, and closes the result set + statement idempotently.
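A minimal Java sketch of a metadata-only label hook in the spirit of `SchemaHelper.normalizeMetadataColumnLabel`. `MetadataLabelHook`, `ImpalaLabelHook` and `MetadataSchema` are hypothetical names, and the only rename modeled is the Impala `TABLE_MD` to `TABLE_SCHEM` case mentioned above. The design point it shows is that the hook runs when building a metadata result-set schema and nowhere else.

```java
import java.util.Map;

// Hypothetical metadata-only label hook; the default keeps driver labels as-is.
interface MetadataLabelHook {
  default String normalizeMetadataColumnLabel(String label) {
    return label;
  }
}

// Impala-style override: map the non-spec label onto the JDBC-spec one.
final class ImpalaLabelHook implements MetadataLabelHook {
  private static final Map<String, String> RENAMES = Map.of("TABLE_MD", "TABLE_SCHEM");

  @Override
  public String normalizeMetadataColumnLabel(String label) {
    return RENAMES.getOrDefault(label, label);
  }
}

final class MetadataSchema {
  private MetadataSchema() {}

  // Called only when building the schema of a metadata result set; the path that
  // serves user SELECT statements never calls the hook, so aliases stay untouched.
  static String[] columnLabels(MetadataLabelHook hook, String[] driverLabels) {
    String[] labels = new String[driverLabels.length];
    for (int i = 0; i < driverLabels.length; i++) {
      labels[i] = hook.normalizeMetadataColumnLabel(driverLabels[i]);
    }
    return labels;
  }
}
```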
Existing OperationSuite tests across all eight dialects (MySQL, PostgreSQL, Oracle, Phoenix, ClickHouse, Doris, StarRocks, Impala) are updated for the new column layouts mandated by the JDBC spec. The WithExternalJdbcEngine / KYUUBI_TEST_*_URL escape hatch lets the same suites run against a long-running docker stack instead of testcontainers (unset env vars => testcontainers fallback, CI unaffected).
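The escape hatch amounts to an environment lookup with a fallback. In this sketch the concrete variable name (`KYUUBI_TEST_<DIALECT>_URL`, e.g. `KYUUBI_TEST_MYSQL_URL`) is an assumption expanded from the `KYUUBI_TEST_*_URL` pattern, and `ExternalBackend` is a hypothetical helper; the environment is passed in as a `Map` so the logic can be exercised without touching `System.getenv()`.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the "external backend or testcontainers" switch.
final class ExternalBackend {
  private ExternalBackend() {}

  // Assumed naming scheme, e.g. KYUUBI_TEST_MYSQL_URL for the "mysql" suites.
  static String envKey(String dialect) {
    return "KYUUBI_TEST_" + dialect.toUpperCase(Locale.ROOT) + "_URL";
  }

  // Present and non-blank: reuse the long-running backend. Otherwise empty,
  // and the suite starts its testcontainers backend as before.
  static Optional<String> externalUrl(Map<String, String> env, String dialect) {
    return Optional.ofNullable(env.get(envKey(dialect))).filter(url -> !url.isBlank());
  }
}
```

A suite would call `ExternalBackend.externalUrl(System.getenv(), "mysql")` and start its container only when the result is empty, which keeps CI (where the variables are unset) on the testcontainers path.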

Add one <Dialect>MetadataContractSuite per backend (MySQL, PostgreSQL, Oracle, Phoenix, ClickHouse, Doris, StarRocks, Impala) that exercises the metadata ops over Thrift via the Hive JDBC client and asserts the result-set schema and the key fields that downstream consumers (DBeaver / Tableau / Hive CLI) actually see. Assertions include strict filter-honoring checks on getTables / getColumns to catch "filter param silently dropped" regressions. Shared helpers live in MetadataTestHelpers (assertJdbcSpecColumnNames, assertColumnContract, assertTableRow). Per-driver gaps where assertions are skipped are documented inline (Phoenix PK / getTypeInfo / getFunctions, StarRocks getFunctions).
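A column-name contract check of this kind can be expressed against `java.sql.ResultSetMetaData` alone. The helper below is a hypothetical stand-in, not the PR's `assertJdbcSpecColumnNames`; the expected lists are the leading columns that the `java.sql.DatabaseMetaData` Javadoc documents for `getCatalogs`, `getSchemas` and `getTables`. It checks a prefix rather than the full column list because the Javadoc allows drivers to return additional vendor-defined columns; how strict a real suite should be is a judgment call.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract helper; the column lists follow the java.sql.DatabaseMetaData Javadoc.
final class SpecColumns {
  private SpecColumns() {}

  static final List<String> GET_CATALOGS = List.of("TABLE_CAT");
  static final List<String> GET_SCHEMAS = List.of("TABLE_SCHEM", "TABLE_CATALOG");
  // First five of the ten documented getTables columns.
  static final List<String> GET_TABLES =
      List.of("TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME", "TABLE_TYPE", "REMARKS");

  // Fails if the leading column labels differ from the spec, e.g. TABLE_MD instead of TABLE_SCHEM.
  static void assertSpecColumnPrefix(ResultSetMetaData md, List<String> expected)
      throws SQLException {
    List<String> actual = new ArrayList<>();
    for (int i = 1; i <= md.getColumnCount(); i++) { // JDBC columns are 1-based
      actual.add(md.getColumnLabel(i));
    }
    if (actual.size() < expected.size()
        || !actual.subList(0, expected.size()).equals(expected)) {
      throw new AssertionError("expected leading columns " + expected + " but got " + actual);
    }
  }
}
```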

…readability

Pull the (JdbcDialect, Connection) => ... closures out of the addOperation(...) calls in JdbcOperationManager into named vals, and introduce ExecuteMetaDataOperation.MetaDataCall type alias for the metadata variant. Rename the lambda parameter d -> dialect.

@pan3793

…tion

Address @pan3793's review on apache#7442: handle USE_CATALOG symmetrically with USE_DATABASE at session open, so the URL-level catalog is forwarded via JdbcDialect.setCatalog. Both branches share the "default" Hive-client stub filter. Fix the PostgreSQL getSchemas assertion that still expected the old hand-written SQL path's null TABLE_CATALOG. With DatabaseMetaData passthrough the PG driver returns the real database name ("postgres").
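The resulting session-open order can be sketched as follows. `SessionOpen` and its `Backend` interface are hypothetical, and both the conf keys `use:catalog` / `use:database` and the exact shape of the "default" filter are assumptions based on the description above (a bare "default" is treated as the Hive client's placeholder, not a real request).

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the session-open order; not JdbcSessionImpl itself.
final class SessionOpen {
  private SessionOpen() {}

  // Assumed conf keys carrying the URL-level catalog and database.
  static final String USE_CATALOG = "use:catalog";
  static final String USE_DATABASE = "use:database";

  interface Backend {
    void setCatalog(String catalog);
    void setDatabase(String database);
    void execute(String sql);
  }

  // Assumption: "default" is the client-side stub and is not forwarded.
  private static boolean isRequested(String value) {
    return value != null && !value.isEmpty() && !"default".equals(value);
  }

  static void open(Backend backend, Map<String, String> conf, List<String> initSql) {
    String catalog = conf.get(USE_CATALOG);
    if (isRequested(catalog)) {
      backend.setCatalog(catalog);
    }
    String database = conf.get(USE_DATABASE);
    if (isRequested(database)) {
      backend.setDatabase(database);
    }
    // Only now run the init SQL, so unqualified DDL lands in the URL-specified database.
    for (String sql : initSql) {
      backend.execute(sql);
    }
  }
}
```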

…dbc-it

OperationWithServerSuite's "phoenix - URL database is applied to backend connection" failed in CI with "No suitable driver" because the test's DriverManager.getConnection bypasses the Kyuubi server and hits the in-process embedded JDBC engine directly. The in-process engine resolves the driver from the test JVM classpath, but phoenix-queryserver-client was only copied to target/ via maven-dependency-plugin for the ENGINE_JDBC_EXTRA_CLASSPATH of the spawned subprocess engine, not added as a maven dependency. mysql-connector-j and clickhouse-jdbc are already declared as test deps; this aligns Phoenix with them.

pan3793

Thanks for making this big change; overall it looks good. Have you tried it with DBeaver? Is the problem solved?

wangzhigang1999 · 2026-05-12T11:40:03Z

> Thanks for making this big change; overall it looks good. Have you tried it with DBeaver? Is the problem solved?

Verified end-to-end on Impala 4.5.0: DBeaver via Kyuubi shows the full column tree.

… dialects

Replace the DatabaseTermSupport trait mixin with hierarchical inheritance: move dual-write setSchema/setCatalog and read-with-fallback semantics directly into MySQLDialect (inherited by Doris/StarRocks) and ClickHouseDialect. Drop the SQLFeatureNotSupportedException fallback that was never exercised in practice.
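The dual-write / read-with-fallback idea maps onto the standard `java.sql.Connection` catalog and schema accessors. `DatabaseTerm` is a hypothetical helper, not the code in MySQLDialect or ClickHouseDialect; it assumes the driver ignores (or harmlessly accepts) a write to the slot it does not use, and the catalog-first read order is a choice made for this sketch.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical sketch for drivers that expose "the database" through either the
// catalog slot or the schema slot of java.sql.Connection, but never both.
final class DatabaseTerm {
  private DatabaseTerm() {}

  // Dual-write: set both slots so whichever one the driver uses takes effect.
  static void setDatabase(Connection conn, String database) throws SQLException {
    conn.setCatalog(database);
    conn.setSchema(database);
  }

  // Read-with-fallback: try the catalog first, then the schema (order is an assumption).
  static String getDatabase(Connection conn) throws SQLException {
    String catalog = conn.getCatalog();
    return catalog != null ? catalog : conn.getSchema();
  }
}
```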

…source link

Add a permalink (Impala 4.5.0 tag) to MetadataOp.java#L114 in the comment so reviewers can verify the upstream typo directly.

…ncrete classes

Convert the metadata and lifecycle operations from higher-order-function constructor parameters to concrete subclasses, matching the per-operation file style of the spark/hive engines. ExecuteMetaDataOperation becomes an abstract base whose subclasses (GetTypeInfo, GetCatalogs, GetSchemas, GetTables, GetTableTypes, GetColumns, GetFunctions, GetPrimaryKeys, GetCrossReference) each fix the metadata RPC; the four lifecycle classes (SetCurrentCatalog/Database, GetCurrentCatalog/Database) likewise own the catalog/database parameter directly and call the dialect from runInternal. JdbcOperationManager's factory methods collapse to one-liners.
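The shape of that refactor, an abstract base that owns the shared flow plus one small subclass per metadata RPC, can be sketched in Java. `MetaDataOp`, `GetCatalogsOp` and `GetTablesOp` are hypothetical names; only the `fetchMetaData` hook name is borrowed from the later review-nit rename, and the real Scala classes carry more state (session, handles, dialect).

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch of the "abstract base + one subclass per RPC" layout.
abstract class MetaDataOp {
  // Each subclass fixes exactly one DatabaseMetaData call.
  protected abstract ResultSet fetchMetaData(DatabaseMetaData md) throws SQLException;

  // Shared flow; a real engine would also wire in dialect lookup,
  // label normalization and result-set iteration here.
  final ResultSet run(Connection backend) throws SQLException {
    return fetchMetaData(backend.getMetaData());
  }
}

final class GetCatalogsOp extends MetaDataOp {
  @Override
  protected ResultSet fetchMetaData(DatabaseMetaData md) throws SQLException {
    return md.getCatalogs();
  }
}

final class GetTablesOp extends MetaDataOp {
  private final String catalog;
  private final String schemaPattern;
  private final String tableNamePattern;
  private final String[] tableTypes;

  GetTablesOp(String catalog, String schemaPattern, String tableNamePattern, String[] tableTypes) {
    this.catalog = catalog;
    this.schemaPattern = schemaPattern;
    this.tableNamePattern = tableNamePattern;
    this.tableTypes = tableTypes;
  }

  @Override
  protected ResultSet fetchMetaData(DatabaseMetaData md) throws SQLException {
    // Filters are forwarded verbatim so the driver, not the engine, applies them.
    return md.getTables(catalog, schemaPattern, tableNamePattern, tableTypes);
  }
}
```

Because each subclass only supplies its RPC, adding a new metadata operation does not touch the shared flow, and the factory in the operation manager reduces to a constructor call.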

pan3793

LGTM, a few nits.

- ImpalaDialect: simplify Statement/ResultSet lifecycle with JdbcUtils.withCloseable
- Rename ExecuteMetaDataOperation -> MetaDataOperation
- Rename runMetaDataCall -> fetchMetaData

pan3793 · 2026-05-13T03:46:04Z

thanks, merged to master

wangzhigang1999 added 3 commits May 11, 2026 14:59

github-actions Bot added the module:jdbc label May 11, 2026

wangzhigang1999 changed the title ~~[KYUUBI #7305] Route JDBC engine metadata calls through DatabaseMetaData~~ [KYUUBI #7305] Route JDBC engine metadata calls through DatabaseMetaData May 11, 2026

pan3793 reviewed May 11, 2026


Comment thread ...yuubi-jdbc-engine/src/main/scala/org/apache/kyuubi/engine/jdbc/session/JdbcSessionImpl.scala

wangzhigang1999 marked this pull request as ready for review May 11, 2026 13:38

github-actions Bot added kind:build module:integration-tests labels May 12, 2026

pan3793 reviewed May 12, 2026


wangzhigang1999 added 3 commits May 12, 2026 20:42

[KYUUBI apache#7305] Polish ImpalaSchemaHelper comment with upstream …

94883e4


pan3793 approved these changes May 12, 2026


[KYUUBI apache#7305] Address review nits

7e6d727


pan3793 assigned wangzhigang1999 May 13, 2026

pan3793 added this to the v1.12.0 milestone May 13, 2026

pan3793 closed this in 98f81f2 May 13, 2026

wangzhigang1999 mentioned this pull request May 13, 2026

[FEATURE] Add AGENTS.md to guide AI coding agents contributing to Kyuubi #7445 (Closed, 4 tasks)

wangzhigang1999 wants to merge 9 commits into apache:master from wangzhigang1999:KYUUBI-7305


2 participants
