[KYUUBI #7305] Route JDBC engine metadata calls through DatabaseMetaData #7442

Closed
wangzhigang1999 wants to merge 9 commits into apache:master from wangzhigang1999:KYUUBI-7305

wangzhigang1999 (Contributor) commented May 11, 2026

Why are the changes needed?

The JDBC engine used to answer metadata requests with dialect-specific, hand-written INFORMATION_SCHEMA SQL for getCatalogs, getSchemas, getTables, getColumns, getTypeInfo, getFunctions, getPrimaryKeys, and getCrossReference.

That path was hard to keep aligned with the JDBC contract. Clients could see swapped or missing catalog/schema columns, backend-specific labels such as Impala's TABLE_MD, and inconsistent metadata filter handling across dialects.

This PR replaces those per-dialect SQL builders with the standard JDBC DatabaseMetaData.getXxx(...) APIs from the backend driver, while keeping catalog-vs-schema conventions and metadata-only label normalization in JdbcDialect.
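
As a rough illustration of metadata-only label normalization, the sketch below maps a backend column label to its JDBC-spec name. The class and method names are hypothetical and this is not the code in this PR. The only mapping taken from the description is Impala's TABLE_MD to TABLE_SCHEM; the case-insensitive lookup is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not the Kyuubi implementation.
class LabelNormalizationSketch {
    // Backend label -> JDBC-spec label. Only the Impala entry comes from the PR text.
    static final Map<String, String> IMPALA_FIXES = Map.of("TABLE_MD", "TABLE_SCHEM");

    // Returns the spec label if a fix is registered, otherwise the label unchanged.
    // Assumption: labels are matched case-insensitively.
    static String normalize(Map<String, String> fixes, String label) {
        return fixes.getOrDefault(label.toUpperCase(Locale.ROOT), label);
    }

    // Applies normalization only on the metadata path, so user SELECT aliases pass through.
    static List<String> labels(List<String> raw, boolean metadataPath) {
        List<String> out = new ArrayList<>();
        for (String label : raw) {
            out.add(metadataPath ? normalize(IMPALA_FIXES, label) : label);
        }
        return out;
    }

    public static void main(String[] args) {
        // Metadata path: TABLE_MD is rewritten to TABLE_SCHEM.
        System.out.println(labels(List.of("TABLE_CAT", "TABLE_MD", "TABLE_NAME"), true));
        // Query path: a user alias that happens to be TABLE_MD is left alone.
        System.out.println(labels(List.of("TABLE_MD"), false));
    }
}
```

Keeping the rewrite behind an explicit metadata-path flag mirrors the stated goal that user SELECT aliases stay unaffected.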

Concretely, #7305 ("DBeaver returned all tables in all databases when connecting") happened because the default JdbcDialect threw featureNotSupported for getCatalogsOperation / getSchemasOperation, so DBeaver could not obtain a database list and fell back to getTables(null, null, ...), which the Hive client rewrites to schemaPattern="%". Routing getCatalogs / getSchemas through DatabaseMetaData lets the backend driver answer them directly, so DBeaver gets a real catalog/schema list and the all-databases fallback no longer fires. This is the direction @pan3793 suggested on #7419, which this PR supersedes.

What changes?

  • Replace the hand-written metadata SQL path with backend DatabaseMetaData calls.
  • Add dialect support for database-term handling and metadata column-label normalization.
  • Pass the resolved session dialect consistently through metadata operations.
  • Apply the URL-level database before running per-session initialize SQL.
  • Add ResultSetFetchIterator for correct metadata streaming, partial fetch, and idempotent close behavior.

Compatibility notes

Metadata result sets now follow the JDBC spec more closely. Clients using spec column names should benefit; clients depending on the old non-spec labels or catalog/schema layout may need updates.

When the JDBC URL carries a database, JdbcSessionImpl now switches to it before executing the per-session initialize SQL, so init SQL runs in the URL-specified database rather than the driver's default.
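
The ordering change can be pictured with a small sketch that only records the steps a session open would take. Everything here is hypothetical (the class, the method names, the step strings) and it is not the JdbcSessionImpl code. Treating "default" as "no database requested" is an assumption, based on the Hive-client "default" stub filter mentioned in a later commit of this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the ordering only; not the JdbcSessionImpl code.
class SessionOpenSketch {
    // Assumption: a missing database, or the "default" placeholder sent by the
    // Hive client, means that no database was requested.
    static boolean hasRequestedDatabase(String urlDatabase) {
        return urlDatabase != null && !urlDatabase.isEmpty() && !"default".equals(urlDatabase);
    }

    // Returns the steps in the order a session would run them.
    static List<String> openSteps(String urlDatabase, List<String> initSql) {
        List<String> steps = new ArrayList<>();
        if (hasRequestedDatabase(urlDatabase)) {
            // The switch comes first so that the init SQL runs in this database.
            steps.add("switch-database:" + urlDatabase);
        }
        for (String sql : initSql) {
            steps.add("init-sql:" + sql);
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(openSteps("sales", List.of("CREATE TABLE IF NOT EXISTS t (id INT)")));
        System.out.println(openSteps("default", List.of("SELECT 1")));
    }
}
```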

How was this patch tested?

  • Updated existing per-dialect OperationSuite coverage for MySQL, PostgreSQL, Oracle, Phoenix, ClickHouse, Doris, StarRocks, and Impala.
  • Added backend metadata contract suites that verify the Hive JDBC client over Thrift sees JDBC-spec column names, key fields, and metadata filters.
  • Added ResultSetFetchIteratorSuite for prefetch, partial fetch, and close behavior.
  • Verified end-to-end via beeline against all eight dialects (MySQL, PostgreSQL, Oracle, Phoenix, ClickHouse, Doris, StarRocks, Impala): metadata operations return JDBC-spec column layouts and SELECT against existing tables succeeds.
  • Full DDL/DML cycles (DROP IF EXISTS -> CREATE -> INSERT -> SELECT -> DROP) verified on StarRocks, Doris, MySQL, PostgreSQL, and ClickHouse; Oracle / Phoenix / Impala excluded since their DDL grammar diverges.

Was this patch authored or co-authored using generative AI tooling?

Yes. Claude Opus 4.7 helped draft parts of the test cases and local test-environment setup. The implementation, test results, and final changes were reviewed and validated by the author.


Closes #7305. Supersedes #7419.

…eMetaData

The JDBC engine's metadata operations previously rendered hand-written
INFORMATION_SCHEMA SQL per dialect. As a result:

- DBeaver / Tableau / Hive CLI saw column names and TABLE_CAT / TABLE_SCHEM
  layouts that did not match the JDBC spec, which broke catalog/schema
  discovery and forced the client into expensive fallback queries.
- Each dialect duplicated SQL builder code that was hard to keep in sync
  with the JDBC contract.

This commit routes getCatalogs / getSchemas / getTables / getColumns /
getTypeInfo / getFunctions / getPrimaryKeys / getCrossReference through
`DatabaseMetaData.getXxx(...)` on the backend connection and exposes the
result-set unchanged. Catalog-vs-schema convention is delegated to each
dialect via the existing JdbcDialect SPI:

- `DatabaseTermSupport` mixin captures the common pattern shared by
  ClickHouse and the MySQL family (MySQL / Doris / StarRocks) where the
  driver exposes the database as either catalog or schema, never both.
- `SchemaHelper.normalizeMetadataColumnLabel` lets dialects rewrite
  backend-driver labels that diverge from the JDBC spec (Apache Impala's
  HS2 reports `TABLE_MD` instead of `TABLE_SCHEM`). The hook lives on
  SchemaHelper but is only invoked on the metadata-only path so user
  SELECT aliases are unaffected.
- `ExecuteMetaDataOperation`, `Set/GetCurrentCatalog`,
  `Set/GetCurrentDatabase` resolve dialect from the session's sessionConf
  and inject it into their closures so the metadata call and downstream
  rowset / label normalisation use the same dialect.
- `JdbcSessionImpl` applies the URL database before running the per-session
  initialize SQL so DDL in the init SQL executes against the user-requested
  database rather than the driver's default.
- Streaming metadata results go through a new `ResultSetFetchIterator`
  that prefetches one row to drive hasNext without consuming and closes
  the result set + statement idempotently.
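
The prefetch-one-row idea in the last bullet can be sketched as follows. This is a minimal illustration, not the PR's `ResultSetFetchIterator`: `RowCursor` is a hypothetical stand-in for a `java.sql.ResultSet` plus its `Statement`, and rows are plain strings to keep the example self-contained.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of the prefetch-one-row pattern.
class FetchIteratorSketch {
    // Stand-in for a java.sql.ResultSet and its Statement.
    interface RowCursor {
        boolean next();   // advance to the next row; false when exhausted
        String current(); // row at the current position
        void close();     // release backend resources
    }

    static final class PrefetchingIterator implements Iterator<String>, AutoCloseable {
        private final RowCursor cursor;
        private String buffered;     // row read ahead but not yet handed out
        private boolean hasBuffered;
        private boolean exhausted;
        private boolean closed;

        PrefetchingIterator(RowCursor cursor) {
            this.cursor = cursor;
        }

        @Override
        public boolean hasNext() {
            if (hasBuffered) return true;          // repeated calls do not consume rows
            if (closed || exhausted) return false;
            if (cursor.next()) {
                buffered = cursor.current();
                hasBuffered = true;
            } else {
                exhausted = true;                  // never advance past the end again
            }
            return hasBuffered;
        }

        @Override
        public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            String row = buffered;
            buffered = null;
            hasBuffered = false;
            return row;
        }

        @Override
        public void close() {
            if (closed) return;                    // second and later calls are no-ops
            closed = true;
            hasBuffered = false;
            buffered = null;
            cursor.close();
        }
    }

    // In-memory cursor for the demo; closeCount[0] counts close() calls.
    static RowCursor listCursor(List<String> rows, int[] closeCount) {
        return new RowCursor() {
            private int pos = -1;
            @Override public boolean next() { return ++pos < rows.size(); }
            @Override public String current() { return rows.get(pos); }
            @Override public void close() { closeCount[0]++; }
        };
    }

    public static void main(String[] args) {
        int[] closes = {0};
        try (PrefetchingIterator it = new PrefetchingIterator(listCursor(List.of("a", "b", "c"), closes))) {
            it.hasNext();                          // prefetches the first row without consuming it
            System.out.println(it.next());         // partial fetch: only one row is read
        }
        System.out.println(closes[0]);             // the underlying cursor was closed once
    }
}
```

Buffering exactly one row is what lets `hasNext` be answered from a forward-only cursor, and the `closed` flag keeps a second `close()` from reaching the backend.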

Existing OperationSuite tests across all eight dialects (MySQL,
PostgreSQL, Oracle, Phoenix, ClickHouse, Doris, StarRocks, Impala) are
updated for the new column layouts the JDBC spec mandates.

The WithExternalJdbcEngine / KYUUBI_TEST_*_URL escape hatch lets the same
suites run against a long-running docker stack instead of testcontainers
(unset env vars => testcontainers fallback, CI unaffected).

There is one <Dialect>MetadataContractSuite per backend (MySQL, PostgreSQL,
Oracle, Phoenix, ClickHouse, Doris, StarRocks, Impala). Each exercises the
metadata ops over Thrift via the Hive JDBC client and asserts the result-set
schema and key fields that downstream consumers (DBeaver / Tableau / Hive
CLI) actually see. Assertions include strict filter-honoring on getTables /
getColumns to catch "filter param silently dropped" regressions.

Shared helpers live in MetadataTestHelpers (assertJdbcSpecColumnNames,
assertColumnContract, assertTableRow). Per-driver gaps documented inline
where assertions are skipped (Phoenix PK / getTypeInfo / getFunctions,
StarRocks getFunctions).
…readability

Pull the (JdbcDialect, Connection) => ... closures out of the
addOperation(...) calls in JdbcOperationManager into named vals, and
introduce ExecuteMetaDataOperation.MetaDataCall type alias for the
metadata variant. Rename the lambda parameter d -> dialect.
@wangzhigang1999 wangzhigang1999 changed the title [KYUUBI #7305] Route JDBC engine metadata calls through DatabaseMetaData [KYUUBI #7305] Route JDBC engine metadata calls through DatabaseMetaData May 11, 2026
…tion

Address @pan3793's review on apache#7442: handle USE_CATALOG symmetrically with
USE_DATABASE at session open, so the URL-level catalog is forwarded via
JdbcDialect.setCatalog. Both branches share the "default" Hive-client stub
filter.

Fix the PostgreSQL getSchemas assertion that still expected the old
hand-written SQL path's null TABLE_CATALOG. With DatabaseMetaData
passthrough the PG driver returns the real database name ("postgres").
@wangzhigang1999 wangzhigang1999 marked this pull request as ready for review May 11, 2026 13:38
…dbc-it

OperationWithServerSuite's "phoenix - URL database is applied to backend
connection" failed in CI with "No suitable driver" because the test's
DriverManager.getConnection bypasses the Kyuubi server and hits the
in-process embedded JDBC engine directly. The in-process engine resolves
the driver from the test JVM classpath, but phoenix-queryserver-client
was only copied to target/ via maven-dependency-plugin for the
ENGINE_JDBC_EXTRA_CLASSPATH of the spawned subprocess engine, not added
as a maven dependency. mysql-connector-j and clickhouse-jdbc are already
declared as test deps - this aligns Phoenix with them.

pan3793 (Member) left a comment

thanks for making this big change, it overall looks good. Have you tried with DBeaver, is problem solved?

wangzhigang1999 (Contributor, Author) commented

> thanks for making this big change, it overall looks good. Have you tried with DBeaver, is problem solved?

Verified end-to-end on Impala 4.5.0 — DBeaver via Kyuubi shows the full column tree:

[image: screenshot of the DBeaver column tree]

… dialects

Replace the DatabaseTermSupport trait mixin with hierarchical inheritance:
move dual-write setSchema/setCatalog and read-with-fallback semantics
directly into MySQLDialect (inherited by Doris/StarRocks) and
ClickHouseDialect. Drop the SQLFeatureNotSupportedException fallback that
was never exercised in practice.
…source link

Add a permalink (Impala 4.5.0 tag) to MetadataOp.java#L114 in the
comment so reviewers can verify the upstream typo directly.
…ncrete classes

Convert the metadata and lifecycle operations from higher-order-function
constructor parameters to concrete subclasses, matching the per-operation
file style of the spark/hive engines. ExecuteMetaDataOperation becomes an
abstract base whose subclasses (GetTypeInfo, GetCatalogs, GetSchemas,
GetTables, GetTableTypes, GetColumns, GetFunctions, GetPrimaryKeys,
GetCrossReference) each fix the metadata RPC; the four lifecycle classes
(SetCurrentCatalog/Database, GetCurrentCatalog/Database) likewise own the
catalog/database parameter directly and call the dialect from runInternal.
JdbcOperationManager's factory methods collapse to one-liners.
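
The shape of this refactor can be sketched in Java as an abstract base whose subclasses each fix one `DatabaseMetaData` call. This is an illustration under assumptions, not the Kyuubi source: it borrows the `MetaDataOperation` and `fetchMetaData` names from the later rename in this PR, while `run` and `recordingStub` are hypothetical helpers. The stub uses `java.lang.reflect.Proxy` so the example runs without a JDBC driver.

```java
import java.lang.reflect.Proxy;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class layout and helper names are assumptions.
class MetaDataOperationSketch {
    // Base class: subclasses only decide which DatabaseMetaData call to make.
    abstract static class MetaDataOperation {
        protected abstract ResultSet fetchMetaData(DatabaseMetaData md) throws SQLException;

        // Stand-in for runInternal: runs the call and wraps driver errors.
        final ResultSet run(DatabaseMetaData md) {
            try {
                return fetchMetaData(md);
            } catch (SQLException e) {
                throw new IllegalStateException("metadata call failed", e);
            }
        }
    }

    static final class GetCatalogs extends MetaDataOperation {
        @Override
        protected ResultSet fetchMetaData(DatabaseMetaData md) throws SQLException {
            return md.getCatalogs();
        }
    }

    static final class GetTables extends MetaDataOperation {
        private final String catalog;
        private final String schemaPattern;
        private final String tablePattern;
        private final String[] types;

        GetTables(String catalog, String schemaPattern, String tablePattern, String[] types) {
            this.catalog = catalog;
            this.schemaPattern = schemaPattern;
            this.tablePattern = tablePattern;
            this.types = types;
        }

        @Override
        protected ResultSet fetchMetaData(DatabaseMetaData md) throws SQLException {
            // Filters are passed straight to the backend driver.
            return md.getTables(catalog, schemaPattern, tablePattern, types);
        }
    }

    // Demo-only stub: records invoked method names and returns null result sets.
    static DatabaseMetaData recordingStub(List<String> calls) {
        return (DatabaseMetaData) Proxy.newProxyInstance(
            MetaDataOperationSketch.class.getClassLoader(),
            new Class<?>[] {DatabaseMetaData.class},
            (proxy, method, methodArgs) -> {
                calls.add(method.getName());
                return null;
            });
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        DatabaseMetaData md = recordingStub(calls);
        new GetCatalogs().run(md);
        new GetTables(null, "sales", "%", null).run(md);
        System.out.println(calls);
    }
}
```

With one class per operation, a factory method in the operation manager reduces to constructing the matching subclass, which is the "one-liner" effect the commit message describes.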

pan3793 (Member) left a comment

LGTM, a few nits.

- ImpalaDialect: simplify Statement/ResultSet lifecycle with
  JdbcUtils.withCloseable
- Rename ExecuteMetaDataOperation -> MetaDataOperation
- Rename runMetaDataCall -> fetchMetaData
@pan3793 pan3793 added this to the v1.12.0 milestone May 13, 2026
@pan3793 pan3793 closed this in 98f81f2 May 13, 2026
pan3793 (Member) commented May 13, 2026

thanks, merged to master
