[AUTHENTICATION] Add UnboundID-backed LDAP connection pool with failover and cache #7506

Open
moelhoussein wants to merge 1 commit into
apache:master from
moelhoussein:ldap-unboundid-pool

Conversation

@moelhoussein

This PR adds an UnboundID LDAP SDK-backed authentication path for the LDAP auth provider, as an opt-in alternative to the existing JNDI implementation (kyuubi.authentication.ldap.pool.enabled, default false). When disabled, the legacy JNDI path is unchanged.

When enabled, it provides:

  • Connection pooling — warm, pre-authenticated bind-user connections with background health checks, removing the per-request TCP + TLS + BIND round-trip.
  • Ordered failover across multiple kyuubi.authentication.ldap.url entries via FailoverServerSet, including mid-request retry when a pooled connection breaks with a connection-level error (bounded by pool.search.maxRetries).
  • Optional successful-auth cache — HMAC-keyed, TTL- and size-bounded, with request coalescing and no negative caching (failed logins always reach LDAP).
  • Observability — Prometheus counters for auth success/failure (classified as invalid_credentials / invalid_input / infrastructure), cache hit/miss, and pool statistics.
  • Filter-injection hardening — user-supplied values in search filters are RFC 4515-escaped before rendering.
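
To make the filter-injection point concrete, here is a minimal sketch of RFC 4515 value escaping using only the JDK. The class and method names are hypothetical and not taken from this PR, which may use the SDK's own helpers instead.

```java
/** Hypothetical helper for illustration; the PR's actual class names are not shown here. */
final class LdapFilterEscaper {
    private LdapFilterEscaper() {}

    /**
     * Escapes a user-supplied value so it can be embedded in an LDAP search
     * filter. RFC 4515 requires NUL, '(', ')', '*' and '\' to be written as a
     * backslash followed by two hex digits; other characters pass through.
     */
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\5c"); break;
                case '*':  out.append("\\2a"); break;
                case '(':  out.append("\\28"); break;
                case ')':  out.append("\\29"); break;
                case '\0': out.append("\\00"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this, a login name such as `*)(uid=*` is rendered as `(uid=\2a\29\28uid=\2a)` and matches only a literal value, rather than widening the search.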

Proposed and discussed in #7497

Why are the changes needed?

The JNDI-based LDAP path has several operational gaps:

  • Failover applies only at initial bind/context creation, not to in-flight searches. A server that drops mid-request surfaces as an authentication failure rather than transparently failing over to a healthy peer.
  • No connection reuse — every authentication pays a full TCP + TLS + BIND handshake.
  • Search exceptions are silently swallowed, so an LDAP outage can be masked as a "user not found" result, making incidents hard to diagnose.
  • No authentication observability — there are no success/failure or cache metrics to distinguish credential-rejection spikes from infrastructure outages.
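
The observability gap above is about telling credential rejections apart from outages. A rough sketch of such a classifier follows; only the `javax.naming.AuthenticationException` to `invalid_credentials` mapping is stated in this PR, while the other two rules and all type names are assumptions made for illustration.

```java
import javax.naming.AuthenticationException;

/** The three labels named in this PR description. */
enum AuthFailureKind {
    INVALID_CREDENTIALS("invalid_credentials"),
    INVALID_INPUT("invalid_input"),
    INFRASTRUCTURE("infrastructure");

    private final String label;

    AuthFailureKind(String label) {
        this.label = label;
    }

    /** Value suitable for use as a metric label. */
    String label() {
        return label;
    }
}

/** Hypothetical classifier; not the PR's implementation. */
final class AuthFailureClassifier {
    private AuthFailureClassifier() {}

    static AuthFailureKind classify(Throwable error) {
        if (error instanceof AuthenticationException) {
            // Stated in the PR: JNDI credential rejection maps to invalid_credentials.
            return AuthFailureKind.INVALID_CREDENTIALS;
        }
        if (error instanceof IllegalArgumentException) {
            // Assumption: malformed user input is reported this way.
            return AuthFailureKind.INVALID_INPUT;
        }
        // Assumption: anything else (timeouts, refused connections) is an outage.
        return AuthFailureKind.INFRASTRUCTURE;
    }
}
```

Defaulting unknown errors to `infrastructure` is a deliberate choice in this sketch: an unrecognised failure then shows up as an outage signal instead of inflating the credential-rejection count.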

This change closes those gaps behind an opt-in flag, leaving existing deployments on the JNDI path untouched unless they explicitly enable the pool.
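
The mid-request failover behaviour can be pictured with a small JDK-only sketch: try endpoints in their configured order and retry only on connection-level errors, up to a fixed bound. This is a simplification under stated assumptions, not how `FailoverServerSet` or the pool works internally; `OrderedFailover`, `Operation`, and the use of `IOException` as the "connection-level" signal are all hypothetical.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical illustration; none of these names come from the PR. */
final class OrderedFailover {
    private OrderedFailover() {}

    /** Stand-in for one search executed against a single LDAP endpoint. */
    interface Operation<T> {
        T run(String url) throws IOException;
    }

    /**
     * Runs op against the URLs in order, starting with the first. Only
     * IOException (standing in for a connection-level error) triggers a
     * retry; any other exception propagates immediately. At most
     * maxRetries + 1 attempts are made in total.
     */
    static <T> T run(List<String> orderedUrls, int maxRetries, Operation<T> op)
            throws IOException {
        if (orderedUrls.isEmpty() || maxRetries < 0) {
            throw new IllegalArgumentException("need at least one URL and maxRetries >= 0");
        }
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            String url = orderedUrls.get(attempt % orderedUrls.size());
            try {
                return op.run(url);
            } catch (IOException e) {
                last = e; // broken connection: move on to the next endpoint
            }
        }
        throw last; // every permitted attempt failed at the connection level
    }
}
```

The key property is that a credential rejection never reaches the retry branch, so a wrong password is reported once instead of being replayed against every server.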

How was this patch tested?

  • New and existing unit/integration tests run against an in-memory UnboundID directory: pool init with first URL refused / second healthy, mid-request failover retry, no double-release on replacement-checkout failure, DirSearch close/no-leak, pool reuse under a burst, auth-cache coalescing and no-negative-caching, failure classification (incl. JNDI-path javax.naming.AuthenticationException → invalid_credentials), RFC 4515 filter-injection escaping, LDAP URL parsing, and JNDI-vs-UnboundID parity for executeCustomQuery.
  • Full kyuubi-common test suite passes (341 tests).
  • docs/configuration/settings.md regenerated and verified by the AllKyuubiConfiguration golden-file test; dev/dependencyList regenerated via build/dependency.sh.
  • spotless:check and scalastyle:check pass for kyuubi-common and kyuubi-server.
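
The auth-cache properties exercised by the tests above (HMAC keys, TTL and size bounds, coalescing, no negative caching) can be sketched with the JDK alone. Everything named here (`SuccessfulAuthCache`, its constructor parameters, the `ldapBind` callback) is hypothetical, and the eviction policy is deliberately naive.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;
import java.util.function.LongSupplier;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch; not the PR's implementation. */
final class SuccessfulAuthCache {
    private final ConcurrentHashMap<String, Long> expiryByKey = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, CompletableFuture<Boolean>> inFlight =
            new ConcurrentHashMap<>();
    private final SecretKeySpec hmacKey;
    private final long ttlMillis;
    private final int maxEntries;
    private final LongSupplier clock; // injectable so tests can control time

    SuccessfulAuthCache(byte[] secret, long ttlMillis, int maxEntries, LongSupplier clock) {
        this.hmacKey = new SecretKeySpec(secret, "HmacSHA256");
        this.ttlMillis = ttlMillis;
        this.maxEntries = maxEntries;
        this.clock = clock;
    }

    boolean authenticate(String user, String password, BiPredicate<String, String> ldapBind) {
        String key = cacheKey(user, password);
        Long expiry = expiryByKey.get(key);
        if (expiry != null && clock.getAsLong() < expiry) {
            return true; // hit: only successful logins are ever stored
        }
        CompletableFuture<Boolean> mine = new CompletableFuture<>();
        CompletableFuture<Boolean> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            return existing.join(); // coalesce with the identical in-flight request
        }
        try {
            boolean ok = ldapBind.test(user, password);
            if (ok) {
                remember(key); // failures are never cached
            }
            mine.complete(ok);
            return ok;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e);
            throw e;
        } finally {
            inFlight.remove(key, mine);
        }
    }

    private void remember(String key) {
        long now = clock.getAsLong();
        if (expiryByKey.size() >= maxEntries) {
            expiryByKey.values().removeIf(expiry -> expiry <= now); // drop expired entries
        }
        // Naive size bound: if still full, skip caching rather than evict.
        if (expiryByKey.size() < maxEntries || expiryByKey.containsKey(key)) {
            expiryByKey.put(key, now + ttlMillis);
        }
    }

    /** The map key is an HMAC, so the password is never stored in recoverable form. */
    private String cacheKey(String user, String password) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256"); // Mac is not thread-safe: one per call
            mac.init(hmacKey);
            mac.update(user.getBytes(StandardCharsets.UTF_8));
            mac.update((byte) 0); // separator so ("ab","c") and ("a","bc") differ
            mac.update(password.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(mac.doFinal());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }
}
```

Because the key covers both user and password, a changed or mistyped password always misses the cache and reaches LDAP, which is the behaviour "no negative caching" is meant to guarantee.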

Was this patch authored or co-authored using generative AI tooling?

Yes. Generative AI tooling assisted in authoring and reviewing this patch:

  • Anthropic Claude (Opus/Sonnet) — code authoring and refactoring assistance
  • OpenAI GPT-5.5 — code review
  • GitHub Copilot Enterprise — automated pull-request review

All AI-assisted output was human-reviewed and tested, and is being contributed under the DCO; I take full responsibility for it as the contributor. Per the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html), I have reviewed each tool's terms and have reasonable certainty that (1) none place restrictions on the output inconsistent with the Open Source Definition, and (2) the contributed output is either non-copyrightable boilerplate or original work that does not incorporate third-party materials. GitHub Copilot Enterprise ran with its "Suggestions matching public code" policy set to Blocked (duplicate-detection filter enabled) and is covered by GitHub's IP indemnification.

Tool terms: [Anthropic Commercial Terms](https://www.anthropic.com/legal/commercial-terms), [OpenAI Terms of Use](https://openai.com/policies/terms-of-use), [GitHub Copilot Trust Center](https://resources.github.com/copilot-trust-center/).

…ver and auth cache

Adds an opt-in UnboundID LDAP SDK authentication path
(kyuubi.authentication.ldap.pool.enabled, default false) alongside the existing
JNDI implementation. When enabled it provides a health-checked connection pool,
ordered multi-URL failover with bounded mid-request retry, an optional
successful-auth cache (HMAC-keyed, TTL/size-bounded, no negative caching,
request coalescing), Prometheus auth/cache/pool metrics with failure
classification, and RFC 4515 escaping of user-supplied filter values. The
legacy JNDI path is unchanged when the pool is disabled.

Proposed and discussed in apache#7497

Assisted-by: Claude Opus 4.8 (claude-opus-4-8)
Assisted-by: OpenAI GPT-5.5
Assisted-by: GitHub Copilot Enterprise
@github-actions (bot) added the kind:documentation, kind:infra, module:server, module:common, kind:build, and module:metrics labels on Jun 9, 2026