[KYUUBI #7387] Fix the redaction of sensitive values #7451

Open
LamiumAmplexicaule wants to merge 3 commits into apache:master from LamiumAmplexicaule:fix-redaction

Conversation

LamiumAmplexicaule commented May 15, 2026

Contributor

Why are the changes needed?

Fix the redaction of sensitive values when kyuubi.server.redaction.regex is configured.
Close #7387.

How was this patch tested?

Unit tests.

$ ./build/mvn clean install -Dtest=none -DwildcardSuites=org.apache.kyuubi.BatchTestHelper,org.apache.kyuubi.engine.EngineRefTests,org.apache.kyuubi.engine.EngineRefWithZookeeperSuite,org.apache.kyuubi.engine.JpsApplicationOperationSuite,org.apache.kyuubi.engine.dataagent.DataAgentProcessBuilderSuite,org.apache.kyuubi.engine.flink.FlinkProcessBuilderSuite,org.apache.kyuubi.engine.hive.HiveProcessBuilderSuite,org.apache.kyuubi.engine.hive.HiveYarnModeProcessBuilderSuite,org.apache.kyuubi.engine.jdbc.JdbcProcessBuilderSuite,org.apache.kyuubi.engine.jdbc.JdbcYarnModeProcessBuilderSuite,org.apache.kyuubi.engine.spark.SparkBatchProcessBuilderSuite,org.apache.kyuubi.engine.spark.SparkProcessBuilderSuite,org.apache.kyuubi.engine.trino.TrinoProcessBuilderSuite,org.apache.kyuubi.server.api.v1.AdminResourceSuite,org.apache.kyuubi.server.api.v1.BatchesResourceSuite,org.apache.kyuubi.server.rest.client.AdminCtlSuite,org.apache.kyuubi.server.rest.client.AdminRestApiSuite,org.apache.kyuubi.server.rest.client.PySparkBatchRestApiSuite,org.apache.kyuubi.UtilsSuite

Was this patch authored or co-authored using generative AI tooling?

Partially assisted by Claude Code (Claude Opus 4.6) for the implementation plan.

Comment on lines -328 to -338
    var nextKV = false
    commands.map {
      case PATTERN_FOR_KEY_VALUE_ARG(key, value) if nextKV =>
      case PATTERN_FOR_KEY_VALUE_ARG(key, value) =>
        val (_, newValue) = redact(redactionPattern, Seq((key, value))).head
        nextKV = false
        genKeyValuePair(key, newValue)

      case cmd if cmd == CONF =>
        nextKV = true
        cmd

Contributor Author

In the test that checks whether the logs in FlinkProcessBuilder are redacted, I noticed that values in the -D format don’t get redacted, so I removed this filter.
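A minimal sketch of the behaviour described in this comment: every key=value argument is checked against the pattern, whether it follows `--conf` or uses the `-D` form. The object name, the key=value regex, the key-only matching, and the replacement text are assumptions for illustration, not Kyuubi's actual implementation.

```scala
import scala.util.matching.Regex

object RedactArgsSketch {
  // Hypothetical key=value matcher; the real PATTERN_FOR_KEY_VALUE_ARG may differ.
  private val KeyValue: Regex = "(.+?)=(.+)".r

  // Placeholder replacement text, chosen for this sketch.
  val Replacement = "*********(redacted)"

  /** Mask the value of every key=value argument whose key matches `pattern`. */
  def redactArgs(pattern: Option[Regex], args: Seq[String]): Seq[String] =
    pattern match {
      case None => args // no pattern configured: nothing is redacted
      case Some(p) =>
        args.map {
          // No "previous argument was --conf" guard, so -Dkey=value is covered too.
          case KeyValue(key, _) if p.findFirstIn(key).isDefined => s"$key=$Replacement"
          case other => other
        }
    }
}
```

With a pattern such as `(?i)secret|password`, both `spark.user.password=abc` and `-Dflink.secret.key=xyz` (hypothetical keys) would be masked, while `-Dparallelism=4` passes through unchanged.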

import org.apache.kyuubi.util.command.CommandLineUtils._

class DataAgentProcessBuilder(
    override val serverConf: KyuubiConf,

Contributor

I don't think adding an extra serverConf parameter is a good idea. This would result in two configurations that appear identical except for their names, which could cause significant confusion during usage—users might wonder, "Which configuration should I actually use?"

LamiumAmplexicaule May 15, 2026

Contributor Author

Thank you for your comments.

My understanding is that redaction stopped working after #7054 because the sessionConf passed in has the serverOnly configs stripped out by this call:

    val userConf = this.getConf.getUserDefaults(user)

If we want to obtain kyuubi.server.redaction.regex from ProcessBuilder without passing serverConf, we need to explicitly set it somewhere.
However, it feels wrong to propagate server-side config into the session/engine configs, so I chose the approach of passing serverConf instead.

If the conf passed as serverConf were stripped down to only the serverOnly entries, we might avoid accidentally reading other settings from serverConf.

Contributor Author

I am planning to update this PR to pass Option[Regex] instead of serverConf.
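As a rough illustration of that plan, the sketch below resolves the pattern once from the server-side config and hands only the compiled `Option[Regex]` to the builder. `SketchProcessBuilder`, `patternFrom`, `launchLogLine`, the use of plain maps in place of KyuubiConf, and the replacement text are all hypothetical names and simplifications.

```scala
import scala.util.matching.Regex

object RedactionPatternSketch {
  val RedactionKey = "kyuubi.server.redaction.regex"

  /** Resolve the pattern once, from the server-side config (a plain Map here). */
  def patternFrom(serverConf: Map[String, String]): Option[Regex] =
    serverConf.get(RedactionKey).filter(_.nonEmpty).map(_.r)

  // Hypothetical builder: it receives the compiled pattern, not a second conf object.
  final class SketchProcessBuilder(
      sessionConf: Map[String, String],
      redactionPattern: Option[Regex]) {

    /** Text for a "Launching engine:" style log line, with matching values masked. */
    def launchLogLine: String =
      sessionConf.toSeq.sortBy(_._1).map { case (k, v) =>
        val masked = redactionPattern.exists(_.findFirstIn(k).isDefined)
        if (masked) s"--conf $k=*********(redacted)" else s"--conf $k=$v"
      }.mkString("Launching engine: ", " ", "")
  }
}
```

The builder never sees the server config itself, so there is only one conf-shaped object in scope inside it.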

@wangzhigang1999

Contributor

Thanks for tracking this down!

I'd like to discuss how the fix is done before it lands. The direction isn't wrong; I'm just not sure that passing the pattern into every ProcessBuilder is the best long-term approach. I might be missing something, so feel free to disagree.

What I agree with: redaction is done on the server side, so the pattern has to be read from the server config, not from the config that goes to the engine. That part is correct and necessary.

My concern is the mechanism. conf.get(...) returned None because serverOnly is implemented by removing the entry from the per-session config in getUserDefaults. ProcessBuilder runs on the server but holds that stripped config, so redaction was silently disabled. Passing Option[Regex] down restores redaction, but it addresses only this one symptom:

  • It doesn't scale. The next serverOnly config that some server-side code needs while launching an engine will hit the same problem, and we'd add another parameter for it.
  • An earlier version of this PR passed serverConf: KyuubiConf instead. I think that's worse: now there are two configs of the same type, told apart only by their variable name — which is exactly the kind of mix-up that caused this bug.

So the real problem is that serverOnly mixes two unrelated things into one "remove": "don't send this to the engine" and "users can't change this". Removing the entry does both, but it also means any server-side code that legitimately needs the value can no longer read it.
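The failure mode described above can be reproduced in a few lines. This is a simplified model using plain maps; `userDefaults` stands in for getUserDefaults and `serverOnlyKeys` is a hypothetical set, so it shows the shape of the bug rather than Kyuubi's code.

```scala
import scala.util.matching.Regex

object StrippedConfSketch {
  val RedactionKey = "kyuubi.server.redaction.regex"

  // Hypothetical set of keys flagged serverOnly.
  val serverOnlyKeys: Set[String] = Set(RedactionKey)

  /** Simplified stand-in for getUserDefaults: serverOnly is implemented by removal. */
  def userDefaults(serverConf: Map[String, String]): Map[String, String] =
    serverConf.filterNot { case (k, _) => serverOnlyKeys.contains(k) }

  /** What server-side code sees when it looks up the pattern in a given conf. */
  def redactionPattern(conf: Map[String, String]): Option[Regex] =
    conf.get(RedactionKey).map(_.r)
}
```

Looking up the pattern in the full server config yields `Some(...)`, while the same lookup on the stripped per-session copy yields `None`, which is how redaction ends up disabled without any error.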

There's a related problem with the same cause, which is why I'd rather fix the model than patch it: the launch path does conf.getAll with no per-engine filtering (SparkProcessBuilder), prefixes everything with spark., and ships it. So a Flink tuning key like kyuubi.engine.flink.memory, set in a shared kyuubi-defaults.conf, ends up on the Spark engine's command line as spark.kyuubi.engine.flink.memory=... — harmless noise for tuning keys, but any sensitive value in there rides along too. Both problems are "send everything, then remove a few".
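A simplified sketch of that "send everything" behaviour follows. The `toSparkConfArgs` helper and its prefixing rule are assumptions made for illustration; the real key conversion in SparkProcessBuilder may handle more cases.

```scala
object UnfilteredLaunchSketch {
  /** Turn every entry into a --conf argument, prefixing non-Spark keys with "spark.". */
  def toSparkConfArgs(all: Map[String, String]): Seq[String] =
    all.toSeq.sortBy(_._1).flatMap { case (k, v) =>
      // No per-engine filtering: keys meant for other engines are forwarded as well.
      val sparkKey = if (k.startsWith("spark.")) k else s"spark.$k"
      Seq("--conf", s"$sparkKey=$v")
    }
}
```

Given a shared config containing `kyuubi.engine.flink.memory -> 2g`, the resulting Spark command line includes `spark.kyuubi.engine.flink.memory=2g`.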

The direction I'd suggest is to split serverOnly into two separate, independent flags, and stop using "remove" to control where a config goes:

  • audience — which engines a config is allowed to reach. Instead of removing it from the config, we filter it out when building the engine's startup command. The value is figured out from the key prefix by default (kyuubi.engine.spark.* → Spark, kyuubi.server.* → server-only), and can also be set by hand on the entry when the prefix isn't enough. So most configs need no marking at all, but you can still set it manually when you need to.
  • immutable — users can't override it. This is checked when Kyuubi merges the user's config. It's set by hand, on purpose, because "can a user change this?" is a security decision worth deciding per config.

Then serverOnly is just audience(server-only) + immutable, and we can drop it. Two benefits: (1) the value stays in the server config and is fixed, so redaction goes back to a plain conf.get and the Option[Regex] passing here goes away completely; (2) filtering by audience also stops sending one engine's configs to other engines. One model, both problems, no per-config wiring.
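One possible shape for this model is sketched below, again with plain maps. All names (`Audience`, `inferAudience`, `merge`, `project`, the list of known engines) are hypothetical and only illustrate the proposal; none of this exists in Kyuubi today.

```scala
object AudienceModelSketch {
  sealed trait Audience
  case object ServerOnly extends Audience
  case object AllEngines extends Audience
  final case class Engine(name: String) extends Audience

  // Assumed engine names; a key such as kyuubi.engine.<name>.* maps to that engine.
  private val knownEngines = Set("spark", "flink", "hive", "trino", "jdbc")
  private val EnginePrefix = """kyuubi\.engine\.(\w+)\..*""".r

  /** Default audience, inferred from the key prefix. */
  def inferAudience(key: String): Audience = key match {
    case k if k.startsWith("kyuubi.server.") => ServerOnly
    case EnginePrefix(e) if knownEngines(e) => Engine(e)
    case _ => AllEngines
  }

  /** Inbound check: user values for immutable keys are ignored; nothing is removed. */
  def merge(
      server: Map[String, String],
      user: Map[String, String],
      immutable: Set[String]): Map[String, String] =
    server ++ user.filterNot { case (k, _) => immutable.contains(k) }

  /** Outbound filter: keep entries whose audience is the target engine or all engines. */
  def project(
      conf: Map[String, String],
      target: String,
      overrides: Map[String, Audience] = Map.empty): Map[String, String] =
    conf.filter { case (k, _) =>
      overrides.getOrElse(k, inferAudience(k)) match {
        case AllEngines => true
        case Engine(name) => name == target
        case ServerOnly => false
      }
    }
}
```

In this sketch the merged config still contains `kyuubi.server.redaction.regex`, so a server-side lookup finds it, while `project(conf, "spark")` drops both that key and `kyuubi.engine.flink.memory` before anything is sent to the Spark engine. The `overrides` map plays the role of an audience set by hand on an entry.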

flowchart TB
    MASTER["server config (loaded on server, the source of truth)"]
    UO["user config: connection / SET / user-defaults"]
    UO --> IMM{"immutable key?"}
    IMM -->|yes| PIN["user value ignored, server value kept"]
    IMM -->|no| APPLY["user value applied"]
    MASTER --> CONF
    PIN --> CONF
    APPLY --> CONF
    CONF["full config on the server — nothing removed"]
    CONF -->|"conf.get(...)"| READ["server-side reads — redaction sees the real value"]
    CONF --> PROJ{"build config for target"}
    PROJ -->|"engine = Spark"| SPARK["send to Spark: audience is Spark or all-engines"]
    PROJ -->|"engine = Flink"| FLINK["send to Flink: audience is Flink or all-engines"]
    PROJ -->|"logs / REST"| LOG["sent out, masked if sensitive"]

In short: immutable is checked when a user config comes in, audience is applied when a config goes out, and because nothing is removed from the server config, server-side reads (including redaction) always see the real value — which is exactly what broke here.

Curious to hear your thoughts!

CC @pan3793

LamiumAmplexicaule commented Jun 2, 2026

Contributor Author

Thank you for the suggestion.
I think it's a very good idea.
It also reduces the number of items we need to configure in kyuubi.session.conf.restrict.list, which should make it easier to manage.


Development

Successfully merging this pull request may close these issues.

[Bug] kyuubi.server.redaction.regex no longer redacts sensitive values in the "Launching engine:" log line

2 participants