[KYUUBI #7387] Fix the redaction of sensitive values#7451
[KYUUBI #7387] Fix the redaction of sensitive values#7451LamiumAmplexicaule wants to merge 3 commits into
Conversation
| var nextKV = false | ||
| commands.map { | ||
| case PATTERN_FOR_KEY_VALUE_ARG(key, value) if nextKV => | ||
| case PATTERN_FOR_KEY_VALUE_ARG(key, value) => | ||
| val (_, newValue) = redact(redactionPattern, Seq((key, value))).head | ||
| nextKV = false | ||
| genKeyValuePair(key, newValue) | ||
|
|
||
| case cmd if cmd == CONF => | ||
| nextKV = true | ||
| cmd | ||
|
|
There was a problem hiding this comment.
In the test that checks whether the logs in FlinkProcessBuilder are redacted, I noticed that values in the -D format don’t get redacted, so I removed this filter.
| import org.apache.kyuubi.util.command.CommandLineUtils._ | ||
|
|
||
| class DataAgentProcessBuilder( | ||
| override val serverConf: KyuubiConf, |
There was a problem hiding this comment.
I don't think adding an extra serverConf parameter is a good idea. This would result in two configurations that appear identical except for their names, which could cause significant confusion during usage—users might wonder, "Which configuration should I actually use?"
There was a problem hiding this comment.
Thank you for your comments.
My understanding is that the reason redaction no longer happens after #7054 is that, in
If we want to obtain kyuubi.server.redaction.regex from ProcessBuilder without passing serverConf, we need to explicitly set it somewhere.
However, it feels wrong to propagate server-side config into the session/engine configs, so I chose the approach of passing serverConf instead.
When passing conf as serverConf, if we strip out everything except serverOnly, we might avoid accidentally referencing serverConf.
There was a problem hiding this comment.
I am planning to update this PR to pass Option[Regex] instead of serverConf.
84913ef to
e650ce5
Compare
|
Thanks for tracking this down! I'd like to talk about how the fix is done before it lands. Not because the direction is wrong — I'm just not sure that passing the pattern into every What I agree with: redaction is done on the server side, so the pattern has to be read from the server config, not from the config that goes to the engine. That part is correct and necessary. My concern is the mechanism.
So the real problem is that There's a related problem with the same cause, which is why I'd rather fix the model than patch it: the launch path does The direction I'd suggest is to split
Then flowchart TB
MASTER["server config (loaded on server, the source of truth)"]
UO["user config: connection / SET / user-defaults"]
UO --> IMM{"immutable key?"}
IMM -->|yes| PIN["user value ignored, server value kept"]
IMM -->|no| APPLY["user value applied"]
MASTER --> CONF
PIN --> CONF
APPLY --> CONF
CONF["full config on the server — nothing removed"]
CONF -->|"conf.get(...)"| READ["server-side reads — redaction sees the real value"]
CONF --> PROJ{"build config for target"}
PROJ -->|"engine = Spark"| SPARK["send to Spark: audience is Spark or all-engines"]
PROJ -->|"engine = Flink"| FLINK["send to Flink: audience is Flink or all-engines"]
PROJ -->|"logs / REST"| LOG["sent out, masked if sensitive"]
In short: Curious to hear your thoughts! CC @pan3793 |
|
Thank you for the suggestion. |
Why are the changes needed?
Fix the redaction of sensitive values when
kyuubi.server.redaction.regexis configured.Close #7387.
How was this patch tested?
Unit tests.
Was this patch authored or co-authored using generative AI tooling?
Partially assisted by Claude Code (Claude Opus 4.6) for implementation plan