Document KHI job mode #709

Open
Haihan-Jiang wants to merge 3 commits into GoogleCloudPlatform:main from Haihan-Jiang:codex/khi-document-job-mode
Conversation

@Haihan-Jiang

Contributor

Summary

  • add a setup guide for running KHI in job mode
  • document the required job mode flags, supported inspection type IDs, and a basic GKE example
  • link the new guide from the README user guide section

Fixes #196

Tests

  • go test ./pkg/parameters
  • git diff --check

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces documentation for running KHI in job mode. It adds a new guide, docs/en/setup-guide/job-mode.md, which details the required flags, command format, inspection values, and feature selection, and updates README.md to link to this new guide. There are no review comments, and I have no feedback to provide.

@Haihan-Jiang

Contributor Author

This is ready for review from my side. The remaining github-deploy-ondemand check appears to require project-side approval/run permissions; could someone take a look or trigger it when convenient?

--job-mode \
--job-inspection-type gcp-gke \
--job-inspection-features ALL \
--job-inspection-values '{

Member

I think the most difficult part of this feature is building the --job-inspection-values parameter.
Currently the easiest way is to use the browser's developer tools to see what the frontend sends. It's a bit hacky, but worth documenting.
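
A minimal sketch of that workflow in Python, assuming the request body copied from the browser's developer tools is a flat JSON object mapping field IDs to values (the real payload shape may differ). The helper name `build_job_args` is hypothetical; the flags and field IDs are the ones quoted in this thread.

```python
import json
import shlex


def build_job_args(inspection_type, features, values):
    """Return a shell-ready argument string for KHI job mode.

    Assumption: `values` is a flat mapping of field ID to value, as copied
    from the request body shown in the browser's developer tools.
    `features` is passed through unchanged (for example "ALL").
    """
    args = [
        "--job-mode",
        "--job-inspection-type", inspection_type,
        "--job-inspection-features", features,
        # Compact JSON keeps the whole payload in one shell argument.
        "--job-inspection-values", json.dumps(values, separators=(",", ":")),
    ]
    # shlex.quote adds single quotes only where the shell needs them.
    return " ".join(shlex.quote(arg) for arg in args)


captured = {
    "cloud.google.com/common/input-location": "us-central1",
    "cloud.google.com/common/input-end-time": "2026-01-15T10:00:00Z",
    "cloud.google.com/common/input-duration": "2h",
}
print(build_job_args("gcp-gke", "ALL", captured))
```

Generating the argument this way avoids hand-editing quotes inside the JSON, which is where most copy-and-paste mistakes tend to happen.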

Contributor Author

Addressed in 18e401e. The guide now keeps the browser dev tools /run or /dryrun payload workflow, clarifies that set-form fields must be JSON arrays, and updates the GKE example to use the same field IDs/value shapes that job mode expects. I also added a table of extra fields required when optional GKE features are enabled.

Comment thread docs/en/setup-guide/job-mode.md Outdated
"cloud.google.com/common/input-location": "us-central1",
"cloud.google.com/common/input-end-time": "2026-01-15T10:00:00Z",
"cloud.google.com/common/input-duration": "2h",
"<inspection-specific-field-id>": "<value>"

Member

It would be better to show a working example job command with the full parameter set.
The current example lacks many parameters, and users have to figure out which ones are missing. It would be nicer to list the keys required to actually run job mode; the values can be placeholders.
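
One way to find the missing keys, sketched in Python under the assumption that both the payload captured from the UI and the hand-written job mode payload are flat JSON objects keyed by field ID. The function name `missing_keys` is hypothetical and is not part of KHI.

```python
def missing_keys(ui_payload, job_values):
    """Return field IDs that the UI sent but the job mode payload lacks."""
    return sorted(set(ui_payload) - set(job_values))


# Payload as captured from the UI (values are placeholders).
ui_payload = {
    "cloud.google.com/common/input-location": "<value>",
    "cloud.google.com/common/input-end-time": "<value>",
    "cloud.google.com/common/input-duration": "<value>",
    "cloud.google.com/k8s/input-namespaces": ["<value>"],
}

# Hand-written payload for job mode, with one field forgotten.
job_values = {
    "cloud.google.com/common/input-location": "us-central1",
    "cloud.google.com/common/input-end-time": "2026-01-15T10:00:00Z",
    "cloud.google.com/common/input-duration": "2h",
}

print(missing_keys(ui_payload, job_values))
# prints ['cloud.google.com/k8s/input-namespaces']
```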

Contributor Author

Also addressed in 18e401e. I replaced the ALL example with an explicit default GKE feature list and a payload that uses the full required field set for that selected feature set. The guide now also lists the extra field keys to include when optional GKE features are enabled or when ALL is used.

@kyasbal kyasbal added the documentation Improvements or additions to documentation label Jun 1, 2026
@Haihan-Jiang Haihan-Jiang force-pushed the codex/khi-document-job-mode branch from 41a81bf to 0c9b9a5 Compare June 3, 2026 00:34
@Haihan-Jiang

Contributor Author

Thanks, I updated the job mode doc in two places:

  • added a practical way to build --job-inspection-values from the browser dev tools request payload for /run or /dryrun
  • expanded the GKE example to include concrete common field IDs such as project, location, cluster name, namespaces, and kinds

I kept the wording scoped because the exact required fields still depend on the selected inspection type and enabled features.

@kyasbal

kyasbal commented Jun 11, 2026

Member

The current example parameter list is still missing several fields. How about asking Codex to run the job to test it?

@Haihan-Jiang

Contributor Author

Thanks, I asked Codex to run the job-mode command and it found two concrete issues in the example.

Fixed in 18e401e:

  • set-form values such as cloud.google.com/k8s/input-namespaces and cloud.google.com/k8s/input-kinds must be JSON arrays, not scalar strings. The previous example fails with request parameter cloud.google.com/k8s/input-kinds#default was not given in array.
  • ALL enables optional GKE features that are disabled by default in the UI, so the runnable default GKE example now uses an explicit feature ID list. I also added the extra fields needed when optional features such as node, container, control plane, CSM, and serial port logs are selected.
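
The array requirement can be checked before launching the job. The following is a rough Python sketch, assuming the two field IDs named above are the only set-form fields in the payload (other inspection types or features may add more); `scalar_set_fields` is a hypothetical helper, not a KHI API.

```python
# Set-form field IDs mentioned in this thread; the list may be incomplete.
SET_FORM_FIELDS = (
    "cloud.google.com/k8s/input-namespaces",
    "cloud.google.com/k8s/input-kinds",
)


def scalar_set_fields(values, set_fields=SET_FORM_FIELDS):
    """Return the set-form field IDs whose value is present but not a list."""
    return [
        field
        for field in set_fields
        if field in values and not isinstance(values[field], list)
    ]


bad_values = {
    "cloud.google.com/k8s/input-namespaces": ["<namespace>"],
    "cloud.google.com/k8s/input-kinds": "<kind>",  # scalar string: rejected
}
good_values = dict(bad_values)
good_values["cloud.google.com/k8s/input-kinds"] = ["<kind>"]

print(scalar_set_fields(bad_values))
# prints ['cloud.google.com/k8s/input-kinds']
print(scalar_set_fields(good_values))
# prints []
```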

Validation performed locally:

  • npx markdownlint-cli2 docs/en/setup-guide/job-mode.md -> 0 errors.
  • Re-ran job mode with the corrected payload and explicit default GKE feature list. It now gets past the form parameter parsing and stops at local ADC setup: failed to create monitoring metric client: credentials: could not find default credentials. I do not have ADC configured in this local environment, so I could not complete a Cloud Logging-backed export here, but the prior parameter-shape failure is fixed.

@Haihan-Jiang Haihan-Jiang force-pushed the codex/khi-document-job-mode branch from 18e401e to 5a2e347 Compare June 14, 2026 10:08
@kyasbal

kyasbal commented Jun 15, 2026

Member

I'm sorry to send this back again and again. However, this example query will produce an empty result even if the target project and cluster actually exist, because the resource name inputs are missing.

We can find the right parameters in the Chrome inspector when we fill in the parameters on the form.

Thus the parameters beginning with cloud.google.com/common/input-query-resource-names need to be filled with projects/<the-project-id>. This is not great from a user experience perspective, but to support projects that forward their logs to other projects, KHI takes a target project for each query. These are usually optional fields in the UI and are filled automatically from the other project ID field, but they are still required for job mode. Indeed, this behavior must be improved in the near future...
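
Until that is improved, the fill-in can be scripted. Below is a minimal Python sketch, assuming the resource name fields are already present in the payload copied from the inspector and that each one accepts a plain string (the actual value shape should be confirmed against the request the UI sends). The key suffix in the example and the helper name `fill_resource_names` are hypothetical.

```python
RESOURCE_NAMES_PREFIX = "cloud.google.com/common/input-query-resource-names"


def fill_resource_names(values, project_id):
    """Return a copy of `values` with empty resource name fields filled in.

    Only keys starting with RESOURCE_NAMES_PREFIX whose value is empty are
    changed; everything else is copied as-is.
    """
    filled = dict(values)
    for key, value in values.items():
        if key.startswith(RESOURCE_NAMES_PREFIX) and not value:
            filled[key] = f"projects/{project_id}"
    return filled


values = {
    "cloud.google.com/common/input-location": "us-central1",
    # Hypothetical key suffix; copy the real keys from the inspector.
    RESOURCE_NAMES_PREFIX + "/<query-id>": "",
}
print(fill_resource_names(values, "<the-project-id>"))
```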

Labels

documentation Improvements or additions to documentation

Development

Successfully merging this pull request may close these issues.

Document KHI Job Mode

2 participants