
feat(epp): Add plugin lifecycle and stability levels proposal#2684

Open
hexfusion wants to merge 1 commit into kubernetes-sigs:main from hexfusion:plugin-lifecycle-proposal


@hexfusion
Contributor

/kind feature
/kind documentation

What this PR does / why we need it:

Proposes a plugin lifecycle model for EPP plugins. Today a plugin either exists in the registry or it doesn't -- there is no way to communicate maturity to operators. This proposal adds three stability tiers (Alpha, Beta, Stable) with defined support contracts, feature gate integration for alpha plugins, and validation tombstones for removed plugins.

Which issue(s) this PR fixes:

Fixes #2653

Does this PR introduce a user-facing change?:

  NONE

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. kind/documentation Categorizes issue or PR as related to documentation. labels Mar 24, 2026
@netlify

netlify bot commented Mar 24, 2026

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit d1e132c
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/69c2dea96452690008e30a35
😎 Deploy Preview https://deploy-preview-2684--gateway-api-inference-extension.netlify.app

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hexfusion
Once this PR has been reviewed and has the lgtm label, please assign danehans for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 24, 2026
@k8s-ci-robot
Contributor

Hi @hexfusion. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 24, 2026
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
```go
func Register(pluginType string, factory FactoryFunc) {
	Registry[pluginType] = RegistryEntry{
		Factory:   factory,
		Stability: Unknown,
	}
}
```
Contributor

Will it reject out-of-tree plugins until they migrate?

Contributor

tbh, I would update the existing Register function (instead of adding a different one); out-of-tree plugins should be able to adopt it very easily by using Unknown on their end.
I would prefer to make the stability level mandatory rather than keep Unknown, whose behavior, as far as I can see here, is not well defined.
Does it require a feature gate or not? Should we allow it at all? Why would out-of-tree plugins move off Unknown, and what is their motivation?
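
A minimal sketch of the suggestion above, assuming `Register` gains a required stability argument instead of defaulting to `Unknown`. `FactoryFunc`, `RegistryEntry`, and `Registry` here are simplified stand-ins for the proposal's types, and the factory signature is hypothetical, not the actual GIE API.

```go
package main

import "fmt"

// StabilityLevel mirrors the tiers named in the proposal.
type StabilityLevel string

const (
	Alpha  StabilityLevel = "Alpha"
	Beta   StabilityLevel = "Beta"
	Stable StabilityLevel = "Stable"
)

// FactoryFunc is a hypothetical, simplified plugin factory signature.
type FactoryFunc func(name string) (any, error)

// RegistryEntry pairs a factory with its declared stability.
type RegistryEntry struct {
	Factory   FactoryFunc
	Stability StabilityLevel
}

// Registry maps plugin type names to their entries.
var Registry = map[string]RegistryEntry{}

// Register requires an explicit stability level, so there is no implicit
// "Unknown" tier whose gating behavior would need to be defined.
func Register(pluginType string, factory FactoryFunc, stability StabilityLevel) error {
	switch stability {
	case Alpha, Beta, Stable:
		// valid tier
	default:
		return fmt.Errorf("plugin %q: invalid stability level %q", pluginType, stability)
	}
	if _, exists := Registry[pluginType]; exists {
		return fmt.Errorf("plugin %q: already registered", pluginType)
	}
	Registry[pluginType] = RegistryEntry{Factory: factory, Stability: stability}
	return nil
}
```

With this shape, an out-of-tree plugin migrates by adding one argument at its registration call.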

```go
		pluginType, entry.Stability))
}
if entry.Stability == Alpha && entry.FeatureGate == "" {
	panic(fmt.Sprintf(
```
Contributor

Don't panic. Skip the registration and return an error.

Contributor Author

Agreed, panic is scary and I will remove it, but we should consider a Must-style variant in this context.
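
One way to reconcile the two comments, sketched under the assumption that the registry keeps a `RegistryEntry` with `Stability` and `FeatureGate` fields as in the proposal: `Register` validates and returns an error (skipping the invalid entry), while `MustRegister` wraps it and panics, following the Go standard library's `regexp.Compile`/`regexp.MustCompile` convention. The types are simplified stand-ins.

```go
package main

import "fmt"

type StabilityLevel string

const (
	Alpha  StabilityLevel = "Alpha"
	Beta   StabilityLevel = "Beta"
	Stable StabilityLevel = "Stable"
)

// FactoryFunc is a hypothetical, simplified plugin factory signature.
type FactoryFunc func(name string) (any, error)

type RegistryEntry struct {
	Factory     FactoryFunc
	Stability   StabilityLevel
	FeatureGate string
}

var Registry = map[string]RegistryEntry{}

// Register validates the entry and returns an error instead of panicking;
// an invalid entry is skipped and never lands in the registry.
func Register(pluginType string, entry RegistryEntry) error {
	if entry.Stability == Alpha && entry.FeatureGate == "" {
		return fmt.Errorf("plugin %q: Alpha plugins must declare a FeatureGate", pluginType)
	}
	Registry[pluginType] = entry
	return nil
}

// MustRegister panics on an invalid entry, mirroring regexp.MustCompile.
func MustRegister(pluginType string, entry RegistryEntry) {
	if err := Register(pluginType, entry); err != nil {
		panic(err)
	}
}
```

`MustRegister` then suits `init()`-time registration of in-tree plugins, where an invalid entry is a programming error, while `Register` suits callers that want to handle the failure themselves.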

Comment on lines +157 to +160
```go
Unknown StabilityLevel = "Unknown"
Alpha   StabilityLevel = "Alpha"
Beta    StabilityLevel = "Beta"
Stable  StabilityLevel = "Stable"
```
Contributor

@nirrozenbaum Mar 25, 2026

What about Deprecated?
A Deprecated plugin must come with a deprecation message.
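
A sketch of the Deprecated suggestion, assuming a hypothetical `Deprecated` level and `DeprecationMessage` field on the registry entry (neither appears in the proposal excerpt). Validation rejects a deprecated entry without a message, and config loading could surface the message as a warning.

```go
package main

import "fmt"

type StabilityLevel string

const (
	Alpha      StabilityLevel = "Alpha"
	Beta       StabilityLevel = "Beta"
	Stable     StabilityLevel = "Stable"
	Deprecated StabilityLevel = "Deprecated" // hypothetical extra tier
)

type RegistryEntry struct {
	Stability          StabilityLevel
	DeprecationMessage string // hypothetical field
}

// validateEntry enforces that a Deprecated plugin always carries a message.
func validateEntry(pluginType string, e RegistryEntry) error {
	if e.Stability == Deprecated && e.DeprecationMessage == "" {
		return fmt.Errorf("plugin %q: Deprecated requires a DeprecationMessage", pluginType)
	}
	return nil
}

// deprecationWarning returns the text to log when a config references the
// plugin, or "" if the plugin is not deprecated.
func deprecationWarning(pluginType string, e RegistryEntry) string {
	if e.Stability != Deprecated {
		return ""
	}
	return fmt.Sprintf("plugin %q is deprecated: %s", pluginType, e.DeprecationMessage)
}
```

An alternative worth weighing is to model deprecation as a separate field rather than a tier, since a Beta or Stable plugin can also be deprecated; the validation rule would stay the same.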

```go
// error with migration guidance instead of the generic "not
// registered" error from instantiatePlugins. Tombstones are
// permanent and small.
var removedPlugins = map[string]string{
```
Contributor

Not all plugins can be mapped 1:1 from old plugins to new ones.
For example, we might decide to split an old plugin into two sub-plugins.

Comment on lines +318 to +321
2. Should graduation criteria be GIE-specific, or adopt Gateway
API's requirements?
3. Where does the stability policy live, `docs/plugin-lifecycle.md`,
`CONTRIBUTING.md`, or a dedicated proposal?
Contributor

Graduation criteria are a difficult question. If you ask 10 different people who know GIE well about 10 different plugins, you might hear different opinions about how mature each plugin is.
I would document it in the plugins documentation on the website as well as in docs.
CONTRIBUTING.md is a general guide for contributions, not specific to plugins or anything related to their lifecycle.

@@ -0,0 +1,322 @@
# Plugin Lifecycle and Stability Levels
Contributor

nice proposal :)

- bases/inference.networking.x-k8s.io_inferenceobjectives.yaml
- bases/inference.networking.x-k8s.io_inferencepoolimports.yaml
- bases/inference.networking.k8s.io_inferencepools.yaml
- rbac-aggregation.yaml
Contributor

nit: is this part of the proposal or a different PR?

|-------|---------|-----------------|----------------|
| **Alpha** | Gated off (requires feature gate) | No compatibility guarantee. Config schema may change between releases. | Can be removed any release. |
| **Beta** | Gated on | Config schema is stable. Behavioral changes require release notes. | 2 releases + 6 months after deprecation notice. |
| **Stable** | Always available | Full backward compatibility within config API version. | Not removed within a config API major version. |
Contributor

nit: is there a pre-alpha "dev" stage?

  • dev: Experimental features, no compatibility guarantees. Disabled by default. Intended for explorations. Must not impact any feature or code path when disabled.
  • alpha: Disabled by default. APIs may change. Intended for early feedback. A feature may initially be enabled in a "discovery phase" mode, where the code runs exclusively in a non-blocking/no-op mode to collect telemetry.

Consider adding documentation and testing requirements for beta and stable.
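
The "discovery phase" mode described above could be sketched as a wrapper: the experimental scorer runs and its output is recorded, but the scores returned to the scheduler are neutral, so routing is unchanged. `Scorer`, `ScorerFunc`, and `shadowScorer` are hypothetical simplifications, not GIE's actual scorer interface.

```go
package main

// Scorer is a hypothetical, simplified scoring-plugin interface.
type Scorer interface {
	Score(endpoints []string) map[string]float64
}

// ScorerFunc adapts a plain function to the Scorer interface.
type ScorerFunc func(endpoints []string) map[string]float64

func (f ScorerFunc) Score(endpoints []string) map[string]float64 { return f(endpoints) }

// shadowScorer runs an experimental scorer for telemetry only.
type shadowScorer struct {
	inner  Scorer
	record func(scores map[string]float64) // e.g. emit metrics or logs
}

var _ Scorer = shadowScorer{}

// Score invokes the inner scorer for observation, then returns a neutral
// score of 0 for every endpoint so the scheduling decision is unaffected.
func (s shadowScorer) Score(endpoints []string) map[string]float64 {
	s.observe(endpoints)
	neutral := make(map[string]float64, len(endpoints))
	for _, ep := range endpoints {
		neutral[ep] = 0
	}
	return neutral
}

// observe records the inner scorer's output; a panic in the experimental
// code is recovered so it cannot fail the request.
func (s shadowScorer) observe(endpoints []string) {
	defer func() { _ = recover() }()
	observed := s.inner.Score(endpoints)
	if s.record != nil {
		s.record(observed)
	}
}
```

This sketch calls the inner scorer synchronously; a production version might run it asynchronously or with a time budget so that it cannot add request latency.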

**Feature gate integration.** Alpha plugins require an explicit
feature gate in `EndpointPickerConfig.FeatureGates`. GIE already
has a `FeatureGates []string` field on the config; this proposal
extends its use to cover per-plugin gating.
Contributor

Side note: there was also a discussion about making feature gates a map[string]bool (instead of []string), with a default on/off value based on maturity (e.g., initially default off, changed to default on in beta).
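
A sketch of the map-based alternative, assuming a hypothetical `map[string]bool` in place of `FeatureGates []string`, where an unset gate falls back to a default derived from the plugin's stability (off for Alpha, on for Beta and Stable). The same gate can then switch a Beta plugin off. Refusing to disable a Stable plugin is an assumption drawn from the proposal's "Always available" row.

```go
package main

import "fmt"

type StabilityLevel string

const (
	Alpha  StabilityLevel = "Alpha"
	Beta   StabilityLevel = "Beta"
	Stable StabilityLevel = "Stable"
)

// defaultEnabled derives a gate's default from maturity:
// off for Alpha, on for Beta and Stable.
func defaultEnabled(s StabilityLevel) bool { return s != Alpha }

// pluginEnabled resolves whether a plugin may be instantiated, given the
// operator's explicit gate settings. An unset gate uses the maturity default.
func pluginEnabled(gate string, s StabilityLevel, gates map[string]bool) (bool, error) {
	v, set := gates[gate]
	if !set {
		return defaultEnabled(s), nil
	}
	if s == Stable && !v {
		return false, fmt.Errorf("feature gate %q: Stable plugins cannot be disabled", gate)
	}
	return v, nil
}
```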


**Config validation.** At config load time:
Contributor

Q: Should we also consider an explicit configuration value for the minimum stability level, so that operators make an explicit decision and the config can be validated against it?
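
A sketch of the minimum-stability idea, assuming a hypothetical config value (for example a `minStability` field) and the ordering Alpha < Beta < Stable. `validateMinStability` is illustrative and not part of the proposal.

```go
package main

import "fmt"

type StabilityLevel string

const (
	Alpha  StabilityLevel = "Alpha"
	Beta   StabilityLevel = "Beta"
	Stable StabilityLevel = "Stable"
)

// rank orders the tiers so they can be compared.
var rank = map[StabilityLevel]int{Alpha: 1, Beta: 2, Stable: 3}

// validateMinStability rejects the config if any referenced plugin type is
// unregistered or less mature than the operator-chosen floor.
func validateMinStability(min StabilityLevel, used []string, levels map[string]StabilityLevel) error {
	floor, ok := rank[min]
	if !ok {
		return fmt.Errorf("unknown minimum stability %q", min)
	}
	for _, p := range used {
		lvl, known := levels[p]
		if !known {
			return fmt.Errorf("plugin %q is not registered", p)
		}
		if rank[lvl] < floor {
			return fmt.Errorf("plugin %q is %s, below the configured minimum %s", p, lvl, min)
		}
	}
	return nil
}
```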

```go
Stability StabilityLevel

// FeatureGate is the feature gate name required for
// Alpha plugins. Must be non-empty when Stability is
```
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the same feature gate would also be used to disable the plugin once it is Beta.


```go
// MustRegister adds a plugin factory with explicit lifecycle
// metadata and panics on invalid plugin.
func MustRegister(pluginType string, entry RegistryEntry) {
```
Contributor

This is an exported function that plugin writers can call directly to set the maturity level. From a reviewer's perspective, we may want to separate stability designation from registration. Do you think it makes sense to have a separate file for maturity that maps known plugin types to their assigned maturity levels? That way only a single file needs to be monitored, and that check can easily be automated in CI/CD. We need to think about how this would work in derived projects that register additional plugins, but perhaps that is not the core infrastructure's responsibility...
WDYT?
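
A sketch of the separate-maturity-file idea: stability lives in one map that reviewers watch, and a CI check flags registered plugin types that have no entry. `pluginMaturity`, `missingMaturity`, and the plugin names are hypothetical.

```go
package main

import "sort"

type StabilityLevel string

const (
	Alpha  StabilityLevel = "Alpha"
	Beta   StabilityLevel = "Beta"
	Stable StabilityLevel = "Stable"
)

// pluginMaturity would live in its own file so that every stability change
// shows up in one reviewable diff. Plugin names are illustrative only.
var pluginMaturity = map[string]StabilityLevel{
	"queue-scorer":    Stable,
	"kv-cache-scorer": Beta,
}

// missingMaturity returns, sorted, the registered plugin types that have no
// maturity entry; a unit test or CI job could fail when it is non-empty.
func missingMaturity(registered []string) []string {
	var missing []string
	for _, p := range registered {
		if _, ok := pluginMaturity[p]; !ok {
			missing = append(missing, p)
		}
	}
	sort.Strings(missing)
	return missing
}
```

Derived projects could keep their own map and run the same check against their own registrations.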

@ahg-g
Contributor

ahg-g commented Apr 2, 2026

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 2, 2026
@k8s-ci-robot
Contributor

@hexfusion: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-gateway-api-inference-extension-test-e2e-main
Commit: d1e132c
Required: true
Rerun command: /test pull-gateway-api-inference-extension-test-e2e-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Plugin Lifecycle and Stability Levels

5 participants