feat(epp): Add plugin lifecycle and stability levels proposal#2684
feat(epp): Add plugin lifecycle and stability levels proposal#2684hexfusion wants to merge 1 commit intokubernetes-sigs:mainfrom
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: hexfusion The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @hexfusion. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
e2e8491 to
d1e132c
Compare
| func Register(pluginType string, factory FactoryFunc) { | ||
| Registry[pluginType] = RegistryEntry{ | ||
| Factory: factory, | ||
| Stability: Unknown, |
There was a problem hiding this comment.
will it reject out of tree plugin until the migration?
There was a problem hiding this comment.
tbh, I would update Register function (instead of adding a different one) and out of tree plugins should be able to adopt it very easily, using Unknown on their end.
I would prefer to make it a must rather than keep Unknown stability which as far as I can see here - its behavior is not well defined.
does it require feature gate or not? should we allow it? why would out of tree plugins move from Unknown, what is their motivation?
| pluginType, entry.Stability)) | ||
| } | ||
| if entry.Stability == Alpha && entry.FeatureGate == "" { | ||
| panic(fmt.Sprintf( |
There was a problem hiding this comment.
don't panic. skip register and return an error.
There was a problem hiding this comment.
panic is scary will remove but we should consider Must in this context
| Unknown StabilityLevel = "Unknown" | ||
| Alpha StabilityLevel = "Alpha" | ||
| Beta StabilityLevel = "Beta" | ||
| Stable StabilityLevel = "Stable" |
There was a problem hiding this comment.
what about Deprecated?
Deprecated must come with Deprecation message.
| // error with migration guidance instead of the generic "not | ||
| // registered" error from instantiatePlugins. Tombstones are | ||
| // permanent and small. | ||
| var removedPlugins = map[string]string{ |
There was a problem hiding this comment.
not all plugins can be mapped 1:1 from old plugins to new.
for example - we might have an old plugin that we decide to split into two sub-plugins.
| 2. Should graduation criteria be GIE-specific, or adopt Gateway | ||
| API's requirements? | ||
| 3. Where does the stability policy live, `docs/plugin-lifecycle.md`, | ||
| `CONTRIBUTING.md`, or a dedicated proposal? |
There was a problem hiding this comment.
graduation criteria is a difficult question. if you ask 10 different people who know GIE well about 10 different plugins - you might hear different opinions about how graduated a plugin is.
I would document it the plugins doc in the website as well as in docs.
CONTRIBUTING.md is a general guide for contributions, not specific to plugins or anything related to their lifecycle.
| @@ -0,0 +1,322 @@ | |||
| # Plugin Lifecycle and Stability Levels | |||
| - bases/inference.networking.x-k8s.io_inferenceobjectives.yaml | ||
| - bases/inference.networking.x-k8s.io_inferencepoolimports.yaml | ||
| - bases/inference.networking.k8s.io_inferencepools.yaml | ||
| - rbac-aggregation.yaml |
There was a problem hiding this comment.
nit: is this part of the proposal or a different PR?
| |-------|---------|-----------------|----------------| | ||
| | **Alpha** | Gated off (requires feature gate) | No compatibility guarantee. Config schema may change between releases. | Can be removed any release. | | ||
| | **Beta** | Gated on | Config schema is stable. Behavioral changes require release notes. | 2 releases + 6 months after deprecation notice. | | ||
| | **Stable** | Always available | Full backward compatibility within config API version. | Not removed within a config API major version. | |
There was a problem hiding this comment.
nit: is there a pre-alpha "dev" stage?
- dev: Experimental features, no compatibility guarantees. Disabled by default. Intended for explorations. Must not impact any feature or code path when disabled.
- alpha: Disabled by default. APIs may change. Intended for early feedback. A feature may be initially enabled in "discovery phase" mode, where the code runs exclusively in a non-blocking/no-op mode to collect telemetry.
Conside adding documentation and testing requirements for beta and stable.
| **Feature gate integration.** Alpha plugins require an explicit | ||
| feature gate in `EndpointPickerConfig.FeatureGates`. GIE already | ||
| has a `FeatureGates []string` field on the config; this proposal | ||
| extends its use to cover per-plugin gating. |
There was a problem hiding this comment.
side note: there was also a discussion of having feature gates as map[string]bool (instead of []string) with a default on/off value based on maturity (e.g., intially default off, changed to default on in beta)
| has a `FeatureGates []string` field on the config; this proposal | ||
| extends its use to cover per-plugin gating. | ||
|
|
||
| **Config validation.** At config load time: |
There was a problem hiding this comment.
Q: Should we also consider an explicit configuration value for minimal stability, to ensure operators make an explicit decision and allow validation of config baed on that?
| Stability StabilityLevel | ||
|
|
||
| // FeatureGate is the feature gate name required for | ||
| // Alpha plugins. Must be non-empty when Stability is |
There was a problem hiding this comment.
nit: the same feature gate would also be used on Beta for disabling the plugin
|
|
||
| // MustRegister adds a plugin factory with explicit lifecycle | ||
| // metadata and panics on invalid plugin. | ||
| func MustRegister(pluginType string, entry RegistryEntry) { |
There was a problem hiding this comment.
This is an exported function which can be called directly by plugin writers to set the maturity level. From a reviewers' perspective, may want to separate stability designation from registration. Do you think it makes sense to have a separate file for maturity, mapping known plugin types to their assigned maturity levels? That way only a single file needs to be monitored and that can be easily automated in CICD. Need to think how this would be done in derived projects that register additional plugins, but perhaps that is not the core infra's responsibility...
WDYT?
|
/ok-to-test |
|
@hexfusion: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/kind feature
/kind documentation
What this PR does / why we need it:
Proposes a plugin lifecycle model for EPP plugins. Today a plugin either exists in the registry or it doesn't -- there is no way to communicate maturity to operators. This proposal adds three stability tiers (Alpha, Beta, Stable) with defined support contracts, feature gate integration for alpha plugins, and validation tombstones for removed plugins.
Which issue(s) this PR fixes:
Fixes #2653
Does this PR introduce a user-facing change?: