Refactor cloud provider builder into a dynamic registration pattern #9639
Choraden wants to merge 1 commit into
Force-pushed from 70fc12e to b34a32a.
I like the pattern. It's widely used by libraries that want to avoid importing implementation-specific packages, like Go's database/sql. Considering that in the long term we'll be moving to a library form of the autoscaler repository, it makes sense to start taking such steps right now. One thing that bothers me is the binary size inflation, since we are removing support for build tags. Can you show the difference in binary sizes if we pick one provider (e.g. GCE) and compare it against a build from the current revision?
Good catch. The functionality of builds behind tags can easily be restored by mimicking the previous builder_providerX file structure, but in the main package. Each file would register a particular provider. For example:

    //go:build gce

    package main

    import _ "k8s.io/autoscaler/cluster-autoscaler/cloudprovider/gce"
That's reasonable, but it will require us to create a Go module per build tag in the root of … Do you agree @jackfrancis @towca @BigDarkClown?
BigDarkClown left a comment:
I really like the pattern, it feels much better than the previous one.
When it comes to tags, I am okay with the current solution. Note that while the binary size will increase for provider-specific images, the main one will stay the same. Anyone who already goes through the trouble of building a per-provider image is likely doing so in a fork already, and can adjust with minimal effort.
        return utho.BuildUtho(opts, do, rl)
    case cloudprovider.CoreWeaveProviderName:
        return coreweave.BuildCoreWeave(opts, do, rl)
    if builder, ok := GetCloudProviderBuilder(opts.CloudProviderName); ok {
This function seems redundant after the changes. Could we just roll its body into NewCloudProvider()?
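A minimal sketch of the suggestion, assuming the helper only performed the registry lookup. `CloudProvider`, `Options`, `builderFunc` and `registry` are simplified stand-ins, not the real Cluster Autoscaler types, and returning an error for an unknown provider is an assumption about how the failure should be surfaced.

```go
package main

import "fmt"

// CloudProvider and Options are simplified stand-ins for the real
// cloudprovider.CloudProvider interface and autoscaler options.
type CloudProvider interface{ Name() string }

type Options struct{ CloudProviderName string }

type builderFunc func(opts Options) CloudProvider

// registry is a hypothetical package-level map filled by provider init() functions.
var registry = map[string]builderFunc{}

// GetCloudProviderBuilder mirrors the lookup used in the PR (signature simplified).
func GetCloudProviderBuilder(name string) (builderFunc, bool) {
	b, ok := registry[name]
	return b, ok
}

// NewCloudProvider with the former helper's body inlined: look up the
// registered builder and fail loudly if the provider was not compiled in.
func NewCloudProvider(opts Options) (CloudProvider, error) {
	builder, ok := GetCloudProviderBuilder(opts.CloudProviderName)
	if !ok {
		return nil, fmt.Errorf("unknown cloud provider %q", opts.CloudProviderName)
	}
	return builder(opts), nil
}

// fakeProvider is a hypothetical provider used only for this example.
type fakeProvider struct{}

func (fakeProvider) Name() string { return "fake" }

func main() {
	registry["fake"] = func(Options) CloudProvider { return fakeProvider{} }

	p, err := NewCloudProvider(Options{CloudProviderName: "fake"})
	fmt.Println(p.Name(), err) // prints: fake <nil>

	_, err = NewCloudProvider(Options{CloudProviderName: "gce"})
	fmt.Println(err) // prints: unknown cloud provider "gce"
}
```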
    // The registration pattern allows for customizing the set of supported cloud providers
    // by including or excluding these blank imports. This is particularly useful for
    // external forks that want to avoid unnecessary dependencies.
    _ "k8s.io/autoscaler/cluster-autoscaler/cloudprovider/alicloud"
It seems that we could preserve the tag-specific behavior pretty easily:
- Create a new pkg with the same structure as the `cloudprovider/builder` pkg before these changes:
  - A `builder_<provider>.go` file for each cloud provider, with the appropriate provider-specific build tag. This file would just contain the blank import for the appropriate provider.
  - A `builder_all.go` file with the negative tags, only included if none of the provider-specific tags are used. This file would contain blank imports for all the providers, like here.
  - Maybe instead of having the blank imports, which are pretty vague on their own, we actually put the whole `init()` functions in the new pkg instead of directly in `cloudprovider/<provider>`? IMO it'd make a lot of sense. We could name the new pkg something like `cloudprovider/router`?
- Have `main.go` import this new pkg instead of directly doing the blank imports. This way we still have the benefits of `NewCloudProvider`/`NewAutoscaler` not depending on any cloud provider, but we also keep the ability to create a provider-specific binary without forking, using build tags.
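The `builder_all.go` trick relies on a negated build constraint: the file is compiled only when none of the provider tags is set. The sketch below uses the standard `go/build/constraint` package to evaluate such constraints for different tag sets; the two-provider constraint lines are hypothetical and shortened for brevity (a real file would negate every provider tag).

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// included reports whether a file with the given //go:build line would be
// compiled when exactly the tags in set are enabled.
func included(line string, set map[string]bool) bool {
	expr, err := constraint.Parse(line)
	if err != nil {
		panic(err)
	}
	return expr.Eval(func(tag string) bool { return set[tag] })
}

func main() {
	// Hypothetical constraints, using only two providers for brevity.
	allLine := "//go:build !gce && !aws" // builder_all.go
	gceLine := "//go:build gce"          // builder_gce.go

	for _, tags := range []map[string]bool{{}, {"gce": true}} {
		fmt.Printf("tags=%v all=%v gce=%v\n",
			tags, included(allLine, tags), included(gceLine, tags))
	}
}
```

With no tags only the "all" file builds; with `-tags gce` only the GCE file builds, so exactly one set of blank imports is ever compiled in.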
I added a `router` package that consolidates all the blank imports and allows for tag-based builds.
I couldn't put the `init()` functions there, as those import the cloud providers themselves.
Forks are advised to use the desired cloud provider package directly, bypassing the router.
Force-pushed from 2edd8ab to 7f0872d.
This comprehensive refactoring transitions the Cluster Autoscaler's cloud provider initialization from a hardcoded, monolithic switch statement to a decoupled, dynamic registration pattern.

Motivation: Previously, the core cloud provider builder maintained direct dependencies on every supported cloud provider implementation. This architectural coupling forced the Cluster Autoscaler to pull in a massive number of transitive dependencies for all providers (AWS, Azure, GCE, etc.) regardless of which one was actually used. This resulted in significant "dependency bloat," unnecessarily large binary sizes, and long build times. It was particularly burdensome for external forks or specialized deployments that only required a single provider.

Key Architectural Improvements:
1. Centralized Registration: The cloud provider initialization logic is now centralized in the `cloudprovider/builder` package. Each cloud provider implementation is responsible for registering its own builder function via an `init()` block.
2. True Decoupling: The core builder no longer has any direct knowledge or compile-time dependencies on specific provider implementations. It interacts solely with a registry of builder functions.
3. Dynamic Provider Discovery: `AvailableCloudProviders()` is now a dynamic function that returns only the providers that have been registered in the current binary. This ensures that CLI help text (`--help`) and flag validation accurately reflect the capabilities of the specific build.
4. Configurable Default Provider: The `DefaultCloudProvider` is no longer a hardcoded constant. It can now be set dynamically via the registry, allowing custom builds to define their own default provider without modifying core code.
5. Modular Build Support: The set of supported providers in a binary is now entirely controlled by blank imports in `cloudprovider/router`. While standard builds continue to include all providers, this pattern enables the easy creation of optimized, provider-specific binaries.

Impact: This change significantly improves the maintainability and extensibility of the Cluster Autoscaler. It paves the way for a more modular architecture where cloud providers can be treated as optional plugins, reducing the core's dependency footprint and making it easier for the community to contribute and maintain provider-specific logic.
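A condensed sketch of the registry described above, assuming a map keyed by provider name. `RegisterCloudProvider` and the zero-argument `Builder` type are hypothetical simplifications; `GetCloudProviderBuilder`, `AvailableCloudProviders` and `SetDefaultCloudProvider` mirror the names used in this PR, but their real signatures may differ. The two `init()` functions stand in for provider packages pulled in by blank imports.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Builder is a hypothetical, simplified builder signature; the real builders
// receive autoscaler options, node group discovery options and resource limits.
type Builder func() (string, error)

// Registration happens only from init() functions, which Go runs
// sequentially before main, so this sketch needs no locking.
var (
	builders        = map[string]Builder{}
	defaultProvider string
)

// RegisterCloudProvider is a hypothetical name for the call each provider
// package would make from its own init() function.
func RegisterCloudProvider(name string, b Builder) {
	if _, dup := builders[name]; dup {
		panic("cloud provider registered twice: " + name)
	}
	builders[name] = b
}

// GetCloudProviderBuilder looks up a registered builder by name.
func GetCloudProviderBuilder(name string) (Builder, bool) {
	b, ok := builders[name]
	return b, ok
}

// AvailableCloudProviders returns the registered names in sorted order.
func AvailableCloudProviders() []string {
	names := make([]string, 0, len(builders))
	for name := range builders {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// SetDefaultCloudProvider records the provider used when no flag is given.
func SetDefaultCloudProvider(name string) { defaultProvider = name }

// Stand-ins for init() in cloudprovider/gce and cloudprovider/aws.
func init() {
	RegisterCloudProvider("gce", func() (string, error) { return "gce provider", nil })
	SetDefaultCloudProvider("gce")
}

func init() {
	RegisterCloudProvider("aws", func() (string, error) { return "aws provider", nil })
}

func main() {
	// Help text is derived from whatever was linked into this binary.
	fmt.Println("available: " + strings.Join(AvailableCloudProviders(), ","))
	fmt.Println("default:", defaultProvider)

	if build, ok := GetCloudProviderBuilder("aws"); ok {
		p, err := build()
		fmt.Println(p, err)
	}
}
```

Sorting in `AvailableCloudProviders()` is a design choice worth keeping in any real implementation: Go randomizes map iteration order, so without it the `--help` text could change from run to run.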
Force-pushed from 7f0872d to b97accd.
/retest

Looks good to me, @towca for approval.
    limitations under the License.
    */

    // Package router provides a centralized way to include cloud provider implementations
nit: IMO a README.md would be more discoverable than a Go file buried in a directory of 30 Go files. IDEs would also show it by default when you click the directory, etc.
    // Cloud providers must be explicitly imported to be registered in the builder.
    // The registration pattern allows for customizing the set of supported cloud providers
    // by including or excluding these blank imports. This is particularly useful for
    // external forks that want to avoid unnecessary dependencies.
nit: I'd mention the provider-specific tags somehow in this comment.
    }

    // SetDefaultCloudProvider sets the default cloud provider name.
    func SetDefaultCloudProvider(name string) {
It seems like we lost setting the default provider in the flow with provider-specific tags. It seems like we could restore it pretty easily:
- In the `init()` functions in `cloudprovider/<provider_name>`, add a `SetDefaultCloudProvider()` call. This preserves the previous behavior of not having to pass the cloud provider flag if you build with a provider-specific tag.
- In `router_all.go`, add an `init()` function that calls `SetDefaultCloudProvider(gce)` to preserve the GCE default when you build with no tags.

WDYT?
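The suggestion works because of Go's initialization order: a package's `init()` runs only after every package it imports has been initialized, so an `init()` in `router_all.go` runs after all provider `init()`s and has the last word. The single-file sketch below simulates that ordering with three `init()` functions (Go runs multiple `init()`s in one file in source order); `registered`, `resolveProvider` and the error text are hypothetical.

```go
package main

import "fmt"

var (
	registered      = map[string]bool{}
	defaultProvider string
)

// SetDefaultCloudProvider mirrors the setter discussed in this PR.
func SetDefaultCloudProvider(name string) { defaultProvider = name }

// Simulates cloudprovider/gce's init(): register and claim the default.
func init() { registered["gce"] = true; SetDefaultCloudProvider("gce") }

// Simulates cloudprovider/aws's init(): it also claims the default.
func init() { registered["aws"] = true; SetDefaultCloudProvider("aws") }

// Simulates router_all.go's init(). It runs after the provider packages it
// imports, so in an untagged build it restores GCE as the default.
func init() { SetDefaultCloudProvider("gce") }

// resolveProvider returns the flag value, or the default if the flag is empty,
// and rejects providers that were not compiled into the binary.
func resolveProvider(flagValue string) (string, error) {
	name := flagValue
	if name == "" {
		name = defaultProvider
	}
	if !registered[name] {
		return "", fmt.Errorf("cloud provider %q is not compiled into this binary", name)
	}
	return name, nil
}

func main() {
	fmt.Println(resolveProvider(""))    // prints: gce <nil>
	fmt.Println(resolveProvider("aws")) // prints: aws <nil>
	fmt.Println(resolveProvider("azure"))
}
```

In a tagged build only one provider package (and no `router_all.go`) is linked, so that provider's own `SetDefaultCloudProvider()` call is the only one that runs.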
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This comprehensive refactoring transitions the Cluster Autoscaler's cloud provider initialization from a hardcoded, monolithic switch statement to a decoupled, dynamic registration pattern.
Motivation:
Previously, the core cloud provider builder maintained direct dependencies on every supported cloud provider implementation. This architectural coupling forced the Cluster Autoscaler to pull in a massive number of transitive dependencies for all providers (AWS, Azure, GCE, etc.) regardless of which one was actually used. This resulted in significant "dependency bloat," unnecessarily large binary sizes, and long build times. It was particularly burdensome for external forks or specialized deployments that only required a single provider.
Impact:
This change significantly improves the maintainability and extensibility of the Cluster Autoscaler. It paves the way for a more modular architecture where cloud providers can be treated as optional plugins, reducing the core's dependency footprint and making it easier for the community to contribute and maintain provider-specific logic.
One of the most important impacts of this refactor is on the health of Cluster Autoscaler forks. Previously, any project importing k8s.io/autoscaler/cluster-autoscaler was forced to inherit the massive transitive dependency tree of every single cloud provider (AWS, Azure, etc.) in its go.mod. With this change, we can finally import the core CA packages directly without that bloat. For example, in our GCE CA fork, we can now keep a clean go.mod that only contains the dependencies we actually use. This decoupling is a huge win for the maintainability of all external distributions and significantly reduces our exposure to upstream dependency issues.
I believe this invasive refactoring is essential for the health of the project and would benefit all maintainers of CA forks. It also reduces the attack surface and makes derivative projects more resilient, given the rising frequency of supply chain attacks.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Key Architectural Improvements:
1. Centralized Registration: The cloud provider initialization logic is now centralized in the `cloudprovider/builder` package. Each cloud provider implementation is responsible for registering its own builder function via an `init()` block.
2. True Decoupling: The core builder no longer has any direct knowledge or compile-time dependencies on specific provider implementations. It interacts solely with a registry of builder functions.
3. Dynamic Provider Discovery: `AvailableCloudProviders()` is now a dynamic function that returns only the providers that have been registered in the current binary. This ensures that CLI help text (`--help`) and flag validation accurately reflect the capabilities of the specific build.
4. Configurable Default Provider: The `DefaultCloudProvider` is no longer a hardcoded constant. It can now be set dynamically via the registry, allowing custom builds to define their own default provider without modifying core code.
5. Modular Build Support: The set of supported providers in a binary is now entirely controlled by blank imports (e.g., in `main.go`). While standard builds continue to include all providers, this pattern enables the easy creation of optimized, provider-specific binaries.

Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: