
WIP: Minimal refactoring to have a single shard#2658

Open
shmuelk wants to merge 1 commit into kubernetes-sigs:main from shmuelk:fc-refactor-1

Conversation

@shmuelk
Contributor

@shmuelk shmuelk commented Mar 22, 2026

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
The Flow Control component is a critical part of the Endpoint Picker (EPP), enabling it to throttle workloads and thus prevent over-committing Model Server resources.

Issue #2628 was created to describe a set of simplifications to the Flow Control layer. This PR is the first in a series to implement issue #2628.

In particular this PR changes the Flow Control layer to only have a single shard.

Which issue(s) this PR fixes:
Refs #2628

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Mar 22, 2026
@netlify

netlify bot commented Mar 22, 2026

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: cf2de59
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/69ca82e40e285b000708c8f7
😎 Deploy Preview: https://deploy-preview-2658--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 22, 2026
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Mar 22, 2026
@shmuelk shmuelk changed the title WIP: Minimal refacctoring to have a single shard WIP: Minimal refactoring to have a single shard Mar 22, 2026
@Gregory-Pereira
Member

cc @RishabhSaini

@nirrozenbaum
Contributor

/cc @LukeAVanDrie

Contributor

@LukeAVanDrie LukeAVanDrie left a comment


Thanks @shmuelk! This LGTM. I noticed one place we can simplify the new createShard method, else I only have a few nits.

allShards []*registryShard // Cached, sorted combination of Active and Draining shards
nextShardID uint64
mu sync.RWMutex
shard *registryShard
Contributor

nit: This is currently read in ActiveShards() and ShardStats() without acquiring fr.mu.RLock(). With dynamic sharding removed, fr.shard is initialized once and never mutated, making these lock-free reads completely safe from data races.

Could we move the shard *registryShard field of the "Administrative state (protected by mu)" block and up to the "Immutable dependencies" block?
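The suggested regrouping can be sketched as follows (a minimal sketch with stand-in types, not the actual `FlowRegistry` definition): because `shard` is assigned once in the constructor and never mutated, it can sit with the immutable dependencies and be read without taking the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// registryShard is a stand-in for the real type (assumption).
type registryShard struct{ id string }

// FlowRegistry sketches the suggested field grouping: with dynamic
// sharding removed, shard is set once at construction and never
// mutated afterwards.
type FlowRegistry struct {
	// Immutable dependencies (set at construction, never mutated).
	shard *registryShard

	// Administrative state (protected by mu).
	mu sync.RWMutex
}

// ActiveShards can now read fr.shard without fr.mu.RLock(),
// which is race-free precisely because shard is never reassigned.
func (fr *FlowRegistry) ActiveShards() []*registryShard {
	return []*registryShard{fr.shard}
}

func main() {
	fr := &FlowRegistry{shard: &registryShard{id: "shard-0"}}
	shards := fr.ActiveShards()
	fmt.Println(len(shards), shards[0].id) // 1 shard-0
}
```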

Contributor Author

I will do this in the next PR

}

// createShard creates the shard.
func (fr *FlowRegistry) createShard() error {
Contributor

The entire block of code in here that iterates over fr.flowStates.Range(...) to build allComponents and synchronizeFlow is actually dead code.

Because createShard() is exclusively called during NewFlowRegistry() before the EPP accepts any connections, fr.flowStates is guaranteed to be empty.

You can simplify this initialization method to just:

func (fr *FlowRegistry) createShard() error {
	fr.mu.Lock()
	defer fr.mu.Unlock()
	partitionedConfig := fr.config.partition(0, 1)
	fr.shard = newShard("shard-0", partitionedConfig, fr.logger, fr.propagateStatsDelta)
	return nil
}


// repartitionShardConfigsLocked updates the configuration for all active shards.
// Expects the registry's write lock to be held.
func (fr *FlowRegistry) repartitionShardConfigsLocked() {
Contributor

Note for other reviewers... This looks weird considering a single-shard view, but we must preserve it for now. When ensurePriorityBand dynamically creates a new band, this partition path acts as a deep-copy mechanism to push the mutated registry config down to the isolated shard state.

In a follow-up PR (if/when we eliminate the boundary between registryShard and FlowRegistry entirely), we can have everything reference a single unified Config, allowing us to drop this path entirely.
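One possible shape for that follow-up, purely hypothetical (all names here are assumptions, not the actual code): if registry and shard share a single `*Config` rather than deep-copying a partitioned snapshot, a band added dynamically is visible to the shard immediately, and the repartition path disappears.

```go
package main

import (
	"fmt"
	"sync"
)

// Config is a hypothetical unified configuration shared by the
// registry and the shard, guarded by its own lock.
type Config struct {
	mu    sync.RWMutex
	bands map[int]string // priority -> band name
}

// ensureBand mimics ensurePriorityBand: create the band if missing.
func (c *Config) ensureBand(priority int, name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.bands[priority]; !ok {
		c.bands[priority] = name
	}
}

// shard references the shared config instead of holding a copy.
type shard struct{ config *Config }

func main() {
	cfg := &Config{bands: map[int]string{}}
	s := shard{config: cfg}
	cfg.ensureBand(1, "critical")
	cfg.mu.RLock()
	// The shard sees the new band with no repartition/deep-copy step.
	fmt.Println(s.config.bands[1]) // critical
	cfg.mu.RUnlock()
}
```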

defer fr.mu.RUnlock()

components, err := fr.buildFlowComponents(key, len(fr.allShards))
components, err := fr.buildFlowComponents(key, 1)
Contributor

nit: Consider updating buildFlowComponents to drop the numInstances arg and just return a single (flowComponents, error) tuple. Fine if we want to defer this to a different PR though.
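The suggested signature change might look like this (hypothetical sketch; `flowComponents` and the free-function form stand in for the real method):

```go
package main

import "fmt"

// flowComponents is a stand-in for the real type (assumption).
type flowComponents struct{ flowKey string }

// Before (sketch): buildFlowComponents(key, numInstances) returned
// one component set per shard instance. With a single shard, the
// numInstances parameter and the slice return can both be dropped:
func buildFlowComponents(key string) (flowComponents, error) {
	return flowComponents{flowKey: key}, nil
}

func main() {
	c, err := buildFlowComponents("flow-a")
	fmt.Println(c.flowKey, err == nil) // flow-a true
}
```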

Contributor Author

I will do this in the next PR

@LukeAVanDrie
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 23, 2026
for i, s := range c.registry.activeShards {
shardsCopy[i] = s
}
shardsCopy := make([]contracts.RegistryShard, 1)
Collaborator

Are we going to remove the concept of a shard in a later PR?

Contributor

Yes, there will be no "data-parallel" concept anymore. See the attached issue for more context.

For these PRs though, as long as the FC layer has 0 regressions between revisions, I want to get these in even if there are some minor stylistic/semantic improvements that could be made. This code should look significantly different after these are all in, so I think it is most expedient to focus on polish at the end of the refactoring effort rather than at each intermediary step.

Collaborator

That's totally fine with me, just making sure we are all pointed in the same direction

@kfswain
Collaborator

kfswain commented Mar 23, 2026

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, LukeAVanDrie, shmuelk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 23, 2026
@shmuelk
Contributor Author

shmuelk commented Mar 24, 2026

@LukeAVanDrie and @kfswain thank you for the reviews.

I would remove the WIP (i.e. hold) except that in some attempts to compare performance, I see some strange results. I'm trying to get to the bottom of it.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 29, 2026
@ahg-g
Contributor

ahg-g commented Mar 30, 2026

rebase pls?

@ahg-g
Contributor

ahg-g commented Mar 30, 2026

I would remove the WIP (i.e. hold) except that in some attempts to compare performance, I see some strange results. I'm trying to get to the bottom of it.

Still seeing a regression in perf?

@shmuelk
Contributor Author

shmuelk commented Mar 30, 2026

I am running a slightly modified version of Luke's benchmark (only one shard, and only up to 5000 items in the queue).

The original script I saw in Luke's PR ran each test for one second, repeated five times.

That run showed some regressions in the cases with 5000 workers; 5000 goroutines didn't start well within one second.

I have since run each test for five seconds, again repeated five times. The results are much better, but there are spikes.

Here is an Excel spreadsheet of my results:
baseline-base-step1-compare.xlsx

The baseline numbers are an average of four runs of the benchmark script. The step-one numbers (the code from this PR) are from eight runs of the benchmark script. The ns/op and d/s numbers are averaged over the tests in question. I have added median and max values for the step-one numbers.

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 30, 2026
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 30, 2026
@LukeAVanDrie
Contributor

That run showed some regressions in the cases with 5000 workers; 5000 goroutines didn't start well within one second.

I have since run each test for five seconds, again repeated five times. The results are much better, but there are spikes.

The test amortizes the first-time flow / priority band instantiation across the bench time, so running it for longer durations (5s) vs (1s) is actually preferable to smooth that out and look at steady-state performance under load.
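The amortization argument can be made concrete with a little arithmetic (illustrative numbers only, not measurements from the actual benchmark): one-time setup cost is divided across however many operations the bench time allows, so longer runs converge toward the true steady-state per-op cost.

```go
package main

import "fmt"

// amortizedNsPerOp models what a benchmark reports when fixed setup
// cost is folded into n timed operations:
//   reported ns/op = (setup + n*perOp) / n
func amortizedNsPerOp(setupNs, perOpNs, n float64) float64 {
	return (setupNs + n*perOpNs) / n
}

func main() {
	// Illustrative: 5ms of one-time setup, true cost 1000 ns/op.
	// A short run (10k ops) overstates per-op cost; a 5x longer run
	// (50k ops) sits much closer to the true 1000 ns/op.
	fmt.Printf("%.0f\n", amortizedNsPerOp(5e6, 1000, 1e4)) // 1500
	fmt.Printf("%.0f\n", amortizedNsPerOp(5e6, 1000, 5e4)) // 1100
}
```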

@shmuelk
Contributor Author

shmuelk commented Mar 31, 2026

I have run my performance benchmarks on my laptop, which is:

Chip: Apple M4 Max
Total Number of Cores: 16 (12 performance and 4 efficiency)
Memory: 64 GB

The tests above used all sixteen cores, which apparently are not all equal.

I have run an additional set of tests using only twelve cores, in an attempt to not use the "efficiency cores".

Here are the results:
baseline-base-step1-compare.xlsx

@k8s-ci-robot
Contributor

PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 1, 2026

7 participants