Fix ResourceWatcher Data Race and Redis Connection Leaks #1741
Open
chungeun-choi wants to merge 2 commits into OT-CONTAINER-KIT:main
This patch improves the throughput of the Redis operator during large-scale provisioning when MAX_CONCURRENT_RECONCILES > 1.
1. Fix controllerutil ResourceWatcher concurrency safety (data race fix)
2. Wrap the GetRedisNodesByRole defer logic in a func to prevent TCP connection leaks
3. Consolidate reconcileRedis and reconcileStatus to avoid redundant topology calls
Signed-off-by: chungeun-choi <cucuridas@gmail.com>
Significantly increase the default rate limits (QPS) for the kube client to prevent aggressive client-side throttling and delays during scale-out events.
Signed-off-by: chungeun-choi <cucuridas@gmail.com>
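The connection-leak item in the first commit comes down to when Go runs deferred calls: at function return, not at the end of a loop iteration. The following is a minimal sketch of that behavior, not the operator's code. `fakeClient`, `visitDeferInLoop`, and `visitPerIteration` are hypothetical names, and the client only records events instead of opening a TCP connection.

```go
package main

import "fmt"

// fakeClient is a hypothetical stand-in for a Redis client. It only records
// when Close is called, so the ordering of events can be inspected.
type fakeClient struct {
	name string
	log  *[]string
}

func (c *fakeClient) Close() {
	*c.log = append(*c.log, "close "+c.name)
}

// visitDeferInLoop defers Close directly inside the loop body. All deferred
// calls run only when the enclosing function returns, so every client stays
// open until the loop has finished.
func visitDeferInLoop(names []string) []string {
	var events []string
	func() {
		for _, n := range names {
			c := &fakeClient{name: n, log: &events}
			defer c.Close()
			events = append(events, "use "+n)
		}
	}()
	return events
}

// visitPerIteration wraps each iteration in an anonymous function, so the
// deferred Close runs as soon as that iteration is done.
func visitPerIteration(names []string) []string {
	var events []string
	for _, n := range names {
		func() {
			c := &fakeClient{name: n, log: &events}
			defer c.Close()
			events = append(events, "use "+n)
		}()
	}
	return events
}

func main() {
	pods := []string{"a", "b"}
	fmt.Println(visitDeferInLoop(pods))
	fmt.Println(visitPerIteration(pods))
}
```

In the first variant the number of simultaneously open clients grows with the number of nodes visited; in the second it stays at one.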
Description
This PR significantly improves the operator's concurrent throughput and fixes internal blocking bottlenecks when orchestrating multiple `RedisReplication` resources. The detailed changes include:

- `ResourceWatcher` Thread-Safety: Replaced the value receiver with a pointer receiver `(w *ResourceWatcher)` and added a `sync.RWMutex` to protect the `watched` map against data races when `MAX_CONCURRENT_RECONCILES > 1`.
- `GetRedisNodesByRole`: Now wraps the `configureRedisReplicationClient` call and `defer redisClient.Close()` in an anonymous function. This ensures each connection closes at the end of its iteration rather than staying open inside the `for` loop until the function returns.
- `redisreplication_controller.go`: Merges `reconcileRedis` and `reconcileStatus` into a single `reconcileRedisAndStatus` function, yielding a ~50% reduction in concurrent TCP handshakes per reconcile step.

Fixes #ISSUE
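The `ResourceWatcher` change described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the operator's implementation: the map is keyed by plain strings for brevity, and `NewResourceWatcher`, `Watch`, and `Dependents` are hypothetical names chosen for the example. The pointer receiver matters because a struct holding a `sync.RWMutex` must not be copied; with a value receiver each call would lock its own copy of the mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// ResourceWatcher is a simplified stand-in for the operator's type.
// Assumption: string keys instead of namespaced names.
type ResourceWatcher struct {
	mu      sync.RWMutex
	watched map[string][]string
}

// NewResourceWatcher returns a watcher with an initialized map.
func NewResourceWatcher() *ResourceWatcher {
	return &ResourceWatcher{watched: make(map[string][]string)}
}

// Watch records that dependent should be reconciled when watched changes.
// The write lock serializes concurrent writes to the map.
func (w *ResourceWatcher) Watch(watched, dependent string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	for _, d := range w.watched[watched] {
		if d == dependent {
			return // already registered
		}
	}
	w.watched[watched] = append(w.watched[watched], dependent)
}

// Dependents returns a copy of the registered dependents under a read lock,
// so callers never share the slice stored in the map.
func (w *ResourceWatcher) Dependents(watched string) []string {
	w.mu.RLock()
	defer w.mu.RUnlock()
	return append([]string(nil), w.watched[watched]...)
}

func main() {
	w := NewResourceWatcher()
	var wg sync.WaitGroup
	// Simulate several reconcilers registering watches concurrently.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			w.Watch("secret/redis", fmt.Sprintf("replication-%d", i))
			_ = w.Dependents("secret/redis")
		}(i)
	}
	wg.Wait()
	fmt.Println(len(w.Dependents("secret/redis")))
}
```

Running a sketch like this with `go run -race` is one way to check that the map access is free of data races.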
Type of change
Checklist
Additional Context
In a local constrained environment (Docker Desktop) orchestrating 30 `RedisReplication` clusters simultaneously: [...] `go routine` leaks or panic logs.