fix: Batch CLUSTER ADDSLOTS for single-leader RedisCluster to avoid exec URL limit by svils · Pull Request #1706 · OT-CONTAINER-KIT/redis-operator

svils · 2026-03-09T14:25:07Z

Summary

CreateSingleLeaderRedisCommand built a single redis-cli CLUSTER ADDSLOTS 0 1 2 ... 16383 command with all 16384 slot numbers as individual arguments. When executed via the Kubernetes pod exec API (SPDY), the arguments are encoded as URL query parameters, exceeding the URL length limit. The connection upgrade fails and the cluster stays stuck in Bootstrap forever.
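To get a rough sense of the size involved, the sketch below estimates the query-string length of both command forms. It assumes each argument is sent as its own `command` query parameter; the helper names `execQueryLength` and `addSlotsArgs` are hypothetical and are not taken from the operator's code. It does not model the rest of the exec URL or assume any particular limit.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// execQueryLength returns the length of the encoded query string when every
// argument is sent as a separate "command" parameter (an assumption about
// how the exec request is encoded, used here only for estimation).
func execQueryLength(args []string) int {
	q := url.Values{}
	for _, a := range args {
		q.Add("command", a)
	}
	return len(q.Encode())
}

// addSlotsArgs builds the original single command with all 16384 slots.
func addSlotsArgs() []string {
	args := []string{"redis-cli", "CLUSTER", "ADDSLOTS"}
	for slot := 0; slot < 16384; slot++ {
		args = append(args, strconv.Itoa(slot))
	}
	return args
}

func main() {
	rangeArgs := []string{"redis-cli", "CLUSTER", "ADDSLOTSRANGE", "0", "16383"}
	fmt.Println("ADDSLOTS query length:     ", execQueryLength(addSlotsArgs()))
	fmt.Println("ADDSLOTSRANGE query length:", execQueryLength(rangeArgs))
}
```

Under that assumption the ADDSLOTS form is several orders of magnitude longer than the ADDSLOTSRANGE form, which is consistent with the failure described above.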

- Redis 7+ (default): use CLUSTER ADDSLOTSRANGE 0 16383, a single compact command with just a start-end range pair, avoiding the URL length issue entirely
- Redis <7 fallback: batch CLUSTER ADDSLOTS into chunks of 1000 per exec call. CLUSTER ADDSLOTS is idempotent for unassigned slots, so partial retries on the next reconcile are safe
- Auth (-a) and TLS flags are handled per-call since the single-leader path now executes independently
- The default: (multi-leader) path is unchanged in behavior
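The two strategies above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `addSlotsRangeArgs` and `addSlotsBatches` are hypothetical names, and auth and TLS flags are left out.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	totalSlots = 16384 // Redis Cluster hash slots are numbered 0..16383
	batchSize  = 1000  // chunk size described in the summary
)

// addSlotsRangeArgs builds the compact Redis 7+ form.
func addSlotsRangeArgs() []string {
	return []string{"CLUSTER", "ADDSLOTSRANGE", "0", strconv.Itoa(totalSlots - 1)}
}

// addSlotsBatches builds one CLUSTER ADDSLOTS argument list per chunk of
// at most `size` slots, for servers without ADDSLOTSRANGE.
func addSlotsBatches(total, size int) [][]string {
	var batches [][]string
	for start := 0; start < total; start += size {
		end := start + size
		if end > total {
			end = total
		}
		args := []string{"CLUSTER", "ADDSLOTS"}
		for slot := start; slot < end; slot++ {
			args = append(args, strconv.Itoa(slot))
		}
		batches = append(batches, args)
	}
	return batches
}

func main() {
	fmt.Println(addSlotsRangeArgs())
	batches := addSlotsBatches(totalSlots, batchSize)
	fmt.Println("batches:", len(batches))
	fmt.Println("slots in last batch:", len(batches[len(batches)-1])-2)
}
```

With 16384 slots and a chunk size of 1000, this produces 16 full batches and one final batch of 384 slots, each of which would be sent in its own exec call.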

Test plan

- New unit test TestSingleLeaderAddSlotsBatching validates all 16384 slots are covered across 17 batches with no gaps or overlaps (v6 fallback path)
- All existing tests pass (go test ./internal/k8sutils/...)
- Deploy a RedisCluster with clusterSize: 1 on Redis 7, verify it bootstraps with ADDSLOTSRANGE
- Deploy a RedisCluster with clusterSize: 3, verify multi-leader bootstrap is unaffected
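The kind of check the batching test describes (every slot covered, no gaps, no overlaps) might look like the sketch below. It is not the content of TestSingleLeaderAddSlotsBatching; `chunkSlots` and `checkCoverage` are hypothetical helpers written for illustration.

```go
package main

import "fmt"

// chunkSlots splits slots 0..total-1 into consecutive batches of at most size.
func chunkSlots(total, size int) [][]int {
	var batches [][]int
	for start := 0; start < total; start += size {
		end := start + size
		if end > total {
			end = total
		}
		batch := make([]int, 0, end-start)
		for slot := start; slot < end; slot++ {
			batch = append(batch, slot)
		}
		batches = append(batches, batch)
	}
	return batches
}

// checkCoverage returns an error if any slot is out of range, assigned
// more than once (overlap), or never assigned (gap).
func checkCoverage(batches [][]int, total int) error {
	seen := make([]bool, total)
	for i, batch := range batches {
		for _, slot := range batch {
			if slot < 0 || slot >= total {
				return fmt.Errorf("batch %d: slot %d out of range", i, slot)
			}
			if seen[slot] {
				return fmt.Errorf("batch %d: slot %d assigned twice", i, slot)
			}
			seen[slot] = true
		}
	}
	for slot, ok := range seen {
		if !ok {
			return fmt.Errorf("slot %d is not covered", slot)
		}
	}
	return nil
}

func main() {
	batches := chunkSlots(16384, 1000)
	fmt.Println("batches:", len(batches))
	fmt.Println("coverage error:", checkCoverage(batches, 16384))
}
```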

…xec URL limit

CreateSingleLeaderRedisCommand built a single redis-cli CLUSTER ADDSLOTS command with all 16384 slot numbers as arguments. This exceeds the Kubernetes pod exec SPDY URL length limit, causing the connection upgrade to fail and the cluster to stay stuck in Bootstrap forever.

Replace with executeSingleLeaderAddSlots:
- Redis 7+ (default): use CLUSTER ADDSLOTSRANGE 0 16383, a single compact command that takes a start-end range pair
- Redis <7 fallback: batch CLUSTER ADDSLOTS into chunks of 1000 per exec call to stay within the URL length limit

Fixes OT-CONTAINER-KIT#1704

Signed-off-by: svils <63684363+svils@users.noreply.github.com>

codecov · 2026-03-09T15:37:13Z

Codecov Report

❌ Patch coverage is 0% with 42 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@6864c09). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
internal/k8sutils/redis.go	0.00%	42 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1706   +/-   ##
=======================================
  Coverage        ?   29.88%           
=======================================
  Files           ?       83           
  Lines           ?     6743           
  Branches        ?        0           
=======================================
  Hits            ?     2015           
  Misses          ?     4532           
  Partials        ?      196


…g infinite Failed state after pod restarts

Broaden repairDisconnectedMasters to handle both failed masters and failed slaves (renamed to RepairDisconnectedNodes). For slaves, issue CLUSTER MEET to fix gossip and CLUSTER REPLICATE to re-establish replication so the follower resolves its master's current IP.

Add RepairStaleReplication to detect connected followers whose master_link_status is down and re-issue CLUSTER REPLICATE.

Harden cluster config defaults:
- tcp-keepalive 60s (faster dead connection detection)
- cluster-node-timeout 15000ms (more gossip recovery time, configurable)
- cluster-allow-reads-when-down yes (unblock clients during repair)
- probe TimeoutSeconds=5 / FailureThreshold=5 (prevent premature pod eviction)

Fixes OT-CONTAINER-KIT#1692

Signed-off-by: svils <63684363+svils@users.noreply.github.com>
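The stale-replication detection described in this commit could be approximated as below. This sketch only parses the text of an `INFO replication` reply and decides whether CLUSTER REPLICATE should be re-issued; `parseInfo` and `needsReplicateRepair` are hypothetical names, and how the operator actually obtains the reply is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// parseInfo turns the "key:value" lines of a Redis INFO reply into a map,
// skipping blank lines and "# Section" headers.
func parseInfo(info string) map[string]string {
	fields := make(map[string]string)
	for _, line := range strings.Split(info, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if key, value, ok := strings.Cut(line, ":"); ok {
			fields[key] = value
		}
	}
	return fields
}

// needsReplicateRepair reports whether the node is a replica whose link to
// its master is down, i.e. a candidate for re-issuing CLUSTER REPLICATE.
func needsReplicateRepair(info string) bool {
	fields := parseInfo(info)
	return fields["role"] == "slave" && fields["master_link_status"] == "down"
}

func main() {
	stale := "# Replication\r\nrole:slave\r\nmaster_host:10.0.0.5\r\nmaster_link_status:down\r\n"
	healthy := "# Replication\r\nrole:slave\r\nmaster_host:10.0.0.5\r\nmaster_link_status:up\r\n"
	fmt.Println("stale follower needs repair:  ", needsReplicateRepair(stale))
	fmt.Println("healthy follower needs repair:", needsReplicateRepair(healthy))
}
```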

… is not empty" loop

ExecuteRedisReplicationCommand uses `redis-cli --cluster add-node` to join followers to the cluster. This command requires the target node to be completely empty: no cluster state and no keys in database 0.

After a leader-only CLUSTER RESET during single-node bootstrap (via executeFailoverCommand), the follower retains its previous cluster state and replicated data. The operator then enters an infinite error loop:

  [ERR] Node <cluster>-follower-0...is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

Add resetFollowerIfNotEmpty: before running add-node, check whether the follower knows other nodes or has keys in db0. If so, issue CLUSTER RESET HARD to clear both cluster state and data so add-node can succeed. The follower will re-sync from the master via full sync after joining.

Related: OT-CONTAINER-KIT#1407

Signed-off-by: svils <63684363+svils@users.noreply.github.com>
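The emptiness check described in this commit might be sketched as follows. It assumes the caller has already fetched the CLUSTER NODES reply and the DBSIZE of database 0; `knownNodeCount` and `followerNeedsReset` are hypothetical names and do not reflect the signature of resetFollowerIfNotEmpty.

```go
package main

import (
	"fmt"
	"strings"
)

// knownNodeCount counts the node entries in a CLUSTER NODES reply, which
// lists one node per line, including the node being queried.
func knownNodeCount(clusterNodes string) int {
	count := 0
	for _, line := range strings.Split(clusterNodes, "\n") {
		if strings.TrimSpace(line) != "" {
			count++
		}
	}
	return count
}

// followerNeedsReset reports whether add-node would reject the follower:
// it either knows nodes other than itself or holds keys in database 0.
func followerNeedsReset(clusterNodes string, db0Keys int64) bool {
	return knownNodeCount(clusterNodes) > 1 || db0Keys > 0
}

func main() {
	// Illustrative replies; node IDs and addresses are made up.
	fresh := "aaa 10.0.0.2:6379@16379 myself,master - 0 0 0 connected\n"
	joined := "aaa 10.0.0.2:6379@16379 myself,slave bbb 0 0 1 connected\n" +
		"bbb 10.0.0.1:6379@16379 master - 0 0 1 connected 0-16383\n"

	fmt.Println("fresh, no keys:  ", followerNeedsReset(fresh, 0))
	fmt.Println("fresh, with keys:", followerNeedsReset(fresh, 42))
	fmt.Println("knows a master:  ", followerNeedsReset(joined, 0))
}
```

Only when the check returns true would CLUSTER RESET HARD be issued before add-node, so an already-empty follower is left untouched.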

…er-addslots-batch

svils requested review from drivebyer, iamabhishek-dubey and shubham-cmyk as code owners March 9, 2026 14:25

svils force-pushed the fix/single-leader-addslots-batch branch from 33be324 to 9dac93a Compare March 9, 2026 14:28

svils force-pushed the fix/single-leader-addslots-batch branch from 9dac93a to 1d10833 Compare March 9, 2026 14:31

svils changed the title ~~fix: Batch CLUSTER ADDSLOTS for single-leader RedisCluster~~ fix: Batch CLUSTER ADDSLOTS for single-leader RedisCluster to avoid exec URL limit Mar 9, 2026

svils and others added 5 commits March 9, 2026 17:46

Merge branch 'main' into fix/repair-disconnected-followers

7aebfa9

Merge branch 'main' into fix/single-leader-addslots-batch

262341b

Merge branch 'fix/replication-addnode-not-empty' into fix/single-lead…

6a46cd5

…er-addslots-batch

svils mentioned this pull request Apr 7, 2026

fix: RepairDisconnectedMasters does not heal failed followers, causing infinite Failed state after pod restarts #1705

Open

7 tasks

Merge branch 'fix/repair-disconnected-followers' into fix/single-lead…

cbec96a

…er-addslots-batch

svils force-pushed the fix/single-leader-addslots-batch branch from 6f7025e to cbec96a Compare April 9, 2026 12:45

svils wants to merge 7 commits into OT-CONTAINER-KIT:main from svils:fix/single-leader-addslots-batch
