Skip to content

Kong Ingress Controller (DB-less) not converging state across pods – one instance missing routes leading to intermittent 404sΒ #14905

Description

@akshaysharama

πŸ”Ž Summary

In a DB-less Kong deployment with multiple replicas, one Kong instance intermittently serves 404 responses due to incomplete route configuration, while other instances function correctly.

This results in inconsistent routing behavior depending on which pod receives the request.


πŸ“¦ Environment

  • Kong Version: 3.3
  • Kong Ingress Controller Version: 2.11
  • Kubernetes Distribution: k3s
  • Deployment Mode: DB-less (KONG_DATABASE=off)
  • Replicas: Multiple (behind LoadBalancer)

Controller Resources:

requests:
  cpu: 25m
  memory: 100Mi
limits:
  cpu: 100m
  memory: 250Mi

🚨 Observed Behavior

  • Requests via LoadBalancer intermittently return:

    • βœ… 200 OK (healthy pods)
    • ❌ 404 Not Found (faulty pod)
  • Kong Admin API comparison across pods:

    curl -sk https://localhost:8001/routes | jq '.data | length'
    • Healthy pods β†’ full route set
    • Faulty pod β†’ partial routes (e.g., only TCPIngress)
  • Faulty pod:

    • Remains in partial state indefinitely
    • Does not self-recover
    • Appears Ready from Kubernetes perspective

πŸ§ͺ Steps to Reproduce (approximate)

  1. Deploy Kong (DB-less) with KIC v2.11 and multiple replicas
  2. Apply multiple Ingress and TCPIngress resources
  3. Send traffic via LoadBalancer
  4. Observe intermittent 404 responses
  5. Compare /routes across pods β†’ one pod shows missing routes

πŸ“Š Expected Behavior

All Kong instances should independently converge to the same configuration derived from Kubernetes resources, resulting in consistent routing across all pods.


πŸ” Additional Observations

  • No critical errors in logs (log level = warn)
  • Occasional network timeout logs (unclear if related)
  • Only partial configuration present on affected pod (TCPIngress present, HTTP Ingress missing)
  • https://artifacthub.io/packages/helm/kong/kong/2.27.0 (helm chart we are using to deploy)

πŸ’₯ Impact

  • Intermittent production 404 errors
  • Non-deterministic routing behavior
  • Difficult to detect via health checks (pods remain Ready)

πŸ”§ Workarounds

  • Restarting affected pod β†’ resolves issue temporarily

πŸ™ Any guidance or pointers would be appreciated.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions