Cilium operator node affinity changed from kops 1.33 to 1.34 #18099

@andrisrovw

Description

/kind bug

1. What kops version are you running? The command kops version, will display
this information.

1.33 and 1.34

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.33.9 and 1.34.5

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

Install kops 1.33 on AWS and upgrade to 1.34

5. What happened after the commands executed?

In kops 1.33, the cilium-operator Deployment had a nodeAffinity so that it could also be placed on the master (control-plane) nodes:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
        - matchExpressions:
          - key: node-role.kubernetes.io/master
            operator: Exists

After upgrading to version 1.34, the cilium-operator no longer has these affinities, which results in the operator also being scheduled on worker nodes.

As the master and worker nodes have different IAM permissions, this results in Cilium errors due to missing EC2 permissions.
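As a possible interim workaround (a sketch only, assuming the default Deployment name cilium-operator in the kube-system namespace), the 1.33 affinity from above could be patched back onto the Deployment:

    # affinity-patch.yaml -- restores the kops 1.33 scheduling constraint
    spec:
      template:
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: Exists
                - matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: Exists

Applied with kubectl -n kube-system patch deployment cilium-operator --patch-file affinity-patch.yaml. Note that kops addon management may revert this on the next kops update cluster, so it is not a permanent fix.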

It should be possible to either

  1. keep the previous behaviour of the cilium-operator and still define and use a nodeAffinity to the master nodes, or
  2. configure custom nodeSelectors / nodeAffinities in cluster.yaml.
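For option 2, a cluster.yaml extension might look like the following. This is purely a sketch of the requested API: the operatorAffinity field does not exist in the current kops Cilium spec and is a hypothetical name.

    networking:
      cilium:
        # Hypothetical field (not part of the current kops API):
        # affinity to apply to the cilium-operator Deployment.
        operatorAffinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: node-role.kubernetes.io/control-plane
                  operator: Exists

Mirroring the affinity structure of the core Kubernetes PodSpec would let users express both the old control-plane pinning and any custom placement policy.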

6. What did you expect to happen?

I expected that after the upgrade the cilium-operator would not change its behaviour and would still be placed on master nodes.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: <replaced>
spec:
  additionalPolicies:
    node: |
      [
        {
          "Effect": "Allow",
          "Resource": [ "*" ],
          "Action": [
            "ecr:DescribeImages",
            "ecr:DescribeImageScanFindings",
            "inspector2:ListFindings",
            "inspector2:ListCoverage",
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:DescribeLaunchConfigurations",
            "autoscaling:DescribeTags",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup"
          ]
        }
      ]
  addons:
  - manifest: |
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: cilium-secrets-access
      rules:
      - apiGroups: [""]
        resources: ["secrets"]
        verbs: ["get", "list", "watch"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: cilium-secrets-access
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: cilium-secrets-access
      subjects:
      - kind: ServiceAccount
        name: cilium
        namespace: kube-system
  api:
    loadBalancer:
      class: Network
      crossZoneLoadBalancing: true
      type: Public
  assets:
    containerProxy: <replaced>
  authentication: {}
  authorization:
    rbac: {}
  certManager:
    defaultIssuer: dns01-prod
    enabled: true
    hostedZoneIDs:
    - Z0630207G53FYFMI1NJH
    - Z086094812389094BOH1X
  channel: stable
  cloudConfig:
    awsEBSCSIDriver:
      volumeAttachLimit: 23
  cloudControllerManager:
    cpuRequest: 75m
  cloudLabels:
    ApplicationId: <replaced>
    Creator: <replaced>-cluster-kops
    Environment: dev
    Owner: <replaced>
    Permanent: "true"
    Project: <replaced>
    zone: <replaced>
  cloudProvider: aws
  clusterAutoscaler:
    createPriorityExpanderConfig: false
  configBase: <replaced>
  containerd:
    installCriCtl: true
  etcdClusters:
  - etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-north-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-eu-north-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-eu-north-1c
      name: c
    manager:
      env:
      - name: UMASK
        value: "0027"
    name: main
  - etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-north-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-eu-north-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-eu-north-1c
      name: c
    manager:
      env:
      - name: UMASK
        value: "0027"
    name: events
  externalPolicies:
    master:
    - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
    node:
    - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  fileAssets:
  - content: |
      apiVersion: audit.k8s.io/v1 # This is required.
      kind: Policy
      # Don't generate audit events for all requests in RequestReceived stage.
      omitStages:
        - "RequestReceived"
      rules:
        # Log pod changes at RequestResponse level
        - level: RequestResponse
          resources:
          - group: ""
            # Resource "pods" doesn't match requests to any subresource of pods,
            # which is consistent with the RBAC policy.
            resources: ["pods"]
        # Log "pods/log", "pods/status" at Metadata level
        - level: Metadata
          resources:
          - group: ""
            resources: ["pods/log", "pods/status"]

        # Don't log requests to a configmap called "controller-leader"
        - level: None
          resources:
          - group: ""
            resources: ["configmaps"]
            resourceNames: ["controller-leader"]

        # Don't log watch requests by the "system:kube-proxy" on endpoints or services
        - level: None
          users: ["system:kube-proxy"]
          verbs: ["watch"]
          resources:
          - group: "" # core API group
            resources: ["endpoints", "services"]

        # Don't log authenticated requests to certain non-resource URL paths.
        - level: None
          userGroups: ["system:authenticated"]
          nonResourceURLs:
          - "/api*" # Wildcard matching.
          - "/version"

        # Log the request body of configmap changes in kube-system.
        - level: Request
          resources:
          - group: "" # core API group
            resources: ["configmaps"]
          # This rule only applies to resources in the "kube-system" namespace.
          # The empty string "" can be used to select non-namespaced resources.
          namespaces: ["kube-system"]

        # Log configmap and secret changes in all other namespaces at the Metadata level.
        - level: Metadata
          resources:
          - group: "" # core API group
            resources: ["secrets", "configmaps"]

        # Log all other resources in core and extensions at the Request level.
        - level: Request
          resources:
          - group: "" # core API group
          - group: "extensions" # Version of group should NOT be included.

        # A catch-all rule to log all other requests at the Metadata level.
        - level: Metadata
          # Long-running requests like watches that fall under this rule will not
          # generate an audit event in RequestReceived.
          omitStages:
            - "RequestReceived"
    mode: "0544"
    name: audit-policy-config
    path: /etc/kubernetes/audit/policy-config.yaml
    roles:
    - ControlPlane
  - content: |
      apiVersion: apiserver.config.k8s.io/v1
      kind: AdmissionConfiguration
      plugins:
        - name: EventRateLimit
          configuration:
            apiVersion: eventratelimit.admission.k8s.io/v1alpha1
            kind: Configuration
            limits:
              - type: Namespace
                qps: 50
                burst: 100
                cacheSize: 2000
              - type: User
                qps: 10
                burst: 50
    mode: "0544"
    name: eventratelimit-config
    path: /srv/kubernetes/kube-apiserver/admission-control.yaml
    roles:
    - ControlPlane
  iam:
    allowContainerRegistry: true
    legacy: false
    serviceAccountExternalPermissions: <replaced>
    useServiceAccountExternalPermissions: true
  karpenter: {}
  kubeAPIServer:
    admissionControlConfigFile: /srv/kubernetes/kube-apiserver/admission-control.yaml
    auditLogMaxAge: 30
    auditLogMaxBackups: 10
    auditLogMaxSize: 100
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditPolicyFile: /etc/kubernetes/audit/policy-config.yaml
    enableAdmissionPlugins:
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - LimitRanger
    - MutatingAdmissionWebhook
    - NamespaceLifecycle
    - NodeRestriction
    - ResourceQuota
    - RuntimeClass
    - ServiceAccount
    - ValidatingAdmissionPolicy
    - ValidatingAdmissionWebhook
    - AlwaysPullImages
    - EventRateLimit
    enableProfiling: false
    oidcClientID: <replaced>
    oidcGroupsClaim: <replaced>
    oidcIssuerURL: <replaced>
    oidcUsernameClaim: <replaced>
  kubeControllerManager:
    enableProfiling: false
    terminatedPodGCThreshold: 10
  kubeDNS:
    externalCoreFile: |-
      .:53 {
          errors
          health {
            lameduck 5s
          }
          ready
          kubernetes cluster.local. in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
          }
          prometheus :9153
          forward . /etc/resolv.conf {
            max_concurrent 1000
          }
          cache 30
          loop
          reload
          loadbalance
          # Rewrites Microservices
          <replaced>
      }
    provider: CoreDNS
  kubeScheduler:
    enableProfiling: false
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    eventQPS: 0
    evictionHard: memory.available<1000Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5%
    housekeepingInterval: 1s
    maxPods: 45
    resolvConf: /run/systemd/resolve/resolv.conf
  kubernetesApiAccess:
  <replaced>
  kubernetesVersion: 1.34.5
  masterPublicName: <replaced>
  networkCIDR: <replaced>
  networking:
    cilium:
      enableBPFMasquerade: false
      enablePrometheusMetrics: true
      hubble:
        enabled: true
        metrics:
        - httpV2:labelsContext=source_ip,source_namespace,source_workload,destination_ip,destination_namespace,destination_workload,traffic_direction;sourceContext=workload-name|reserved-identity;destinationContext=workload-name|reserved-identity
      ipam: eni
  nodeTerminationHandler:
    enabled: false
  nonMasqueradeCIDR: <replaced>
  ntp:
    managed: false
  podIdentityWebhook:
    enabled: true
  serviceAccountIssuerDiscovery:
    discoveryStore: <replaced>
    enableAWSOIDCProvider: true
  subnets:
    <replaced>
  topology:
    dns:
      type: Public

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know?

Labels: kind/bug