Cilium node affinity has changed since 1.33 to 1.34 #18099
Description
/kind bug
1. What kops version are you running? The command `kops version` will display
this information.
1.33 and 1.34
2. What Kubernetes version are you running? `kubectl version` will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
1.33.9 and 1.34.5
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Install kops 1.33 on AWS and upgrade to 1.34
5. What happened after the commands executed?
In kops 1.33, the cilium-operator deployment had a nodeAffinity that allowed it to also be placed on the master (control-plane) nodes:

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
        - matchExpressions:
          - key: node-role.kubernetes.io/master
            operator: Exists
```
After upgrading to 1.34, the cilium operator no longer has these affinities, so the operator is also scheduled on worker nodes.
Because the master and worker nodes have different IAM permissions, cilium then fails with errors caused by the missing EC2 permissions.
It should be possible to either
- keep the previous behaviour, with the cilium operator pinned to the master nodes via nodeAffinity, or
- configure custom nodeSelectors / nodeAffinities for it in cluster.yaml.
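Until one of the above is supported, a stopgap could be to patch the pre-1.34 affinity back onto the deployment by hand. This is an untested sketch, not a kops-managed fix: kops or the addon channel may overwrite the deployment again on the next cluster update, and the filename below is hypothetical.

```yaml
# cilium-operator-affinity-patch.yaml (hypothetical name)
# Strategic-merge patch restoring the pre-1.34 scheduling constraint.
# NOTE: kops may revert this on the next `kops update cluster` / channels run.
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
            - matchExpressions:
              - key: node-role.kubernetes.io/master
                operator: Exists
```

It could be applied with something like `kubectl -n kube-system patch deployment cilium-operator --patch-file cilium-operator-affinity-patch.yaml`; afterwards `kubectl -n kube-system get pods -o wide` should show the operator back on a control-plane node.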
6. What did you expect to happen?
I expected that the upgrade would not change the cilium operator's behaviour and that it would still be placed on the master nodes.
7. Please provide your cluster manifest. Execute
`kops get --name my.example.com -o yaml` to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: <replaced>
spec:
  additionalPolicies:
    node: |
      [
        {
          "Effect": "Allow",
          "Resource": [ "*" ],
          "Action": [
            "ecr:DescribeImages",
            "ecr:DescribeImageScanFindings",
            "inspector2:ListFindings",
            "inspector2:ListCoverage",
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:DescribeLaunchConfigurations",
            "autoscaling:DescribeTags",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup"
          ]
        }
      ]
  addons:
  - manifest: |
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: cilium-secrets-access
      rules:
      - apiGroups: [""]
        resources: ["secrets"]
        verbs: ["get", "list", "watch"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: cilium-secrets-access
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: cilium-secrets-access
      subjects:
      - kind: ServiceAccount
        name: cilium
        namespace: kube-system
  api:
    loadBalancer:
      class: Network
      crossZoneLoadBalancing: true
      type: Public
  assets:
    containerProxy: <replaced>
  authentication: {}
  authorization:
    rbac: {}
  certManager:
    defaultIssuer: dns01-prod
    enabled: true
    hostedZoneIDs:
    - Z0630207G53FYFMI1NJH
    - Z086094812389094BOH1X
  channel: stable
  cloudConfig:
    awsEBSCSIDriver:
      volumeAttachLimit: 23
  cloudControllerManager:
    cpuRequest: 75m
  cloudLabels:
    ApplicationId: <replaced>
    Creator: <replaced>-cluster-kops
    Environment: dev
    Owner: <replaced>
    Permanent: "true"
    Project: <replaced>
    zone: <replaced>
  cloudProvider: aws
  clusterAutoscaler:
    createPriorityExpanderConfig: false
  configBase: <replaced>
  containerd:
    installCriCtl: true
  etcdClusters:
  - etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-north-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-eu-north-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-eu-north-1c
      name: c
    manager:
      env:
      - name: UMASK
        value: "0027"
    name: main
  - etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-north-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-eu-north-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-eu-north-1c
      name: c
    manager:
      env:
      - name: UMASK
        value: "0027"
    name: events
  externalPolicies:
    master:
    - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
    node:
    - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  fileAssets:
  - content: |
      apiVersion: audit.k8s.io/v1 # This is required.
      kind: Policy
      # Don't generate audit events for all requests in RequestReceived stage.
      omitStages:
      - "RequestReceived"
      rules:
      # Log pod changes at RequestResponse level
      - level: RequestResponse
        resources:
        - group: ""
          # Resource "pods" doesn't match requests to any subresource of pods,
          # which is consistent with the RBAC policy.
          resources: ["pods"]
      # Log "pods/log", "pods/status" at Metadata level
      - level: Metadata
        resources:
        - group: ""
          resources: ["pods/log", "pods/status"]
      # Don't log requests to a configmap called "controller-leader"
      - level: None
        resources:
        - group: ""
          resources: ["configmaps"]
          resourceNames: ["controller-leader"]
      # Don't log watch requests by the "system:kube-proxy" on endpoints or services
      - level: None
        users: ["system:kube-proxy"]
        verbs: ["watch"]
        resources:
        - group: "" # core API group
          resources: ["endpoints", "services"]
      # Don't log authenticated requests to certain non-resource URL paths.
      - level: None
        userGroups: ["system:authenticated"]
        nonResourceURLs:
        - "/api*" # Wildcard matching.
        - "/version"
      # Log the request body of configmap changes in kube-system.
      - level: Request
        resources:
        - group: "" # core API group
          resources: ["configmaps"]
        # This rule only applies to resources in the "kube-system" namespace.
        # The empty string "" can be used to select non-namespaced resources.
        namespaces: ["kube-system"]
      # Log configmap and secret changes in all other namespaces at the Metadata level.
      - level: Metadata
        resources:
        - group: "" # core API group
          resources: ["secrets", "configmaps"]
      # Log all other resources in core and extensions at the Request level.
      - level: Request
        resources:
        - group: "" # core API group
        - group: "extensions" # Version of group should NOT be included.
      # A catch-all rule to log all other requests at the Metadata level.
      - level: Metadata
        # Long-running requests like watches that fall under this rule will not
        # generate an audit event in RequestReceived.
        omitStages:
        - "RequestReceived"
    mode: "0544"
    name: audit-policy-config
    path: /etc/kubernetes/audit/policy-config.yaml
    roles:
    - ControlPlane
  - content: |
      apiVersion: apiserver.config.k8s.io/v1
      kind: AdmissionConfiguration
      plugins:
      - name: EventRateLimit
        configuration:
          apiVersion: eventratelimit.admission.k8s.io/v1alpha1
          kind: Configuration
          limits:
          - type: Namespace
            qps: 50
            burst: 100
            cacheSize: 2000
          - type: User
            qps: 10
            burst: 50
    mode: "0544"
    name: eventratelimit-config
    path: /srv/kubernetes/kube-apiserver/admission-control.yaml
    roles:
    - ControlPlane
  iam:
    allowContainerRegistry: true
    legacy: false
    serviceAccountExternalPermissions: <replaced>
    useServiceAccountExternalPermissions: true
  karpenter: {}
  kubeAPIServer:
    admissionControlConfigFile: /srv/kubernetes/kube-apiserver/admission-control.yaml
    auditLogMaxAge: 30
    auditLogMaxBackups: 10
    auditLogMaxSize: 100
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditPolicyFile: /etc/kubernetes/audit/policy-config.yaml
    enableAdmissionPlugins:
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - LimitRanger
    - MutatingAdmissionWebhook
    - NamespaceLifecycle
    - NodeRestriction
    - ResourceQuota
    - RuntimeClass
    - ServiceAccount
    - ValidatingAdmissionPolicy
    - ValidatingAdmissionWebhook
    - AlwaysPullImages
    - EventRateLimit
    enableProfiling: false
    oidcClientID: <replaced>
    oidcGroupsClaim: <replaced>
    oidcIssuerURL: <replaced>
    oidcUsernameClaim: <replaced>
  kubeControllerManager:
    enableProfiling: false
    terminatedPodGCThreshold: 10
  kubeDNS:
    externalCoreFile: |-
      .:53 {
          errors
          health {
              lameduck 5s
          }
          ready
          kubernetes cluster.local. in-addr.arpa ip6.arpa {
              pods insecure
              fallthrough in-addr.arpa ip6.arpa
              ttl 30
          }
          prometheus :9153
          forward . /etc/resolv.conf {
              max_concurrent 1000
          }
          cache 30
          loop
          reload
          loadbalance
          # Rewrites Microservices
          <replaced>
      }
    provider: CoreDNS
  kubeScheduler:
    enableProfiling: false
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    eventQPS: 0
    evictionHard: memory.available<1000Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5%
    housekeepingInterval: 1s
    maxPods: 45
    resolvConf: /run/systemd/resolve/resolv.conf
  kubernetesApiAccess:
    <replaced>
  kubernetesVersion: 1.34.5
  masterPublicName: <replaced>
  networkCIDR: <replaced>
  networking:
    cilium:
      enableBPFMasquerade: false
      enablePrometheusMetrics: true
      hubble:
        enabled: true
        metrics:
        - httpV2:labelsContext=source_ip,source_namespace,source_workload,destination_ip,destination_namespace,destination_workload,traffic_direction;sourceContext=workload-name|reserved-identity;destinationContext=workload-name|reserved-identity
      ipam: eni
  nodeTerminationHandler:
    enabled: false
  nonMasqueradeCIDR: <replaced>
  ntp:
    managed: false
  podIdentityWebhook:
    enabled: true
  serviceAccountIssuerDiscovery:
    discoveryStore: <replaced>
    enableAWSOIDCProvider: true
  subnets:
    <replaced>
  topology:
    dns:
      type: Public
```
8. Please run the commands with most verbose logging by adding the `-v 10` flag.
Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else do we need to know?