We use aliyun and aws as our cloud infrastructure. This repo includes the CI/CD tools we've used so far:
- circleci
- codebuild - aws
- github
- gitlab.cn
- jenkins
Right now, we use aws EKS to run our apps. The general CI/CD workflow is:
- Push code to remote repo
- Test code in CI
- Docker build image based on repo's Dockerfile
- Push image to aws ECR
- Run kubectl apply on the yaml files to roll out the update, i.e., pull the latest image from ECR and recreate the containers in k8s.
This repo is for demo purposes only. If you want to use it for your own app, the code should be adapted.
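The build, push and apply steps above can be sketched as a small command builder. This is a minimal sketch, not the pipeline used in this repo: the account ID, repository name, tag and function names are hypothetical placeholders, and the code only builds the commands instead of running them.

```python
from typing import List


def ecr_image_uri(account_id: str, region: str, repo: str, tag: str) -> str:
    """Build an ECR image URI; China regions use the amazonaws.com.cn suffix."""
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"{account_id}.dkr.ecr.{region}.{suffix}/{repo}:{tag}"


def deploy_commands(image_uri: str, manifest: str = "k8s_deployment.yaml") -> List[List[str]]:
    """Return the build, push and apply commands in the order the CI job runs them."""
    return [
        ["docker", "build", "-t", image_uri, "."],  # build from the repo's Dockerfile
        ["docker", "push", image_uri],              # push the image to ECR
        ["kubectl", "apply", "-f", manifest],       # trigger the rolling update
    ]


if __name__ == "__main__":
    # Hypothetical account ID and tag, for illustration only.
    uri = ecr_image_uri("123456789012", "cn-northwest-1", "jiameng-api-dev", "abc123")
    for cmd in deploy_commands(uri):
        print(" ".join(cmd))
```

A real job would also need to authenticate docker against ECR before the push, and would pass each command to something like subprocess.run(cmd, check=True).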
You may also need to add environment variables to the selected tools for CI/CD.
Here's one of our deployment scenarios:
- service jiameng-api-dev runs on aws EKS
- service jiameng-api-dev has two pods
- aws load balancer for kubernetes ingress
- kubectl apply forces kubernetes pods to be rebuilt when changes are applied to the service's image in ECR
To achieve zero-downtime deployments on aws, we have enabled alb pod_readiness_gate.
When alb pod_readiness_gate is not enabled, there could be no healthy targets in the target group. Targets may all be in the draining or initial state, with none in the healthy state.
With alb pod_readiness_gate enabled, it is guaranteed that there's always at least one healthy target available in the target group. However, this only reduces the odds of 5xx errors; it does not eliminate them.
We wrote a script to test this; see aws_alb_test.sh. It uses describe-target-health to get the targets' health state before sending requests to the Go app.
We also enabled alb access logs to track which targets serve our test requests.
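To make the describe-target-health output easier to scan during a test run, the states can be tallied. The following is a minimal sketch that assumes the JSON shape shown in the outputs in this README; the function name is hypothetical and this is not part of aws_alb_test.sh.

```python
import json
from collections import Counter
from typing import Dict


def count_target_states(describe_output: str) -> Dict[str, int]:
    """Count targets per health state in describe-target-health JSON output."""
    data = json.loads(describe_output)
    states = Counter(
        desc["TargetHealth"]["State"] for desc in data["TargetHealthDescriptions"]
    )
    return dict(states)


if __name__ == "__main__":
    # Trimmed sample in the same shape as the real output.
    sample = json.dumps({
        "TargetHealthDescriptions": [
            {"Target": {"Id": "10.0.1.222", "Port": 1325},
             "TargetHealth": {"State": "healthy"}},
            {"Target": {"Id": "10.0.2.61", "Port": 1325},
             "TargetHealth": {"State": "draining"}},
        ]
    })
    print(count_target_states(sample))
```

A result with no healthy entry means that, at that moment, the target group has nothing healthy to serve requests.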
Here are the test steps:
- run the shell script ./aws_alb_test.sh in one terminal to send requests to the load balancer
- change the date value date: "<DATE>" in k8s_deployment.yaml, along with any other test changes, to force the pods to be rebuilt by the next kubectl apply command
- run kubectl apply -f k8s_deployment.yaml in a separate terminal to rebuild the pods
- watch the shell script terminal output for 5xx errors
Ideally, we can expect zero downtime with alb pod_readiness_gate enabled. But in reality, we can still observe 5xx errors in the shell script terminal.
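The request loop in the first and last steps can be sketched as follows. This is an illustrative sketch rather than the contents of aws_alb_test.sh: the function names are hypothetical, and the fetch function is passed in so the loop can be tried without a real load balancer.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Tuple


def fetch_status(url: str, timeout: float = 15.0) -> int:
    """Return the HTTP status code of a GET request, including 4xx and 5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is what we want.
        return err.code


def find_server_errors(
    fetch: Callable[[str], int], url: str, attempts: int
) -> List[Tuple[int, int]]:
    """Send `attempts` requests and return (request number, status) for each 5xx."""
    failures = []
    for number in range(1, attempts + 1):
        status = fetch(url)
        if 500 <= status <= 599:
            failures.append((number, status))
    return failures
```

Calling find_server_errors(fetch_status, url, attempts) while the pods are being rebuilt returns an empty list only if no request hit a 5xx.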
5xx errors with healthy targets
16 starts kube pod [2023-01-07 17:22:40.926437] ;
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
jiameng-api-dev-7cf849584f-kh7vk 1/1 Terminating 0 39m 10.0.2.61 ip-10-0-2-240.cn-northwest-1.compute.internal <none> 1/1
jiameng-api-dev-7fccd97f9d-d66qt 1/1 Running 0 17s 10.0.2.134 ip-10-0-2-240.cn-northwest-1.compute.internal <none> 1/1
jiameng-api-dev-7fccd97f9d-vrk9p 1/1 Running 0 35s 10.0.1.222 ip-10-0-1-89.cn-northwest-1.compute.internal <none> 1/1
{
"TargetHealthDescriptions": [
{
"Target": {
"Id": "10.0.1.222",
"Port": 1325,
"AvailabilityZone": "cn-northwest-1a"
},
"HealthCheckPort": "1325",
"TargetHealth": {
"State": "healthy"
}
},
{
"Target": {
"Id": "10.0.2.134",
"Port": 1325,
"AvailabilityZone": "cn-northwest-1b"
},
"HealthCheckPort": "1325",
"TargetHealth": {
"State": "healthy"
}
},
{
"Target": {
"Id": "10.0.1.109",
"Port": 1325,
"AvailabilityZone": "cn-northwest-1a"
},
"HealthCheckPort": "1325",
"TargetHealth": {
"State": "draining",
"Reason": "Target.DeregistrationInProgress",
"Description": "Target deregistration is in progress"
}
},
{
"Target": {
"Id": "10.0.2.61",
"Port": 1325,
"AvailabilityZone": "cn-northwest-1b"
},
"HealthCheckPort": "1325",
"TargetHealth": {
"State": "draining",
"Reason": "Target.DeregistrationInProgress",
"Description": "Target deregistration is in progress"
}
}
]
}
16 starts curl [2023-01-07 17:22:42.554251] ;
<html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
</body>
</html>
;
16 ends curl [2023-01-07 17:22:52.765480];
The example above shows two draining targets and two healthy targets existing at the same time. It looks like one of the two draining targets is still receiving traffic; otherwise we would get 200 from the two healthy targets.
To double-check, we looked up the load balancer access log record for the request above. Sensitive data has been replaced with xxxxxxxx.
h2 2023-01-07T09:22:52.872365Z app/k8s-apidev-95472999b0/xxxxxxxxxx 202.102.17.226:24502 10.0.2.61:1325 -1 -1 -1 504 - 49 202 "GET https://example-api.cn:443/prefix-jiameng-api/ HTTP/2.0" "curl/7.79.1" ECDHE-RSA-XXXX-GCM-SHA256 TLSv1.2 arn:aws-cn:elasticloadbalancing:cn-northwest-1:xxxxxxx:targetgroup/k8s-dev-jiamenga-edb70a94b9/081fxxxxxxxx "Root=1-63b939e2-312611fa7a64f07e63d24099" "exxample-api.cn" "arn:aws-cn:acm:cn-northwest-1:xxxxxxx:certificate/e38c0d37-5c2f-4988-81c0-xxxxxxxx" 1 2023-01-07T09:22:42.870000Z "forward" "-" "-" "10.0.2.61:1325" "-" "-" "-"
China is in UTC+8 and the alb log uses UTC, so there is an 8-hour difference between the timestamps in our shell script output and those in the alb access log.
Anyway, we can see the load balancer routed the request to the draining target 10.0.2.61:1325. target_status_code is -, and elb_status_code is 504. This means the load balancer never received a response from this target because its connection to 10.0.2.61 was closed or could not be established, so it returned 504 to the client. The explanation of these fields can be found at access logs. This makes sense, since the pod behind target 10.0.2.61, jiameng-api-dev-7cf849584f-kh7vk, is in Terminating status and can exit very quickly.
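To line up an access log record with the script output, the alb timestamp can be converted to UTC+8. A minimal sketch, assuming the timestamp format seen in the log record above; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

CHINA_TZ = timezone(timedelta(hours=8))  # fixed UTC+8 offset, no DST


def alb_time_to_china(alb_timestamp: str) -> datetime:
    """Convert an alb access log timestamp (UTC, trailing 'Z') to UTC+8."""
    utc_time = datetime.strptime(alb_timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return utc_time.replace(tzinfo=timezone.utc).astimezone(CHINA_TZ)


if __name__ == "__main__":
    local = alb_time_to_china("2023-01-07T09:22:52.872365Z")
    print(local.strftime("%Y-%m-%d %H:%M:%S.%f"))  # 2023-01-07 17:22:52.872365
```

The converted value falls right after the "ends curl" timestamp 17:22:52.765480 printed by the script, which confirms the record belongs to the failing request.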
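The fields discussed here can be pulled out of a log record programmatically. This sketch assumes the space-separated, double-quoted layout of the record shown above and relies on the field order documented for alb access logs; the function name and the choice of returned keys are hypothetical.

```python
import shlex
from typing import Dict


def parse_alb_log_line(line: str) -> Dict[str, str]:
    """Extract a few fields from one alb access log record."""
    fields = shlex.split(line)  # keeps double-quoted fields together
    return {
        "time": fields[1],
        "client": fields[3],
        "target": fields[4],
        "elb_status_code": fields[8],
        "target_status_code": fields[9],
        "request": fields[12],
    }


if __name__ == "__main__":
    record = (
        "h2 2023-01-07T09:22:52.872365Z app/k8s-apidev-95472999b0/xxxxxxxxxx "
        "202.102.17.226:24502 10.0.2.61:1325 -1 -1 -1 504 - 49 202 "
        '"GET https://example-api.cn:443/prefix-jiameng-api/ HTTP/2.0" "curl/7.79.1"'
    )
    parsed = parse_alb_log_line(record)
    print(parsed["target"], parsed["elb_status_code"], parsed["target_status_code"])
```

Filtering records where elb_status_code starts with 5 and target_status_code is - gives the requests that failed between the load balancer and a target.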
According to Register targets
The load balancer stops routing requests to a target as soon as you deregister it.
Apparently, in the test case above, the load balancer keeps routing requests to the draining (deregistered) targets even though healthy targets are available at the same time.
However, according to Deregistration delay
If a deregistering target terminates the connection before the deregistration delay elapses, the client receives a 500-level error response.
These 5xx errors make sense: our deregistering (draining) targets closed their connections because their pods were being terminated while the deregistration delay had not yet elapsed.
In short, the load balancer routed traffic to a target in the draining state, but the connection between the load balancer and that target had been closed because the target's pod was being terminated.
To fix this issue, we want draining targets to remain usable. In other words, while a target is in the draining state, its associated pod should not exit. This way, we hope the connection between the load balancer and the draining target will not be closed until the Connection idle timeout is reached.
So we simply prevent the old pods from terminating too quickly. To achieve this, we add a preStop hook to the k8s deployment file. See the details in /deploy/k8s_deployment.yaml
terminationGracePeriodSeconds: 50
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 40"]
This prevents the old pods from being terminated immediately; they stay available for 40 seconds.
Now we run the test script again. This time we don't see 5xx errors, but we can observe results like the one below.
Draining target without its associated pod available
35 starts kube pod [2023-01-06 19:41:34.769521] ;
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
jiameng-api-dev-559b96d846-vpvhg 1/1 Terminating 0 11m 10.0.2.235 ip-10-0-2-240.cn-northwest-1.compute.internal <none> 1/1
jiameng-api-dev-7665d8f85f-6r46s 1/1 Running 0 47s 10.0.2.66 ip-10-0-2-240.cn-northwest-1.compute.internal <none> 1/1
jiameng-api-dev-7665d8f85f-z6w8v 1/1 Running 0 65s 10.0.1.99 ip-10-0-1-89.cn-northwest-1.compute.internal <none> 1/1
{
"TargetHealthDescriptions": [
{
"Target": {
"Id": "10.0.2.66",
"Port": 1325,
"AvailabilityZone": "cn-northwest-1b"
},
"HealthCheckPort": "1325",
"TargetHealth": {
"State": "healthy"
}
},
{
"Target": {
"Id": "10.0.2.235",
"Port": 1325,
"AvailabilityZone": "cn-northwest-1b"
},
"HealthCheckPort": "1325",
"TargetHealth": {
"State": "draining",
"Reason": "Target.DeregistrationInProgress",
"Description": "Target deregistration is in progress"
}
},
{
"Target": {
"Id": "10.0.1.109",
"Port": 1325,
"AvailabilityZone": "cn-northwest-1a"
},
"HealthCheckPort": "1325",
"TargetHealth": {
"State": "draining",
"Reason": "Target.DeregistrationInProgress",
"Description": "Target deregistration is in progress"
}
},
{
"Target": {
"Id": "10.0.1.99",
"Port": 1325,
"AvailabilityZone": "cn-northwest-1a"
},
"HealthCheckPort": "1325",
"TargetHealth": {
"State": "healthy"
}
}
]
}
35 starts curl [2023-01-06 19:41:36.510725] ;
ok;
35 ends curl [2023-01-06 19:41:36.741279];
Target "10.0.1.109" is in the draining state, but its pod was already terminated after 40 seconds. Even though we don't get a 5xx error this time, past test experience shows there's still a chance that a draining target receives traffic.
According to Deregistration delay,
The initial state of a deregistering target is draining. By default, the load balancer changes the state of a deregistering target to unused after 300 seconds.
So here we want to decrease the target group's Deregistration delay so that the pod lives longer than its target stays in the draining state.
For test purposes, we set the Deregistration delay to 35 seconds for the target group.
Now a pod can live for 40 seconds before being terminated, and its target stays in the draining state for at most 35 seconds. We should expect to see a pod that is still available while its associated target has already become unused.
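The expected outcome can be reasoned about with a tiny timeline model. This is a simplified sketch under stated assumptions: pod deletion and target deregistration are assumed to start at the same moment, and the pod is assumed to exit as soon as its preStop sleep finishes. The function name and state labels are hypothetical.

```python
from typing import Tuple


def state_after_deregistration(
    elapsed: float, pre_stop_sleep: float, deregistration_delay: float
) -> Tuple[str, str]:
    """Return (pod state, target state) `elapsed` seconds after the rollout starts."""
    pod = "alive" if elapsed < pre_stop_sleep else "terminated"
    target = "draining" if elapsed < deregistration_delay else "unused"
    return pod, target


if __name__ == "__main__":
    # preStop sleep 40s, Deregistration delay 35s: the pod outlives its target.
    print(state_after_deregistration(36, 40, 35))   # ('alive', 'unused')
    # preStop sleep 40s, default delay 300s: a draining target with no pod behind it.
    print(state_after_deregistration(60, 40, 300))  # ('terminated', 'draining')
```

The risky combination is ('terminated', 'draining'): the load balancer may still send a request to a target whose pod is gone.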
Pod lives without its associated target
30 starts kube pod [2023-01-06 20:11:48.745410] ;
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
jiameng-api-dev-6cd554685-l46jz 1/1 Running 0 37s 10.0.2.134 ip-10-0-2-240.cn-northwest-1.compute.internal <none> 1/1
jiameng-api-dev-6cd554685-w49dg 1/1 Running 0 54s 10.0.1.109 ip-10-0-1-89.cn-northwest-1.compute.internal <none> 1/1
jiameng-api-dev-7665d8f85f-6r46s 1/1 Terminating 0 31m 10.0.2.66 ip-10-0-2-240.cn-northwest-1.compute.internal <none> 1/1
jiameng-api-dev-7665d8f85f-z6w8v 1/1 Terminating 0 31m 10.0.1.99 ip-10-0-1-89.cn-northwest-1.compute.internal <none> 1/1
{
"TargetHealthDescriptions": [
{
"Target": {
"Id": "10.0.2.66",
"Port": 1325,
"AvailabilityZone": "cn-northwest-1b"
},
"HealthCheckPort": "1325",
"TargetHealth": {
"State": "draining",
"Reason": "Target.DeregistrationInProgress",
"Description": "Target deregistration is in progress"
}
},
{
"Target": {
"Id": "10.0.1.109",
"Port": 1325,
"AvailabilityZone": "cn-northwest-1a"
},
"HealthCheckPort": "1325",
"TargetHealth": {
"State": "healthy"
}
},
{
"Target": {
"Id": "10.0.2.134",
"Port": 1325,
"AvailabilityZone": "cn-northwest-1b"
},
"HealthCheckPort": "1325",
"TargetHealth": {
"State": "healthy"
}
}
]
}
30 starts curl [2023-01-06 20:11:50.392375] ;
ok;
30 ends curl [2023-01-06 20:11:50.606315];
We can see the pod with IP "10.0.1.99" is still alive, but it no longer has an associated target. This is exactly what we want to see!
To sum up, we want each pod to live longer than its associated target, i.e., these three values should satisfy terminationGracePeriodSeconds > preStop > Deregistration delay.
For real projects, we may want longer times than the 35 seconds used here for Deregistration delay. Say we have lengthy requests taking up to 300 seconds, such as querying a big database; then we may want to increase the Deregistration delay to above 300 seconds so that in-flight requests on draining targets have enough time to complete.
This blog points out that Connection idle timeout should be taken into consideration too: Deregistration delay > application's own timeout > Connection idle timeout.
With this order, by the time a draining target transitions to the unused state, the connection between the load balancer and this target is already closed, provided no new requests are routed to the draining target to keep the connection open.
However, even though the aws docs say "The load balancer stops routing requests to a target as soon as you deregister it.", in our test case above the draining targets could still take new traffic. So in theory, this ordering might not cover every edge case.
With application's own timeout > Connection idle timeout, the client receives a 5xx when the Connection idle timeout is reached. On the other hand, it might be confusing for the client to receive a 5xx when the application's own timeout has not been reached yet; the client might prefer to see an error only when the application's own timeout is reached. So choose the strategy that suits you best.
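As a rough illustration of scaling the three values together, the sketch below derives them from the longest expected request. The 5-second margin and the function name are arbitrary assumptions for illustration, not recommendations from aws; the 3600-second cap reflects the maximum Deregistration delay a target group accepts.

```python
from typing import Dict

MAX_DEREGISTRATION_DELAY = 3600  # upper bound accepted for a target group, in seconds


def suggest_timings(max_request_seconds: int, margin: int = 5) -> Dict[str, int]:
    """Derive values satisfying terminationGracePeriodSeconds > preStop > delay."""
    deregistration_delay = max_request_seconds + margin
    if deregistration_delay > MAX_DEREGISTRATION_DELAY:
        raise ValueError("Deregistration delay cannot exceed 3600 seconds")
    pre_stop_sleep = deregistration_delay + margin
    termination_grace_period = pre_stop_sleep + margin
    return {
        "deregistration_delay": deregistration_delay,
        "pre_stop_sleep": pre_stop_sleep,
        "termination_grace_period_seconds": termination_grace_period,
    }


if __name__ == "__main__":
    print(suggest_timings(300))
```

In practice terminationGracePeriodSeconds also has to leave room for the application's own shutdown after the preStop hook returns, so a larger margin on that value may be appropriate.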
Big thanks to aws solution architect chenxqdu for above discussion ^_^
