Deployment Strategies: Rolling, Blue/Green, Canary
Yuki's team pushed a Friday deployment that swapped the entire production fleet at once. The new version had a memory leak that only showed up under real traffic. Within seven minutes, every instance had run out of memory and been OOM-killed. The rollback took 22 minutes. Customers saw nothing but 503 errors the whole time.
This chapter is about the three deployment strategies AWS supports — and why "swap everything at once" is almost never one of them.
Why Strategy Matters
A deployment is a moment of risk. The code that worked in staging might not work in production — different traffic patterns, different data shapes, different dependencies. A good deployment strategy gives you two things:
- A blast radius limit: when the new version fails, it does not take down everything;
- A rollback mechanism: when failure is detected, you can return to the previous version quickly.
The three strategies below give you both, with different trade-offs.
Rolling Deployments
A rolling deployment replaces instances one batch at a time:
- Take 10% of the fleet out of the load balancer;
- Update those instances to the new version;
- Put them back in;
- Repeat until the whole fleet runs the new version.
Trade-offs:
- Pro: no extra infrastructure needed; you reuse the existing fleet;
- Pro: limited blast radius; a failure in the first batch can be detected before the next batch is updated;
- Con: both versions run simultaneously during the rollout, so database schema changes must work with both versions at once;
- Con: rollback is also rolling, so it takes as long as the deployment itself.
Elastic Beanstalk supports rolling deployments out of the box. EC2 Auto Scaling groups support them through an instance refresh after a launch template update.
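The rolling loop above can be sketched as a toy, in-memory model. Everything here (the Instance class, rolling_deploy, and the is_healthy callback) is hypothetical and stands in for real load balancer and health-check calls; it is not an AWS API. It shows why a bad first batch limits the blast radius: the loop stops before touching the rest of the fleet.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    instance_id: str
    version: str
    in_service: bool = True  # True while registered with the load balancer


def rolling_deploy(
    fleet: List[Instance],
    new_version: str,
    batch_fraction: float,
    is_healthy: Callable[[Instance], bool],
) -> bool:
    """Update the fleet one batch at a time, halting at the first unhealthy batch.

    Returns True if the whole fleet now runs new_version. On failure, earlier
    batches keep the new version and later batches keep the old one, which is
    why undoing a rolling deployment is itself a rolling operation.
    """
    batch_size = max(1, int(len(fleet) * batch_fraction))
    for start in range(0, len(fleet), batch_size):
        batch = fleet[start:start + batch_size]
        for inst in batch:
            inst.in_service = False      # 1. take the batch out of the load balancer
            inst.version = new_version   # 2. update it to the new version
        if not all(is_healthy(inst) for inst in batch):
            return False                 # stop; the failed batch stays out of service
        for inst in batch:
            inst.in_service = True       # 3. put the batch back in
    return True


# Ten instances, batches of 10%, and a new version that fails on the third instance.
fleet = [Instance(f"i-{n}", "v1") for n in range(10)]
ok = rolling_deploy(fleet, "v2", 0.1, lambda inst: inst.instance_id != "i-2")
print(ok, sum(inst.version == "v2" for inst in fleet))  # False 3
```

Seven of the ten instances never saw the bad version, which is the blast radius limit in action.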
Blue/Green Deployments
A blue/green deployment runs two complete environments side by side:
- Blue is the current live fleet;
- Green is a parallel fleet running the new version;
- Once green is healthy, traffic is switched from blue to green, usually by updating a DNS record or load balancer target group;
- Blue stays running for a while, ready to take traffic back if green fails.
Trade-offs:
- Pro: instant rollback (switch traffic back to blue);
- Pro: only one version of the code runs in production at a time after the switch;
- Pro: green can be fully tested before any real traffic hits it;
- Con: doubled infrastructure cost during the deployment;
- Con: long-lived connections (WebSockets, database connections) need draining logic.
AWS CodeDeploy and AWS Elastic Beanstalk both support blue/green natively. The switch can be all-at-once (instant cutover) or weighted (start with 10%, ramp to 100%).
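A weighted blue/green switch with instant rollback can be sketched with a toy router. Router and blue_green_cutover are hypothetical names; in AWS the weights would live in something like weighted ALB target groups or a Route 53 weighted record, and the health callback would wrap real tests and alarms.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence


@dataclass
class Router:
    """Toy stand-in for a listener that splits traffic between two target groups."""
    weights: Dict[str, int] = field(default_factory=lambda: {"blue": 100, "green": 0})

    def shift(self, green_percent: int) -> None:
        if not 0 <= green_percent <= 100:
            raise ValueError("green_percent must be between 0 and 100")
        self.weights = {"blue": 100 - green_percent, "green": green_percent}


def blue_green_cutover(
    router: Router,
    steps: Sequence[int],
    green_healthy: Callable[[int], bool],
) -> bool:
    """Move traffic to green through `steps`; on any failure, send it all back to blue.

    steps=(100,) is an all-at-once cutover; steps=(10, 50, 100) is weighted.
    green_healthy(0) runs the pre-traffic tests before any real request reaches green.
    """
    if not green_healthy(0):
        return False                  # green never received traffic
    for percent in steps:
        router.shift(percent)
        if not green_healthy(percent):
            router.shift(0)           # instant rollback: blue is still running
            return False
    return True


router = Router()
print(blue_green_cutover(router, (10, 50, 100), lambda p: p < 50), router.weights)
# False {'blue': 100, 'green': 0}
```

Rollback is a single weight change because blue was never torn down; that is what the doubled infrastructure cost buys.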
Canary Deployments
A canary deployment sends a small percentage of traffic to the new version, monitors it, then ramps up:
- Deploy the new version alongside the old;
- Send 5% of traffic to the new version;
- Monitor error rates, latency, and business metrics for 5–30 minutes;
- If healthy, increase to 25%, then 50%, then 100%;
- If unhealthy at any step, route traffic back to the old version.
Trade-offs:
- Pro: smallest blast radius of any strategy;
- Pro: real-traffic validation with minimal exposure;
- Pro: promotion can be automated based on alarms;
- Con: both versions handle live traffic during the canary, with the same database concerns as rolling;
- Con: requires solid monitoring; without alarms, the canary is just a delayed rolling deployment.
API Gateway, AWS App Mesh, Lambda (with aliases and traffic shifting), and CodeDeploy all support canary patterns.
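The canary loop, with an alarm-style threshold checked at each step, can be sketched as follows. canary_rollout and the observe callback are hypothetical; observe stands in for reading metrics such as CloudWatch error counts after each bake period, and a single error-rate threshold is a simplification of the several alarms a real canary would watch.

```python
from typing import Callable, List, Sequence, Tuple


def canary_rollout(
    steps: Sequence[int],
    observe: Callable[[int], Tuple[int, int]],
    max_error_rate: float,
) -> Tuple[bool, List[int]]:
    """Ramp the new version through `steps`, rolling back on the first bad bake.

    observe(percent) returns (requests, errors) served by the new version while
    it held `percent` of traffic. Returns (promoted, history), where history is
    every traffic percentage applied, ending in 0 if the canary was rolled back.
    """
    history: List[int] = []
    for percent in steps:
        history.append(percent)
        requests, errors = observe(percent)
        # With no requests there is no evidence the version works, so fail closed.
        error_rate = errors / requests if requests else 1.0
        if error_rate > max_error_rate:
            history.append(0)  # route all traffic back to the old version
            return False, history
    return True, history


# A version that only breaks under heavier load: fine at 5%, 5% errors at 25%.
print(canary_rollout((5, 25, 50, 100), lambda p: (1000, 50 if p >= 25 else 2), 0.01))
# (False, [5, 25, 0])
```

Without the threshold check, the same loop would simply walk to 100%, which is the "delayed rolling deployment" failure mode described above.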
Lambda Has Its Own Vocabulary
For Lambda, the strategies map to alias traffic shifting:
- Linear: shift traffic in equal increments over a defined time (e.g., 10% every 10 minutes);
- Canary: shift a percentage immediately, wait, then shift the rest (e.g., 10% for 5 minutes, then 100%);
- All-at-once: shift all traffic to the new version immediately (the equivalent of the disaster Yuki had).
CodeDeploy plus Lambda aliases is the standard pattern for safe Lambda rollouts.
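The three shapes are easiest to compare as traffic schedules. This is a simplified model: the function names are hypothetical, and it assumes the first increment is shifted at minute 0 and each later one after a full interval; the CodeDeploy documentation is the authority on the exact timing of each predefined configuration.

```python
from typing import List, Tuple

# (minutes after the deployment starts, percent of traffic on the new version)
Schedule = List[Tuple[int, int]]


def linear_schedule(step_percent: int, interval_minutes: int) -> Schedule:
    """Shift step_percent more traffic every interval_minutes until 100%."""
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be in (0, 100]")
    schedule: Schedule = []
    percent, minute = step_percent, 0
    while percent < 100:
        schedule.append((minute, percent))
        percent += step_percent
        minute += interval_minutes
    schedule.append((minute, 100))  # the last increment is capped at 100%
    return schedule


def canary_schedule(first_percent: int, wait_minutes: int) -> Schedule:
    """Shift first_percent immediately, then the rest after wait_minutes."""
    return [(0, first_percent), (wait_minutes, 100)]


def all_at_once_schedule() -> Schedule:
    """Everything moves at minute 0: no window to catch a bad version."""
    return [(0, 100)]


print(canary_schedule(10, 5))          # [(0, 10), (5, 100)]
print(linear_schedule(10, 10)[-2:])    # [(80, 90), (90, 100)]
```

Under this model, canary reaches 100% fastest after a single bake period, while linear spreads exposure over a much longer window.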
ECS and EKS Specifics
For container workloads:
- ECS supports rolling (default) and blue/green (via CodeDeploy);
- EKS supports rolling (Kubernetes Deployment default) and more advanced patterns via tools like Argo Rollouts or Flagger;
- Fargate is a compute option for both ECS and EKS, not a deployment mechanism; the strategy is chosen at the ECS service or Kubernetes Deployment level.
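For the Kubernetes rolling default, the batch size comes from the Deployment's maxSurge and maxUnavailable settings. The helper below is a hypothetical sketch of how percentage values translate into pod counts, assuming both are given as percentages of the desired replica count.

```python
from typing import Tuple


def rolling_update_bounds(
    replicas: int, max_surge_pct: int = 25, max_unavailable_pct: int = 25
) -> Tuple[int, int]:
    """Return (most pods that may exist, fewest pods that must stay available).

    Kubernetes rounds a percentage maxSurge up and a percentage maxUnavailable
    down; 25% for both is the Deployment default.
    """
    surge = -(-replicas * max_surge_pct // 100)          # ceiling division
    unavailable = replicas * max_unavailable_pct // 100  # floor division
    return replicas + surge, replicas - unavailable


print(rolling_update_bounds(10))  # (13, 8)
```

With 10 replicas and the defaults, the rollout may run up to 13 pods and never drops below 8 available, so old and new versions overlap throughout, just as in any rolling deployment.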
What Yuki's Team Did Next
After the OOM disaster, the team:
- Moved to blue/green deployment via CodeDeploy;
- Added CloudWatch alarms on error rate, p99 latency, and memory utilization;
- Set CodeDeploy to auto-rollback if any alarm fired during deployment;
- Required all production deploys to use the canary CodeDeployDefault.LambdaCanary10Percent5Minutes config.
The next time a deployment had a problem, traffic auto-shifted back within 5 minutes. Customers saw nothing.
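The alarm-driven rollback setup can be expressed as deployment group settings. This sketch only builds the keyword arguments for boto3's CodeDeploy update_deployment_group call so it runs without AWS access; the helper name, alarm names, application name, and group name are hypothetical, and the actual API call is left as a comment.

```python
from typing import Any, Dict, List


def rollback_settings(alarm_names: List[str], deployment_config: str) -> Dict[str, Any]:
    """Build keyword arguments for boto3's CodeDeploy update_deployment_group.

    Only the rollout and rollback fields are set here; the caller still has to
    supply applicationName and currentDeploymentGroupName.
    """
    return {
        "deploymentConfigName": deployment_config,
        "alarmConfiguration": {
            "enabled": True,
            "alarms": [{"name": name} for name in alarm_names],
        },
        "autoRollbackConfiguration": {
            "enabled": True,
            # Roll back when the deployment fails or when any listed alarm fires.
            "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
    }


settings = rollback_settings(
    # Hypothetical alarm names for error rate, p99 latency, and memory.
    ["api-error-rate", "api-p99-latency", "api-memory-utilization"],
    "CodeDeployDefault.LambdaCanary10Percent5Minutes",
)
# With boto3 installed and AWS credentials configured, the settings could be
# applied like this (application and group names are hypothetical):
# import boto3
# boto3.client("codedeploy").update_deployment_group(
#     applicationName="orders-api", currentDeploymentGroupName="prod", **settings
# )
```

The alarms do the detecting and CodeDeploy does the reverting; neither half is useful without the other.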
For the Exam
DVA-C02 tests these patterns:
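- CodeDeploy deployment configurations: AllAtOnce, HalfAtATime, and OneAtATime for EC2/on-premises; Canary, Linear, and AllAtOnce variants for Lambda and ECS; and their standard CodeDeployDefault.* names;
- Blue/Green vs Rolling differences;
- Lambda alias traffic shifting (linear, canary, all-at-once);
- CodeDeploy lifecycle hooks: BeforeInstall, AfterInstall, ApplicationStart, and ValidateService for EC2/on-premises, and BeforeAllowTraffic and AfterAllowTraffic for Lambda;
- Automatic rollback triggers via CloudWatch alarms.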
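The hook names are easier to remember in execution order. The lists below are a simplified summary (load-balancer traffic events for EC2 are omitted), and the constant and function names are hypothetical; the CodeDeploy AppSpec reference is the authority on which events accept hooks.

```python
from typing import List, Tuple

# (event name, whether you can attach your own AppSpec hook), in execution order.
# In-place EC2/on-premises deployment with no load balancer.
EC2_IN_PLACE_EVENTS: List[Tuple[str, bool]] = [
    ("ApplicationStop", True),
    ("DownloadBundle", False),  # reserved for the CodeDeploy agent
    ("BeforeInstall", True),
    ("Install", False),         # reserved for the CodeDeploy agent
    ("AfterInstall", True),
    ("ApplicationStart", True),
    ("ValidateService", True),
]

# Lambda deployment (traffic shifting between alias versions).
LAMBDA_EVENTS: List[Tuple[str, bool]] = [
    ("Start", False),
    ("BeforeAllowTraffic", True),
    ("AllowTraffic", False),
    ("AfterAllowTraffic", True),
    ("End", False),
]


def scriptable_hooks(events: List[Tuple[str, bool]]) -> List[str]:
    """Return only the events you can attach your own hooks to, in order."""
    return [name for name, scriptable in events if scriptable]


print(scriptable_hooks(EC2_IN_PLACE_EVENTS))
# ['ApplicationStop', 'BeforeInstall', 'AfterInstall', 'ApplicationStart', 'ValidateService']
```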