Set up canary deployment with gradual rollout
You are a DevOps engineer setting up canary deployments. The user wants to implement gradual traffic shifting to a new version with automated rollback on error metrics.
What to check first
- Kubernetes version supports Flux CD or Argo CD for GitOps
- Service mesh installed (Istio, Linkerd, or Consul) with VirtualService/TrafficPolicy support
- Prometheus scraping metrics from your application (request latency, error rate, custom metrics)
- Current deployment manifest with resource requests/limits defined
- Load testing tool available (k6, locust, or Apache JMeter)
Steps
- Bootstrap Flux, then install Flagger, the Flux project that provides the Canary CRD:
  flux bootstrap github --owner=YOUR_ORG --repo=YOUR_REPO --personal --path=clusters/my-cluster
- Add the Flux helm-controller and notification-controller for automated promotion decisions
- Create a Canary resource defining traffic weights, analysis window, and success criteria thresholds
- Configure Prometheus queries for error rate, latency p99, and custom business metrics
- Set up alerts to trigger rollback if metrics exceed thresholds (e.g., error_rate > 5%)
- Run smoke tests against canary endpoint before proceeding to next weight increment
- Monitor canary metrics in real-time dashboard and validate before manual or automatic promotion
- Execute progressive rollout: 10% → 25% → 50% → 100% with 5-minute analysis windows between stages
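The analysis-related steps above can be sketched as the `analysis` block of a Flagger `Canary`. This is a minimal illustration under several assumptions, not a definitive manifest: it assumes Istio as the mesh provider, Flagger's built-in `request-success-rate` and `request-duration` metrics, and a load tester reachable at the hypothetical address `http://flagger-loadtester.test/`. The port numbers, the failure threshold, and the latency budget are illustrative values that do not come from the original example.

```yaml
# Sketch only: values marked "assumed" are illustrative, not taken from the original manifest.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp-canary
  namespace: canary-demo
spec:
  provider: istio              # assumed mesh; other meshes use a different provider value
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-stable
  progressDeadlineSeconds: 600
  service:
    port: 80                   # assumed service port
    targetPort: 8080           # same port the Deployment's liveness probe uses
  analysis:
    interval: 5m               # one analysis window per traffic step
    threshold: 2               # assumed: failed checks tolerated before rollback
    stepWeights: [10, 25, 50]  # canary traffic per step; promotion then moves all traffic to the new version
    metrics:
      - name: request-success-rate   # built-in metric: percentage of successful requests
        thresholdRange:
          min: 95                    # check fails below 95% success, i.e. error rate above 5%
        interval: 1m
      - name: request-duration       # built-in metric: P99 latency in milliseconds
        thresholdRange:
          max: 500                   # assumed latency budget
        interval: 1m
    webhooks:
      - name: smoke-test
        type: pre-rollout            # runs before traffic is shifted to the canary
        url: http://flagger-loadtester.test/   # hypothetical load tester address
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -sf http://app-stable-canary.canary-demo:80/health"
```

With `stepWeights`, Flagger advances one step per `interval` as long as the metric checks pass. A webhook of type `rollout` instead of `pre-rollout` would run on every analysis iteration, which is closer to re-testing before each weight increment.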
Code
# fluxcd-canary-deployment.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: canary-demo
---
# Stable workload that the Canary resource below targets
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-stable
  namespace: canary-demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: stable
  template:
    metadata:
      labels:
        app: myapp
        version: stable
    spec:
      containers:
        - name: myapp
          image: myregistry.azurecr.io/myapp:v1.0.0
          resources:
            requests:
              memory: "64Mi"
              cpu: "100m"
            limits:
              memory: "128Mi"
              cpu: "200m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
---
# Canary is a Flagger custom resource (API group flagger.app)
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp-canary
  namespace: canary-demo
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-stable
  skipAnalysis: false
  progressDeadlineSeconds: 600
  service:
Note: this example was truncated in the source. See the GitHub repo for the latest full version.
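For the custom Prometheus queries mentioned in the steps, Flagger accepts a `MetricTemplate`. The sketch below is an illustration under stated assumptions and is not part of the truncated manifest: the Prometheus address, the `http_requests_total` counter, and its `namespace`, `deployment`, and `status` labels are hypothetical and would need to match what your application actually exports.

```yaml
# Sketch only: metric name, labels, and Prometheus address are hypothetical.
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: canary-demo
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring:9090   # hypothetical address
  # Percentage of 5xx responses for the workload over the analysis interval.
  # {{ namespace }}, {{ target }} and {{ interval }} are filled in by Flagger.
  query: |
    100 * sum(
      rate(http_requests_total{
        namespace="{{ namespace }}",
        deployment="{{ target }}",
        status=~"5.."
      }[{{ interval }}])
    )
    /
    sum(
      rate(http_requests_total{
        namespace="{{ namespace }}",
        deployment="{{ target }}"
      }[{{ interval }}])
    )
```

A Canary's analysis can then reference the template by name, for example:

```yaml
    metrics:
      - name: error-rate
        templateRef:
          name: error-rate
          namespace: canary-demo
        thresholdRange:
          max: 5        # fail the check when the error rate exceeds 5%
        interval: 1m
```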
Common Pitfalls
- Treating this skill as a one-shot solution — most workflows need iteration and verification
- Skipping the verification steps — you don't know it worked until you measure
- Applying this skill without understanding the underlying problem — read the related docs first
When NOT to Use This Skill
- When a simpler manual approach would take less than 10 minutes
- On critical production systems without testing in staging first
- When you don't have permission or authorization to make these changes
How to Verify It Worked
- Run the smoke tests from the steps above against the canary endpoint at each weight increment
- Compare the canary's error rate and p99 latency against the stable version's baseline
- Check logs for any warnings or errors — silent failures are the worst kind
Production Considerations
- Test in staging before deploying to production
- Have a rollback plan — every change should be reversible
- Monitor the affected systems for at least 24 hours after the change
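For the post-change monitoring window, a standing alert on the same 5% error-rate threshold is one option. The following is a rough sketch that assumes the Prometheus Operator is installed (it provides the `PrometheusRule` kind) and that the application exports a hypothetical `http_requests_total` counter with a `status` label. The rule and alert names are made up for illustration.

```yaml
# Sketch only: assumes the Prometheus Operator CRDs and a hypothetical metric name.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-error-rate
  namespace: canary-demo
spec:
  groups:
    - name: myapp.rollout
      rules:
        - alert: MyAppHighErrorRate
          # Ratio of 5xx responses to all responses over the last 5 minutes
          expr: |
            sum(rate(http_requests_total{namespace="canary-demo", status=~"5.."}[5m]))
              /
            sum(rate(http_requests_total{namespace="canary-demo"}[5m]))
              > 0.05
          for: 5m                # must hold for 5 minutes before firing
          labels:
            severity: critical
          annotations:
            summary: "myapp 5xx error rate above 5% for 5 minutes"
```

Depending on how the operator is configured, the rule may also need labels that match the Prometheus instance's rule selector before it is picked up.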