Article

argo-rollouts, canary-deployment, ml-deployment, mlops, kubernetes, istio, progressive-delivery, ci-cd

Canary Deploy ML Models on Kubernetes with Argo Rollouts

Safely release new ML model versions with Argo Rollouts: shift traffic to the new model in stages, measure it against SLOs such as error rate or latency, and automatically abort and return traffic to the stable version when a check fails, protecting production users.

intermediate | 30 min | 4 steps

The play

Install Argo Rollouts Controller
First, you need a Kubernetes cluster with Istio installed. Then, install the Argo Rollouts controller, which introduces the `Rollout` Custom Resource Definition (CRD) that you will use to manage deployments.
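The install described above can be sketched as follows. The commands follow the upstream install instructions (controller in an `argo-rollouts` namespace, manifest taken from the latest GitHub release). The function name `install_argo_rollouts` and the 120-second timeout are illustrative assumptions, and the Istio check only confirms that the VirtualService CRD exists, not that Istio is healthy.

```shell
#!/bin/bash
# Sketch: install the Argo Rollouts controller after confirming Istio's CRDs exist.
# INSTALL_URL is the manifest published with the latest upstream release.
INSTALL_URL="https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml"

install_argo_rollouts() {
  # Istio must already be installed; its VirtualService CRD is a cheap signal.
  kubectl get crd virtualservices.networking.istio.io >/dev/null || return 1

  # Create the controller namespace idempotently, then apply the manifest.
  kubectl create namespace argo-rollouts --dry-run=client -o yaml | kubectl apply -f -
  kubectl apply -n argo-rollouts -f "$INSTALL_URL"

  # Wait for the controller (the timeout is an arbitrary choice), then confirm the Rollout CRD.
  kubectl -n argo-rollouts rollout status deployment/argo-rollouts --timeout=120s
  kubectl get crd rollouts.argoproj.io
}
```

Call `install_argo_rollouts` once per cluster. The `kubectl argo rollouts` plugin used in the last step is a separate, client-side install.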
Define the Rollout Resource
Instead of a standard Kubernetes `Deployment`, define an Argo `Rollout`. This manifest points to your stable and canary services and specifies the traffic shifting strategy, such as sending 10% of traffic to the new version for 5 minutes before increasing it.
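As a sketch of the traffic-shifting strategy described above, the helper below prints a `steps` fragment for the 10%-for-5-minutes example. The function name `canary_steps` and the later 50% step are assumptions for illustration. The output is a fragment that belongs under `spec.strategy.canary` in a `Rollout`, not a complete manifest.

```shell
#!/bin/bash
# Sketch: print a canary steps fragment (a fragment, not a complete manifest).
canary_steps() {
  cat <<'EOF'
steps:
- setWeight: 10          # send 10% of traffic to the new model version
- pause: {duration: 5m}  # hold for 5 minutes before increasing
- setWeight: 50          # illustrative next step
- pause: {}              # no duration: wait until someone promotes manually
EOF
}

canary_steps
```

A `pause` with no duration keeps the rollout at the current weight until it is promoted by hand, which can suit model releases that need a human sign-off.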
Create an AnalysisTemplate
Define how to measure success. An `AnalysisTemplate` contains queries for your monitoring system (e.g., Prometheus). A typical check is that the model's prediction success rate stays at or above 99%; the starter code below instead checks that P95 request latency stays at or below 500 ms. If an analysis run fails, the rollout is automatically aborted and traffic returns to the stable version.
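A success-rate check of the kind described above might look like the following sketch. It assumes Istio's standard `istio_requests_total` metric is scraped by a Prometheus at the address shown, and it treats any non-5xx HTTP response as a successful prediction, which is a proxy for transport-level health rather than model quality. The function name `success_rate_template`, the `service-name` argument, and the interval and count values are illustrative choices.

```shell
#!/bin/bash
# Sketch: emit an AnalysisTemplate that requires >= 99% non-5xx responses.
success_rate_template() {
  cat <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name   # assumption: the Kubernetes Service to measure
  metrics:
  - name: success-rate
    interval: 1m         # one measurement per minute
    count: 5             # five measurements per analysis run
    successCondition: result[0] >= 0.99
    failureLimit: 0      # the first failed measurement fails the run
    provider:
      prometheus:
        address: http://prometheus.istio-system.svc.cluster.local:9090
        query: |
          sum(rate(istio_requests_total{
            reporter="destination",
            destination_service_name="{{args.service-name}}",
            response_code!~"5.*"
          }[2m])) /
          sum(rate(istio_requests_total{
            reporter="destination",
            destination_service_name="{{args.service-name}}"
          }[2m]))
EOF
}

# To apply it: success_rate_template | kubectl apply -n ml-canary-demo -f -
```

If the service receives no traffic during the window, the query returns no usable value and the measurement is likely to error rather than pass, so it helps to send steady test traffic while the canary runs.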
Trigger and Monitor the Canary Deployment
To start the canary release, update the container image in your `Rollout` manifest to the new model version and apply it. Then, use the Argo Rollouts kubectl plugin to get a real-time view of the deployment's progress, including weight shifting and analysis results.
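The trigger-and-monitor workflow above can be wrapped in a few small helpers. This is a sketch: the function names and the `DRY_RUN` switch are hypothetical conveniences, while the underlying `kubectl argo rollouts` subcommands (`set image`, `get rollout`, `promote`, `abort`) come from the Argo Rollouts kubectl plugin. The rollout, namespace, and container names match the starter code.

```shell
#!/bin/bash
# Sketch: wrappers around the Argo Rollouts kubectl plugin for one canary release.
NS="ml-canary-demo"
ROLLOUT="ml-model-rollout"

# With DRY_RUN=1 the command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Start the canary by changing the image of the "ml-model" container.
start_canary() { run kubectl argo rollouts set image "$ROLLOUT" "ml-model=$1" -n "$NS"; }
# Live view of step progress, traffic weights, and analysis runs.
watch_canary() { run kubectl argo rollouts get rollout "$ROLLOUT" -n "$NS" --watch; }
# Resume a paused rollout so it proceeds to the next step.
promote_canary() { run kubectl argo rollouts promote "$ROLLOUT" -n "$NS"; }
# Stop the rollout and shift all traffic back to the stable version.
abort_canary() { run kubectl argo rollouts abort "$ROLLOUT" -n "$NS"; }

# Print, without executing, the command that would start the canary.
DRY_RUN=1 start_canary "argoproj/rollouts-demo:yellow"
```

Running the helpers without `DRY_RUN=1` executes the commands against the current kubectl context.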

Starter code

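#!/bin/bash
# This script deploys a full Canary ML Model example using Argo Rollouts.
# Prerequisites: A Kubernetes cluster with Istio and Argo Rollouts installed.
set -euo pipefail # Stop on the first failed command

# Create a namespace for our demo (idempotent, so the script can be re-run)
kubectl create namespace ml-canary-demo --dry-run=client -o yaml | kubectl apply -f -
# Assumption: default Istio sidecar injection; pods need sidecars for mesh routing and Istio metrics
kubectl label namespace ml-canary-demo istio-injection=enabled --overwrite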

# Apply all resources to the cluster
cat <<'EOF' | kubectl apply -n ml-canary-demo -f -
apiVersion: v1
kind: Service
metadata:
  name: ml-model-stable
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: ml-model
---
apiVersion: v1
kind: Service
metadata:
  name: ml-model-canary
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: ml-model
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ml-model-vsvc
spec:
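  hosts:
  - ml-model-stable # Host that in-mesh clients call; Istio rejects "*" for the mesh gateway
  gateways:
  - mesh # Or your specific gateway (then list its external hostname under hosts)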
  http:
  - name: primary
    route:
    - destination:
        host: ml-model-stable
      weight: 100
    - destination:
        host: ml-model-canary
      weight: 0
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-check
spec:
  args:
  - name: virtual-service
  metrics:
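  - name: p95-latency
    interval: 30s # Measure every 30 seconds...
    count: 4 # ...four times per analysis step; a single measurement could never exceed failureLimit
    successCondition: result[0] <= 500 # P95 latency must be <= 500ms
    failureLimit: 2 # The run fails once more than 2 measurements fail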
    provider:
      prometheus:
        address: http://prometheus.istio-system.svc.cluster.local:9090
        query: |
          histogram_quantile(0.95, sum(rate(istio_request_duration_milliseconds_bucket{
            reporter="destination",
            destination_workload_namespace="ml-canary-demo"
          }[1m])) by (le))
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: ml-model-rollout
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
      - name: ml-model
        # Start with the stable version; the demo image's "blue" tag stands in for model v1.0
        image: argoproj/rollouts-demo:blue
        ports:
        - containerPort: 8080
  strategy:
    canary:
      stableService: ml-model-stable
      canaryService: ml-model-canary
      trafficRouting:
        istio:
          virtualService:
            name: ml-model-vsvc
            routes:
            - primary
      steps:
      - setWeight: 25
      - pause: {duration: 2m}
      - analysis:
          templates:
          - templateName: latency-check
            args:
            - name: virtual-service
              value: ml-model-vsvc
      - setWeight: 50
      - pause: {duration: 2m}
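      - analysis:
          templates:
          - templateName: latency-check
            args: # The template declares this argument without a default, so every step must pass it
            - name: virtual-service
              value: ml-model-vsvc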
      - setWeight: 75
      - pause: {duration: 2m}
EOF

echo "✅ Base resources created in 'ml-canary-demo' namespace."
echo "👀 Monitor with: kubectl argo rollouts get rollout ml-model-rollout -n ml-canary-demo -w"
echo "🚀 To trigger the canary, run: kubectl argo rollouts set image ml-model-rollout ml-model=argoproj/rollouts-demo:yellow -n ml-canary-demo"