Zero-downtime Kubernetes rolling updates with custom readiness probes

The default RollingUpdate strategy sounds safe. In practice, it silently drops requests during deploys because readiness probes almost universally lie about application state. Here's how to fix that.

Why the default strategy fails

By default, Kubernetes marks a Pod Ready as soon as its readiness probe succeeds once (for an HTTP probe, any 2xx or 3xx response). Most readiness probes hit /healthz or /ready, an endpoint that returns 200 as soon as the HTTP server is listening. The problem: "HTTP server is listening" is not the same as "application can serve production traffic."

Between those two states, your application may still be opening database connections, warming its cache, or waiting on downstream services.

During a rolling update, kube-proxy updates iptables rules the moment a Pod transitions to Ready. Traffic floods in before the application is actually ready. You see latency spikes, 502s, and timeout errors in your APM — but only during deployments, making them hard to correlate.

The anatomy of a rolling update

Let's be precise about what happens. Given a Deployment with replicas: 4 and the conservative strategy below (the Kubernetes defaults are maxSurge: 25% and maxUnavailable: 25%):

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # can temporarily have 5 pods
    maxUnavailable: 0  # at least 4 pods must be Ready at all times

The controller creates one new Pod, waits for it to become Ready, then terminates one old Pod. This continues until all replicas are replaced. maxUnavailable: 0 looks safe — but it only guarantees the count of Ready Pods, not their actual ability to handle load.
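
To make the counting concrete, here is a small Go sketch of the bounds the controller enforces. The function names are illustrative and not part of any Kubernetes API. The rounding rule for percentage values (maxSurge rounds up, maxUnavailable rounds down) follows the Deployment documentation.

```go
package main

import "fmt"

// fromPercent converts a percentage of replicas into an absolute Pod count.
// Kubernetes rounds maxSurge up and maxUnavailable down.
func fromPercent(replicas, pct int, roundUp bool) int {
    if roundUp {
        return (replicas*pct + 99) / 100
    }
    return replicas * pct / 100
}

// rolloutBounds returns the most Pods that may exist at once and the fewest
// that must stay available while a rolling update is in progress.
func rolloutBounds(replicas, maxSurge, maxUnavailable int) (maxTotal, minAvailable int) {
    return replicas + maxSurge, replicas - maxUnavailable
}

func main() {
    // The strategy shown above: maxSurge: 1, maxUnavailable: 0.
    total, avail := rolloutBounds(4, 1, 0)
    fmt.Println(total, avail) // 5 4

    // The 25% / 25% defaults applied to the same Deployment.
    total, avail = rolloutBounds(4, fromPercent(4, 25, true), fromPercent(4, 25, false))
    fmt.Println(total, avail) // 5 3
}
```

Both numbers only count Pods. Neither says anything about whether a Pod counted as available can actually handle load.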

⚠ Key insight

When a Pod is deleted, its removal from Service endpoints and the termination of its containers are triggered at the same time, but nothing orders one before the other. The preStop hook, the SIGTERM handler, and the actual process shutdown can all run while kube-proxy on some nodes is still routing traffic to the Pod.

Writing a probe that reflects real readiness

A proper readiness probe must check the things that actually matter for serving traffic. Here's a Go example that checks database connectivity and cache warmup status; a downstream service check would follow the same pattern using the upstream client:

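// internal/health/readiness.go
package health

import (
    "context"
    "database/sql"
    "fmt"
    "net/http"
    "sync/atomic"
    "time"
)

// ReadinessChecker reports whether the application can serve production traffic.
// Cache is the application's own cache type (not shown here); it must expose
// HitRate() float64.
type ReadinessChecker struct {
    db       *sql.DB
    cache    *Cache
    upstream *http.Client // for a downstream availability check (not shown)
    ready    atomic.Bool  // true once the startup sequence has finished
}

func (r *ReadinessChecker) Handler(w http.ResponseWriter, req *http.Request) {
    if !r.ready.Load() {
        http.Error(w, "not ready: starting up or shutting down", http.StatusServiceUnavailable)
        return
    }

    ctx, cancel := context.WithTimeout(req.Context(), 500*time.Millisecond)
    defer cancel()

    // Check the DB with a real query, not just a ping
    var dummy int
    if err := r.db.QueryRowContext(ctx, "SELECT 1").Scan(&dummy); err != nil {
        http.Error(w, fmt.Sprintf("db: %v", err), http.StatusServiceUnavailable)
        return
    }

    // Check the cache hit rate: below the threshold means it is still warming
    if r.cache.HitRate() < 0.6 {
        http.Error(w, "cache warming: hit rate below threshold", http.StatusServiceUnavailable)
        return
    }

    w.WriteHeader(http.StatusOK)
    fmt.Fprint(w, "ok")
}

// MarkReady is called at the end of the startup sequence.
func (r *ReadinessChecker) MarkReady() {
    r.ready.Store(true)
}

// MarkNotReady is called on shutdown so the probe starts failing.
func (r *ReadinessChecker) MarkNotReady() {
    r.ready.Store(false)
}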
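
The Cache type behind r.cache is not shown in this post. As a rough sketch of what the hit-rate check assumes, here is a hypothetical minimal version that only tracks hit and miss counters; the RecordHit and RecordMiss names are made up for illustration, and a real cache would call them from its lookup path.

```go
package main

import (
    "fmt"
    "sync/atomic"
)

// Cache is a hypothetical stand-in: only the hit/miss bookkeeping is shown.
type Cache struct {
    hits   atomic.Int64
    misses atomic.Int64
}

func (c *Cache) RecordHit()  { c.hits.Add(1) }
func (c *Cache) RecordMiss() { c.misses.Add(1) }

// HitRate returns hits / (hits + misses), or 0 before any lookup has happened.
func (c *Cache) HitRate() float64 {
    h, m := c.hits.Load(), c.misses.Load()
    if h+m == 0 {
        return 0
    }
    return float64(h) / float64(h+m)
}

func main() {
    c := &Cache{}
    fmt.Println(c.HitRate() < 0.6) // true: a cold cache reads as still warming

    for i := 0; i < 7; i++ {
        c.RecordHit()
    }
    for i := 0; i < 3; i++ {
        c.RecordMiss()
    }
    fmt.Println(c.HitRate()) // 0.7
}
```

One caveat with this design: a Pod that is not yet Ready receives no Service traffic, so the hit rate can only rise if the startup sequence itself exercises the cache (for example by replaying a set of known-hot keys). Without such a warmup step, a probe gated on hit rate would never pass.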

Configuring the probe in the Deployment manifest

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10   # give the app time to start the HTTP server
  periodSeconds: 5           # check every 5 seconds
  successThreshold: 2        # require 2 consecutive successes before marking Ready
  failureThreshold: 3        # tolerate 3 failures before marking NotReady
  timeoutSeconds: 2          # probe must respond within 2 seconds

The critical change is successThreshold: 2. The default is 1, so a single successful probe transitions the Pod to Ready. Requiring two consecutive successes filters out transient false positives from a half-initialised application.
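
It helps to work out what these settings mean in wall-clock time. The sketch below is back-of-the-envelope arithmetic, not kubelet code: the Probe struct and its method names are invented here, and it assumes the first probe fires exactly at initialDelaySeconds and ignores probe timeouts and kubelet scheduling jitter.

```go
package main

import "fmt"

// Probe mirrors the readinessProbe fields used above, all in seconds.
type Probe struct {
    InitialDelay, Period, SuccessThreshold, FailureThreshold int
}

// EarliestReady is the soonest the Pod can be marked Ready after the
// container starts, assuming every probe succeeds.
func (p Probe) EarliestReady() int {
    return p.InitialDelay + (p.SuccessThreshold-1)*p.Period
}

// WorstCaseNotReady is roughly the longest a broken Pod stays Ready: the
// failure happens just after a passing probe, then FailureThreshold probes fail.
func (p Probe) WorstCaseNotReady() int {
    return p.FailureThreshold * p.Period
}

func main() {
    p := Probe{InitialDelay: 10, Period: 5, SuccessThreshold: 2, FailureThreshold: 3}
    fmt.Println(p.EarliestReady())     // 15
    fmt.Println(p.WorstCaseNotReady()) // 15
}
```

So with this configuration each new Pod adds at least about 15 seconds to the rollout, and a Pod that starts failing can keep receiving traffic for up to about 15 seconds.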

Handling graceful termination

The other half of zero-downtime deploys: ensuring old Pods finish serving in-flight requests before dying. When Kubernetes sends SIGTERM, two things should happen:

  1. The readiness probe immediately fails (Pod removed from endpoints)
  2. The application drains active connections, then exits
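
// main.go: graceful shutdown
// Fragment from main(). mux and checker (a *health.ReadinessChecker) are built earlier.
// Imports needed: context, errors, log, net/http, os, os/signal, syscall, time.

srv := &http.Server{Addr: ":8080", Handler: mux}

quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)

// Serve in the background; ErrServerClosed is the normal result of Shutdown
go func() {
    if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
        log.Fatalf("listen: %v", err)
    }
}()

<-quit // block until signal

// Fail readiness from now on
checker.MarkNotReady()

// Keep serving while endpoint removal propagates to kube-proxy on every node
time.Sleep(5 * time.Second)

// Then drain in-flight requests, with a timeout
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
    // Printf rather than Fatalf, so the deferred cancel still runs
    log.Printf("forced shutdown: %v", err)
}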

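Pair this with terminationGracePeriodSeconds: 60 on the Pod spec. The default is 30s, which is often too short for connection draining under load: the 5s sleep plus the 30s drain timeout above can already exceed it, and the kubelet sends SIGKILL once the grace period expires.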
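
The drain step depends on http.Server.Shutdown waiting for in-flight requests instead of cutting them off. The following self-contained sketch shows that behaviour on a local listener. The /slow route, the 200ms handler delay, and the drainDemo name are all made up for the demonstration.

```go
package main

import (
    "context"
    "fmt"
    "io"
    "net"
    "net/http"
    "time"
)

// drainDemo starts a server with a slow handler, calls Shutdown while one
// request is in flight, and returns the body that request still received.
func drainDemo() (string, error) {
    ln, err := net.Listen("tcp", "127.0.0.1:0") // any free local port
    if err != nil {
        return "", err
    }

    started := make(chan struct{})
    mux := http.NewServeMux()
    mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
        close(started) // signal that the request is now in flight
        time.Sleep(200 * time.Millisecond)
        fmt.Fprint(w, "done")
    })
    srv := &http.Server{Handler: mux}
    go srv.Serve(ln) // returns http.ErrServerClosed after Shutdown

    type result struct {
        body string
        err  error
    }
    resCh := make(chan result, 1)
    go func() {
        resp, err := http.Get("http://" + ln.Addr().String() + "/slow")
        if err != nil {
            resCh <- result{"", err}
            return
        }
        defer resp.Body.Close()
        b, err := io.ReadAll(resp.Body)
        resCh <- result{string(b), err}
    }()

    // Wait until the handler is running (or the request failed outright).
    select {
    case <-started:
    case res := <-resCh:
        return res.body, res.err
    }

    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()
    if err := srv.Shutdown(ctx); err != nil {
        return "", err
    }
    res := <-resCh
    return res.body, res.err
}

func main() {
    body, err := drainDemo()
    fmt.Println(body, err) // done <nil>
}
```

Shutdown stops accepting new connections right away but lets the slow request finish, which is why the client still reads a complete response.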

Validating the result

Generate continuous traffic with hey or k6 while triggering a rollout, and watch for non-2xx responses:

hey -z 120s -c 50 -q 100 https://your-service/api/endpoint &
kubectl rollout restart deployment/your-app
wait
# Check results — non-2xx count should be 0
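
If neither tool is at hand, the same check can be approximated with a few lines of Go. This is only a sketch: probeLoop and newFlakyServer are hypothetical helpers written for this example, the requests are sequential rather than concurrent, and the flaky local server merely stands in for a service that drops requests during a rollout.

```go
package main

import (
    "fmt"
    "net/http"
    "net/http/httptest"
    "sync/atomic"
    "time"
)

// probeLoop sends n sequential GET requests to url, pausing interval between
// them, and returns how many failed (transport error or non-2xx status).
func probeLoop(client *http.Client, url string, n int, interval time.Duration) int {
    failures := 0
    for i := 0; i < n; i++ {
        resp, err := client.Get(url)
        if err != nil {
            failures++
        } else {
            if resp.StatusCode < 200 || resp.StatusCode > 299 {
                failures++
            }
            resp.Body.Close()
        }
        time.Sleep(interval)
    }
    return failures
}

// newFlakyServer returns a local test server that answers 502 on every
// failEvery-th request and 200 otherwise.
func newFlakyServer(failEvery int64) *httptest.Server {
    var n atomic.Int64
    return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if n.Add(1)%failEvery == 0 {
            w.WriteHeader(http.StatusBadGateway)
            return
        }
        w.WriteHeader(http.StatusOK)
    }))
}

func main() {
    srv := newFlakyServer(3)
    defer srv.Close()
    fmt.Println(probeLoop(srv.Client(), srv.URL, 9, 0)) // 3
}
```

Against a real service, point probeLoop at the service URL with a short interval, trigger the rollout in another terminal, and expect the returned count to be 0.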

With a properly implemented readiness probe and graceful shutdown, rolling deploys become genuinely invisible to clients. No more "we need to deploy at 3am" policies.
