Why the default strategy fails
With default settings, Kubernetes marks a Pod Ready as soon as its readiness probe succeeds once; for an HTTP probe, any status from 200 to 399 counts as success. Most readiness probes hit /healthz or /ready, an endpoint that returns 200 as soon as the HTTP server is listening. The problem: "HTTP server is listening" is not the same as "application can serve production traffic."
Between those two states, your application may still be:
- Loading large ML models or caches into memory
- Warming up the JVM's JIT compiler (Java/Kotlin services)
- Establishing connection pools to databases and downstream services
- Waiting for the service-mesh sidecar proxy to start and complete its mTLS handshakes
- Completing initial synchronisation from a message queue
During a rolling update, a Pod is added to the Service's endpoints as soon as it transitions to Ready, and kube-proxy on each node updates its iptables rules to match. Traffic floods in before the application is actually ready. You see latency spikes, 502s, and timeout errors in your APM, but only during deployments, which makes them hard to correlate.
The anatomy of a rolling update
Let's be precise about what happens. Take a Deployment with replicas: 4 and the following strategy, which is stricter than the Kubernetes defaults (maxSurge: 25%, maxUnavailable: 25%):
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # can temporarily have 5 pods
    maxUnavailable: 0  # at least 4 pods must be Ready at all times
The controller creates one new Pod, waits for it to become Ready, then terminates one old Pod. This continues until all replicas are replaced. maxUnavailable: 0 looks safe — but it only guarantees the count of Ready Pods, not their actual ability to handle load.
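To make that sequence concrete, here is a small simulation of the Pod counts. It is a simplified model and not the Deployment controller's actual logic: it assumes every new Pod becomes Ready before the next step, and the names `step` and `simulateRollout` are invented for this sketch.

```go
package main

import "fmt"

// step records how many old and new Pods exist after one controller action.
type step struct {
	Old, New int
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// simulateRollout models a rolling update: create new Pods up to the surge
// limit, assume they become Ready, then remove old Pods while keeping at
// least replicas-maxUnavailable Pods in total.
func simulateRollout(replicas, maxSurge, maxUnavailable int) []step {
	old, updated := replicas, 0
	var steps []step
	for old > 0 {
		// Scale up: limited by the surge ceiling and by the target count.
		create := minInt(replicas+maxSurge-(old+updated), replicas-updated)
		if create > 0 {
			updated += create
			steps = append(steps, step{Old: old, New: updated})
		}
		// Scale down: never drop below the minimum number of Pods.
		remove := minInt(old+updated-(replicas-maxUnavailable), old)
		if remove > 0 {
			old -= remove
			steps = append(steps, step{Old: old, New: updated})
		}
		if create <= 0 && remove <= 0 {
			break // both limits are 0: no progress is possible
		}
	}
	return steps
}

func main() {
	for _, s := range simulateRollout(4, 1, 0) {
		fmt.Printf("old=%d new=%d total=%d\n", s.Old, s.New, s.Old+s.New)
	}
}
```

With replicas 4, maxSurge 1, and maxUnavailable 0, the model alternates between 5 and 4 Pods until all four are new. Note what it takes for granted: "Ready" is treated as a fact, which is exactly the assumption a shallow probe breaks.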
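Termination has the mirror-image problem. When a Pod is deleted, its removal from Service endpoints and the kubelet's termination sequence (preStop hook, then SIGTERM) start in parallel. Endpoint removal still has to propagate to kube-proxy on every node, so the preStop hook, the SIGTERM handler, and the start of process shutdown can all run while the Pod is still receiving traffic.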
Writing a probe that reflects real readiness
A proper readiness probe must check the things that actually matter for serving traffic. Here's a Go example that checks a startup flag, database connectivity, and cache warm-up status:
// internal/health/readiness.go
package health

import (
	"context"
	"database/sql"
	"fmt"
	"net/http"
	"sync/atomic"
	"time"
)

// ReadinessChecker reports whether the service can take production traffic.
type ReadinessChecker struct {
	db       *sql.DB
	cache    *Cache       // application cache type, defined elsewhere
	upstream *http.Client // not checked in this handler
	ready    atomic.Bool  // set to true after startup sequence
}

func (r *ReadinessChecker) Handler(w http.ResponseWriter, req *http.Request) {
	if !r.ready.Load() {
		http.Error(w, "startup sequence incomplete", http.StatusServiceUnavailable)
		return
	}

	// Keep this shorter than the probe's timeoutSeconds
	ctx, cancel := context.WithTimeout(req.Context(), 500*time.Millisecond)
	defer cancel()

	// Check DB with actual query, not just ping
	var dummy int
	if err := r.db.QueryRowContext(ctx, "SELECT 1").Scan(&dummy); err != nil {
		http.Error(w, fmt.Sprintf("db: %v", err), http.StatusServiceUnavailable)
		return
	}

	// Check cache hit rate: if below threshold, still warming
	if r.cache.HitRate() < 0.6 {
		http.Error(w, "cache warming: hit rate below threshold", http.StatusServiceUnavailable)
		return
	}

	w.WriteHeader(http.StatusOK)
	fmt.Fprint(w, "ok")
}

// Call this at the end of your startup sequence
func (r *ReadinessChecker) MarkReady() {
	r.ready.Store(true)
}

// MarkNotReady makes the probe fail again; call it when shutdown begins
func (r *ReadinessChecker) MarkNotReady() {
	r.ready.Store(false)
}
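The 503-until-ready behaviour can be exercised without a cluster. The sketch below is a simplified stand-in, not the ReadinessChecker itself: it swaps the database and cache for plain check functions so it runs on its own, and `Gate`, `Check`, `hitRateCheck`, and `probe` are names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// Check is a hypothetical dependency check; a nil error means healthy.
type Check func() error

// Gate mirrors the handler's structure: a startup flag first, then each
// dependency check in order.
type Gate struct {
	ready  atomic.Bool
	checks []Check
}

func (g *Gate) MarkReady() { g.ready.Store(true) }

func (g *Gate) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	if !g.ready.Load() {
		http.Error(w, "startup sequence incomplete", http.StatusServiceUnavailable)
		return
	}
	for _, check := range g.checks {
		if err := check(); err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
	}
	fmt.Fprint(w, "ok")
}

// hitRateCheck stands in for the cache.HitRate() comparison.
func hitRateCheck(rate *float64, threshold float64) Check {
	return func() error {
		if *rate < threshold {
			return errors.New("cache warming: hit rate below threshold")
		}
		return nil
	}
}

// probe issues one in-memory GET /ready and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/ready", nil))
	return rec.Code
}

func main() {
	rate := 0.2
	g := &Gate{checks: []Check{hitRateCheck(&rate, 0.6)}}

	fmt.Println(probe(g)) // startup flag not set yet
	g.MarkReady()
	fmt.Println(probe(g)) // flag set, cache still cold
	rate = 0.8
	fmt.Println(probe(g)) // all checks pass
}
```

The three probes return 503, 503, and 200 in that order: the flag alone is not enough, and the handler only reports success once the warm-up check passes as well.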
Configuring the probe in the Deployment manifest
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10  # give the app time to start the HTTP server
  periodSeconds: 5         # check every 5 seconds
  successThreshold: 2      # require 2 consecutive successes before marking Ready
  failureThreshold: 3      # tolerate 3 failures before marking NotReady
  timeoutSeconds: 2        # probe must respond within 2 seconds
The critical change is successThreshold: 2. The default is 1, so a single successful probe transitions the Pod to Ready. Requiring two consecutive successes filters out a one-off success from a half-initialised application.
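It helps to work out what these numbers mean in wall-clock time. The sketch below is back-of-the-envelope arithmetic only: it ignores kubelet scheduling jitter and probe response time, so treat the results as approximate bounds. `ProbeConfig` and its methods are names invented here, not a Kubernetes API.

```go
package main

import (
	"fmt"
	"time"
)

// ProbeConfig holds the readinessProbe fields that affect timing.
type ProbeConfig struct {
	InitialDelay     time.Duration
	Period           time.Duration
	SuccessThreshold int
	FailureThreshold int
}

// manifestProbe mirrors the values from the manifest above.
var manifestProbe = ProbeConfig{
	InitialDelay:     10 * time.Second,
	Period:           5 * time.Second,
	SuccessThreshold: 2,
	FailureThreshold: 3,
}

// EarliestReady approximates the soonest a Pod can become Ready after the
// container starts: the first probe runs after InitialDelay, and every
// additional required success costs one more Period.
func (p ProbeConfig) EarliestReady() time.Duration {
	return p.InitialDelay + time.Duration(p.SuccessThreshold-1)*p.Period
}

// FailureDetection approximates how long a Ready Pod that starts failing
// can keep receiving traffic before it is marked NotReady.
func (p ProbeConfig) FailureDetection() time.Duration {
	return time.Duration(p.FailureThreshold) * p.Period
}

func main() {
	fmt.Println("earliest Ready:   ", manifestProbe.EarliestReady())
	fmt.Println("failure detection:", manifestProbe.FailureDetection())
}
```

For this manifest both figures come out at roughly 15 seconds. The second one matters for shutdown: a probe that starts failing is not noticed instantly, so probe failure alone is a slow way to take a Pod out of rotation.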
Handling graceful termination
The other half of zero-downtime deploys: ensuring old Pods finish serving in-flight requests before dying. When Kubernetes sends SIGTERM, two things should happen:
- The readiness probe immediately fails (Pod removed from endpoints)
- The application drains active connections, then exits
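// main.go: graceful shutdown
// Needs: context, errors, log, net/http, os, os/signal, syscall, time.
// mux and checker are assumed to be constructed earlier in main().
srv := &http.Server{Addr: ":8080", Handler: mux}

quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)

go func() {
	// ErrServerClosed is the normal result of Shutdown, not a failure
	if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
		log.Fatalf("listen: %v", err)
	}
}()

<-quit // block until signal

// Start failing readiness probes right away
checker.MarkNotReady()

// Keep serving while the endpoint removal propagates to kube-proxy
time.Sleep(5 * time.Second)

// Then drain in-flight requests, with a timeout
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
	log.Fatalf("forced shutdown: %v", err)
}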
Pair this with terminationGracePeriodSeconds: 60 on the Pod spec. The default is 30s, which the 5s sleep plus the 30s drain timeout above would already exceed, and which is often too short for connection draining under load.
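The grace period is a budget that the whole shutdown path has to fit inside, so it is worth checking the sum whenever one of the timeouts changes. A minimal sketch of that check, using the values from this section (`shutdownSlack` and the variable names are invented for illustration):

```go
package main

import (
	"fmt"
	"time"
)

// Values taken from the shutdown code and Pod spec in this section.
var (
	propagationSleep = 5 * time.Second  // time.Sleep before draining
	drainTimeout     = 30 * time.Second // srv.Shutdown context timeout
	defaultGrace     = 30 * time.Second // Kubernetes default
	extendedGrace    = 60 * time.Second // terminationGracePeriodSeconds: 60
)

// shutdownSlack returns how much of the grace period is left after the
// propagation sleep and a worst-case drain. A negative result means the
// kubelet may send SIGKILL before draining finishes.
func shutdownSlack(grace, sleep, drain time.Duration) time.Duration {
	return grace - (sleep + drain)
}

func main() {
	fmt.Println("default grace: ", shutdownSlack(defaultGrace, propagationSleep, drainTimeout))
	fmt.Println("extended grace:", shutdownSlack(extendedGrace, propagationSleep, drainTimeout))
}
```

With the default grace period the slack is negative (5s short); with 60s there are 25s to spare. If you also use a preStop hook, its duration counts against the same budget.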
Validating the result
Generate continuous traffic with hey or k6 while triggering a rollout, and watch for non-2xx responses:
hey -z 120s -c 50 -q 100 https://your-service/api/endpoint &
kubectl rollout restart deployment/your-app
wait
# Check results: non-2xx count should be 0
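If you would rather run the same check from a Go test or a CI job, the idea fits in a few lines. This is a minimal sketch, not a replacement for hey or k6: `countFailures` is a name invented here, and `flakyServer` is a local stand-in for a service mid-rollout so the example runs without a cluster. In practice you would point it at your own service URL and run it for the length of the rollout.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

const demoDuration = 200 * time.Millisecond

// countFailures sends GET requests from `workers` goroutines until the
// duration elapses. It returns the total number of requests and how many
// failed (transport error or non-2xx status).
func countFailures(client *http.Client, url string, workers int, d time.Duration) (total, failed int64) {
	deadline := time.Now().Add(d)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for time.Now().Before(deadline) {
				atomic.AddInt64(&total, 1)
				resp, err := client.Get(url)
				if err != nil {
					atomic.AddInt64(&failed, 1)
					continue
				}
				io.Copy(io.Discard, resp.Body) // drain so the connection is reused
				resp.Body.Close()
				if resp.StatusCode < 200 || resp.StatusCode > 299 {
					atomic.AddInt64(&failed, 1)
				}
			}
		}()
	}
	wg.Wait()
	return total, failed
}

// flakyServer returns 502 for every failEvery-th request (0 disables
// failures), imitating a Pod that receives traffic before it is ready.
func flakyServer(failEvery int64) *httptest.Server {
	var n int64
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		if c := atomic.AddInt64(&n, 1); failEvery > 0 && c%failEvery == 0 {
			w.WriteHeader(http.StatusBadGateway)
			return
		}
		fmt.Fprint(w, "ok")
	}))
}

func main() {
	srv := flakyServer(10)
	defer srv.Close()
	total, failed := countFailures(srv.Client(), srv.URL, 4, demoDuration)
	fmt.Printf("requests=%d non-2xx=%d\n", total, failed)
}
```

The pass criterion is the same as for the hey run: the failure count should be 0 across the whole rollout window.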
With a properly implemented readiness probe and graceful shutdown, rolling deploys become genuinely invisible to clients. No more "we need to deploy at 3am" policies.