Kubernetes · Advanced · Par time: 8:00

The DNS Cache Lie

Payment service DNS lookups are failing intermittently. The service exists and endpoints are healthy.

The Scenario

Ninety seconds ago, a platform engineer ran a migration script that deleted and recreated the payment-service Kubernetes Service object. During the 2-second gap between delete and recreate, some CoreDNS replicas received DNS queries and cached NXDOMAIN responses for payment-service.production.svc.cluster.local. Checkout is now failing on 47% of requests: exactly those that land on pods whose DNS lookups are served by a CoreDNS replica that still holds the stale negative response. The other CoreDNS replicas return the correct A record.
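The split described in this scenario can be reproduced with a toy model. The sketch below is not CoreDNS code. It assumes each replica keeps an independent in-memory cache, and it uses a made-up ClusterIP and an illustrative 300-second negative TTL (real values depend on the Corefile). Only the replica that was queried during the gap keeps answering NXDOMAIN.

```python
from dataclasses import dataclass, field

NAME = "payment-service.production.svc.cluster.local"


@dataclass
class Replica:
    """Toy stand-in for one CoreDNS replica with its own in-memory cache."""
    negative_ttl: float                        # hypothetical TTL for cached NXDOMAIN
    cache: dict = field(default_factory=dict)  # name -> (answer, expires_at)

    def resolve(self, name, records, now):
        cached = self.cache.get(name)
        if cached and cached[1] > now:
            return cached[0]                   # served from cache, possibly stale
        answer = records.get(name, "NXDOMAIN")
        # Only the negative path matters for this sketch, so positive
        # answers are deliberately not cached.
        if answer == "NXDOMAIN":
            self.cache[name] = (answer, now + self.negative_ttl)
        return answer


def simulate():
    records = {NAME: "10.96.0.42"}             # made-up ClusterIP
    replicas = [Replica(negative_ttl=300.0) for _ in range(3)]

    # t=0: the Service is deleted. t=1: only replica 0 is queried in the gap.
    del records[NAME]
    replicas[0].resolve(NAME, records, now=1.0)

    # t=2: the Service is recreated.
    records[NAME] = "10.96.0.42"

    # t=90: each replica answers from its own view of the world.
    at_90 = [r.resolve(NAME, records, now=90.0) for r in replicas]
    # t=400: the negative entry has expired, so replica 0 recovers by itself.
    at_400 = [r.resolve(NAME, records, now=400.0) for r in replicas]
    return at_90, at_400
```

Calling simulate() returns the answers from all three replicas at t=90 and t=400. Comparing the two lists shows why the failure is partial rather than total, and why it eventually clears without intervention once the negative entry expires.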

What You'll Learn

How CoreDNS negative response caching (NXDOMAIN TTL) creates split-brain DNS

Why deleting and recreating a Service causes a temporary DNS poisoning window

Using kubectl exec to test DNS from individual pods and correlate to CoreDNS replicas

Flushing the CoreDNS cache, or using the reload plugin, to recover from stale NXDOMAIN responses
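The per-replica check and the recovery step listed above might look roughly like the following sketch. It only builds kubectl argument lists and classifies nslookup output; it does not run anything. Several details are assumptions rather than facts from this scenario: the kube-system namespace, the k8s-app=kube-dns label, and the deployment name coredns are common kubeadm defaults; the client pod image is assumed to ship nslookup; and matching on the string NXDOMAIN is a heuristic, not a parser.

```python
NAME = "payment-service.production.svc.cluster.local"


def list_coredns_cmd():
    """List CoreDNS pods with their IPs (label is a kubeadm default; assumption)."""
    return ["kubectl", "get", "pods", "-n", "kube-system",
            "-l", "k8s-app=kube-dns", "-o", "wide"]


def probe_cmd(client_pod, replica_ip, namespace="production"):
    """Run nslookup inside client_pod against one specific CoreDNS replica IP."""
    return ["kubectl", "exec", "-n", namespace, client_pod, "--",
            "nslookup", NAME, replica_ip]


def is_stale(nslookup_output):
    """Heuristic: treat any NXDOMAIN in the output as a stale negative answer."""
    return "NXDOMAIN" in nslookup_output


def stale_replicas(outputs_by_ip):
    """Given {replica_ip: nslookup output}, return the IPs answering NXDOMAIN."""
    return sorted(ip for ip, out in outputs_by_ip.items() if is_stale(out))


def restart_coredns_cmd():
    """Restarting the pods discards their in-memory caches (deployment name assumed)."""
    return ["kubectl", "rollout", "restart", "deployment/coredns",
            "-n", "kube-system"]
```

Each argument list can be passed to subprocess.run with capture_output=True and text=True, collecting stdout per replica IP into a dict for stale_replicas. Querying each replica IP directly, instead of the cluster DNS Service address, is what makes it possible to tell which replica holds the stale entry.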

Tools You'll Use

kubectl · CoreDNS logs · nslookup in-pod · Service spec

Real-World Context

Service delete-recreate patterns - common in migration scripts and blue-green deployments - create brief DNS poisoning windows. The resulting split-brain is hard to diagnose because all services appear healthy from the outside.
