
Payment service DNS lookups are failing intermittently. The service exists and endpoints are healthy.
Ninety seconds ago a platform engineer ran a migration script that deleted and recreated the payment-service Kubernetes Service object. During the 2-second gap between delete and recreate, CoreDNS received several DNS queries and cached NXDOMAIN responses for payment-service.production.svc.cluster.local. Now checkout is failing on 47% of requests - exactly the requests that land on pods whose local DNS resolver is using a CoreDNS replica that cached the stale negative response. The other CoreDNS replicas have the correct A record.
How CoreDNS negative response caching (NXDOMAIN TTL) creates split-brain DNS
Why deleting and recreating a Service causes a temporary DNS poisoning window
Using kubectl exec to test DNS from individual pods and correlate to CoreDNS replicas
Flushing CoreDNS cache or using the reload plugin to recover from stale NXDOMAIN
Service delete-recreate patterns - common in migration scripts and blue-green deployments - create brief DNS poisoning windows. The resulting split-brain is hard to diagnose because all services appear healthy from the outside.
Free account required - sign up with GitHub or Google in 10 seconds
Play The DNS Cache Lie