
Some pods can resolve DNS. Others get NXDOMAIN. Same service, same cluster.
Eighteen minutes ago a PCI-compliance NetworkPolicy was applied to the payments namespace. Immediately afterward, intermittent 504s started appearing on the order service. All pods show Running and 1/1 Ready, with no restarts. The pattern is split: pods that were running before the NetworkPolicy change keep working because they are still using connections, and cached DNS answers, that predate the policy. Pods that scaled up afterward can't reach CoreDNS on UDP port 53 and return NXDOMAIN for internal service names.
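One quick way to confirm that split, before reading any policy YAML, is to line up creation timestamps; the payments namespace comes from the incident, everything else is a generic kubectl query:

```bash
# When did the NetworkPolicy land in the payments namespace?
kubectl get networkpolicy -n payments \
  -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp

# Which pods predate it? Pods created after that timestamp are the ones
# that can no longer reach CoreDNS.
kubectl get pods -n payments \
  -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp
```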
What you'll learn:

- How NetworkPolicy rules apply only to new connections, not established ones
- Why DNS failures appear intermittent when only some pods are affected
- Reading NetworkPolicy specs to identify missing DNS egress rules (see the policy sketch after this list)
- Testing DNS resolution from inside a pod with kubectl exec and nslookup (see the commands after this list)
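For the missing-egress case in the third bullet, here is a sketch of an additive policy that restores DNS for every pod in the namespace. It assumes CoreDNS runs in kube-system behind the usual k8s-app: kube-dns label; the policy name is made up, and the real fix might instead be an extra egress rule inside the existing PCI policy.

```bash
# Hypothetical companion policy; the name and labels are assumptions to verify first.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments
spec:
  podSelector: {}            # every pod in the payments namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
```

Because NetworkPolicy allow rules are additive, a policy like this can sit alongside the PCI policy without loosening anything else it restricts.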
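And a minimal version of the in-pod test from the last bullet; the pod and service names are placeholders, and the kubectl debug fallback assumes the cluster supports ephemeral containers:

```bash
# Pick one pod created before the policy and one created after (names are placeholders).
OLD_POD=orders-6c7f9b-abcde
NEW_POD=orders-6c7f9b-fghij

# The old pod should answer; the new pod should time out or come back NXDOMAIN.
kubectl exec -n payments "$OLD_POD" -- nslookup orders.payments.svc.cluster.local
kubectl exec -n payments "$NEW_POD" -- nslookup orders.payments.svc.cluster.local

# If the image ships without nslookup, a busybox ephemeral container shares the
# pod's network namespace, so the result is still representative:
kubectl debug -n payments "$NEW_POD" -it --image=busybox:1.36 -- \
  nslookup orders.payments.svc.cluster.local
```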
NetworkPolicy changes that forget to allow DNS egress are a recurring incident category. The split-brain symptom, where pods started before the change keep working while pods started after it fail, makes the root cause hard to spot.
Play The DNS That Only Works Sometimes