Kubernetes · Advanced · Par time: 8:00

The DNS That Only Works Sometimes

Some pods can resolve DNS. Others get NXDOMAIN. Same service, same cluster.

The Scenario

Eighteen minutes ago, a PCI-compliance NetworkPolicy was applied to the payments namespace. Immediately afterward, intermittent 504s started appearing on the order service. All pods show Running and 1/1 Ready, with no restarts. The pattern is split: pods that were running before the NetworkPolicy change resolve names fine because they hold established connections and cached DNS answers from before the policy took effect. Pods that scaled up afterward can't reach CoreDNS on UDP port 53 and get NXDOMAIN for internal service names.
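A policy like the following reproduces the failure (the policy name and database labels are illustrative, not taken from the scenario): it selects every pod in the namespace and allows egress only to the database, so any new connection to CoreDNS on port 53 is silently dropped.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pci-lockdown          # hypothetical name
  namespace: payments
spec:
  podSelector: {}             # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres   # hypothetical: only the database is allowed
      ports:
        - protocol: TCP
          port: 5432
  # Note what is missing: no egress rule permitting UDP/TCP 53 to
  # kube-dns, so pods started after this applies cannot resolve
  # any service names.
```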

What You'll Learn

1. How NetworkPolicy rules apply only to new connections, not established ones

2. Why DNS failures appear intermittent when only some pods are affected

3. Reading NetworkPolicy specs to identify missing DNS egress rules

4. Testing DNS resolution from inside a pod with kubectl exec and nslookup
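The last two steps can be sketched with kubectl against a live cluster (the deployment and service names below are placeholders):

```shell
# Inspect the policy that was just applied; look for an egress
# rule allowing port 53 to kube-dns.
kubectl -n payments get networkpolicy -o yaml

# Test DNS from a pod created after the change; on an affected
# pod this returns NXDOMAIN or times out.
kubectl -n payments exec deploy/order-service -- \
  nslookup payments-db.payments.svc.cluster.local

# Compare from a fresh debug pod, which the namespace-wide
# policy also selects.
kubectl -n payments run dns-test --rm -it --image=busybox:1.36 -- \
  nslookup kubernetes.default
```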

Tools You'll Use

kubectl · NetworkPolicy spec · nslookup in-pod · CoreDNS logs

Real-World Context

NetworkPolicy changes that forget to allow DNS egress are a recurring incident category. The split-brain symptom, where old pods keep working and new pods fail, makes the root cause hard to spot: the cluster looks healthy pod by pod, and only a rollout or scale-up exposes the broken path.
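The usual fix is an explicit DNS egress rule alongside the application rules. A minimal sketch, assuming CoreDNS runs in kube-system with the conventional k8s-app: kube-dns label (selectors vary by distribution):

```yaml
egress:
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
        podSelector:
          matchLabels:
            k8s-app: kube-dns
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53   # DNS falls back to TCP for truncated responses
```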

Ready to debug this?

Free account required - sign up with GitHub or Google in 10 seconds

Play The DNS That Only Works Sometimes