DNS

Intro

DNS (Domain Name System) is the internet's distributed directory: it maps human-readable names like api.example.com to machine-readable records like IP addresses. Almost every connection made to a hostname starts with a DNS lookup, unless the answer is already cached or pinned locally (for example in a hosts file). Understanding DNS is essential for diagnosing connectivity failures, designing reliable service discovery, and reasoning about propagation delays when you change infrastructure.

DNS is hierarchical and distributed — no single server knows all names. Queries are resolved by walking the hierarchy from root servers down to authoritative servers, with caching at every layer to reduce latency.

Resolution Process

When a client queries api.example.com:

sequenceDiagram
  participant Client
  participant Resolver as Recursive Resolver (ISP/8.8.8.8)
  participant Root as Root Server
  participant TLD as .com TLD Server
  participant Auth as Authoritative Server (example.com)

  Client->>Resolver: Query api.example.com
  Resolver->>Root: Who handles .com?
  Root->>Resolver: TLD server address
  Resolver->>TLD: Who handles example.com?
  TLD->>Resolver: Authoritative server address
  Resolver->>Auth: What is api.example.com?
  Auth->>Resolver: 203.0.113.42 (TTL 300)
  Resolver->>Client: 203.0.113.42 (cached)

Steps:

  1. Client checks its local cache. If a valid cached answer exists, return it.
  2. Client asks its configured recursive resolver (ISP resolver, 8.8.8.8, 1.1.1.1).
  3. Resolver checks its cache. If cached, return it.
  4. Resolver queries a root server for the TLD nameserver.
  5. Resolver queries the TLD server for the authoritative nameserver.
  6. Resolver queries the authoritative server for the record.
  7. Resolver caches the answer for the TTL duration and returns it to the client.

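This full walk (iterative resolution) happens only on a cache miss. Even then, the resolver usually has the TLD nameservers cached from earlier lookups, so it can skip the root query. Most queries are answered straight from the resolver's cache.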
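
The walk can be sketched in a few lines of Python. This is a toy model, not a DNS implementation: the server names (root, tld-com, ns1.example.com) and the dictionary-based zones are hypothetical stand-ins for real servers reached over port 53. The sketch caches only final answers, whereas real resolvers also cache the referrals.

```python
from typing import Dict, List, Tuple

# Toy "servers": each maps names it knows to a referral ("NS", next_server)
# or a final answer ("A", address, ttl). All names here are hypothetical.
SERVERS: Dict[str, dict] = {
    "root": {"com.": ("NS", "tld-com")},
    "tld-com": {"example.com.": ("NS", "ns1.example.com")},
    "ns1.example.com": {"api.example.com.": ("A", "203.0.113.42", 300)},
}


def ask(server: str, qname: str) -> tuple:
    """Return the most specific entry the server holds for qname."""
    zone_data = SERVERS[server]
    labels = qname.rstrip(".").split(".")
    # Try the full name first, then ever-shorter suffixes (example.com., com.).
    for i in range(len(labels)):
        suffix = ".".join(labels[i:]) + "."
        if suffix in zone_data:
            return zone_data[suffix]
    raise LookupError(f"{server} knows nothing about {qname}")


def resolve(qname: str, cache: Dict[str, str]) -> Tuple[str, List[str]]:
    """Follow referrals down from the root; return (address, servers asked)."""
    if qname in cache:
        return cache[qname], ["cache"]
    server, asked = "root", []
    while True:
        asked.append(server)
        entry = ask(server, qname)
        if entry[0] == "NS":
            server = entry[1]  # referral: repeat the question one level down
        else:
            _, address, _ttl = entry
            cache[qname] = address  # a real cache would also expire it after _ttl
            return address, asked


cache: Dict[str, str] = {}
print(resolve("api.example.com.", cache))  # ('203.0.113.42', ['root', 'tld-com', 'ns1.example.com'])
print(resolve("api.example.com.", cache))  # ('203.0.113.42', ['cache'])
```

The second call never leaves the cache, which is the common case described above.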

Record Types

| Type | Purpose | Example |
|---|---|---|
| A | IPv4 address | api.example.com → 203.0.113.42 |
| AAAA | IPv6 address | api.example.com → 2001:db8::1 |
| CNAME | Alias to another name | www.example.com → example.com |
| MX | Mail server, with a preference value | example.com → 10 mail.example.com |
| TXT | Arbitrary text | SPF, DKIM, domain verification |
| NS | Nameserver for a zone | example.com → ns1.example.com |
| SOA | Zone authority metadata | Serial, refresh, retry, expire |
| PTR | Reverse lookup (IP → name) | 42.113.0.203.in-addr.arpa → api.example.com |
| SRV | Service location | _http._tcp.example.com → priority, weight, port, target host |
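
The PTR example's reversed-looking name follows a fixed rule: reverse the IPv4 octets and append in-addr.arpa (IPv6 uses the reversed hex nibbles under ip6.arpa). Python's standard ipaddress module exposes this as reverse_pointer, which makes for a quick check:

```python
import ipaddress

v4 = ipaddress.ip_address("203.0.113.42")
print(v4.reverse_pointer)  # 42.113.0.203.in-addr.arpa

# IPv6: all 32 hex nibbles of the fully expanded address, reversed, under ip6.arpa.
v6 = ipaddress.ip_address("2001:db8::1")
print(v6.reverse_pointer)
```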

TTL and Caching

Every DNS record has a TTL (Time To Live) in seconds. Resolvers and clients cache the answer for the TTL duration. After expiry, they re-query.
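
A minimal sketch of TTL-based caching, assuming a hypothetical TTLCache class keyed by (name, type) with an injectable clock so expiry can be shown without sleeping. Real resolvers add negative caching, size limits, and TTL clamping, none of which is modeled here.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Sketch of resolver-style caching: each entry expires after its TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._entries: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def put(self, name: str, rtype: str, value: str, ttl: int) -> None:
        # Store the answer together with the absolute time it stops being valid.
        self._entries[(name, rtype)] = (value, self._clock() + ttl)

    def get(self, name: str, rtype: str) -> Optional[Tuple[str, int]]:
        """Return (value, remaining TTL in whole seconds), or None on a miss."""
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        value, expires_at = entry
        remaining = expires_at - self._clock()
        if remaining <= 0:
            del self._entries[(name, rtype)]  # expired: the caller must re-query
            return None
        return value, int(remaining)


now = [0.0]  # fake clock, in seconds
cache = TTLCache(clock=lambda: now[0])
cache.put("api.example.com", "A", "203.0.113.42", ttl=300)
now[0] = 120.0
print(cache.get("api.example.com", "A"))  # ('203.0.113.42', 180)
now[0] = 301.0
print(cache.get("api.example.com", "A"))  # None
```

Because the remaining TTL counts down, a resolver answering from cache hands out the reduced value, which is why repeated queries to the same resolver show a shrinking TTL.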

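Implications:

  - After a record changes, clients can keep using the old value for up to the old TTL, because every cache serves its copy until it expires.
  - Short TTLs let changes and failover take effect quickly, at the cost of more queries to resolvers and authoritative servers.
  - Long TTLs cut query load and lookup latency but let stale records linger after a migration or outage.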

DNSSEC

DNSSEC adds cryptographic signatures to DNS records, allowing resolvers to verify that responses are authentic and unmodified. It protects against DNS spoofing and cache poisoning attacks.

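How it works: each signed zone publishes its public key in a DNSKEY record, and signatures over its record sets, made with the matching private key, in RRSIG records. The parent zone publishes a DS record containing a hash of the child zone's key, so a resolver can verify each zone's key from the zone above it. This chain of trust runs from the root zone down to the authoritative zone.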

Adoption: DNSSEC is supported by major TLDs and cloud DNS providers (Azure DNS, Route 53, Cloudflare) but is not universally deployed. It adds complexity (key rotation, signing overhead) and does not encrypt DNS traffic — it only authenticates it.
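
To make one link of the chain of trust concrete: the parent's DS record holds a digest of the child's owner name in canonical wire format followed by its DNSKEY RDATA (digest type 2 is SHA-256). The sketch below follows the record layouts in RFC 4034 and RFC 4509, but the key material is a hypothetical placeholder and it stops at the digest comparison; a real validator also verifies the RRSIG signatures with the key, which needs a cryptography library.

```python
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Canonical wire format: lowercase, length-prefixed labels, ending in a zero byte."""
    wire = b""
    for label in name.rstrip(".").lower().split("."):
        if label:
            wire += bytes([len(label)]) + label.encode("ascii")
    return wire + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA: flags (16 bits) | protocol (8 bits) | algorithm (8 bits) | key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def key_tag(rdata: bytes) -> int:
    """Key tag checksum from RFC 4034 Appendix B (not valid for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_digest_sha256(owner: str, rdata: bytes) -> str:
    """DS digest type 2: SHA-256 over the owner name (wire format) + DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()


# Hypothetical key-signing key for example.com: flags 257 (zone key + SEP),
# protocol 3, algorithm 13 (ECDSA P-256/SHA-256), placeholder 64-byte public key.
ksk = dnskey_rdata(257, 3, 13, bytes(range(64)))
ds_in_parent = ds_digest_sha256("example.com.", ksk)  # what the .com zone would publish

# A validator fetches the child's DNSKEY and checks it against the parent's DS.
forged = dnskey_rdata(257, 3, 13, bytes(64))
print("key tag:", key_tag(ksk))
print(ds_digest_sha256("example.com.", ksk) == ds_in_parent)     # True
print(ds_digest_sha256("example.com.", forged) == ds_in_parent)  # False
```

A substituted key produces a different digest, so it cannot be linked to the parent's DS record without also compromising the parent zone.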

Pitfalls

Long TTLs blocking fast failover
A 24-hour TTL means a failed server's IP stays cached for up to 24 hours. Clients will keep trying the dead IP. Fix: use short TTLs (60–300s) for records that may need fast failover, and use health-check-aware DNS (Route 53 health checks, Azure Traffic Manager).
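
Health-check-aware DNS comes down to handing out only addresses that currently pass checks. A hypothetical sketch of that selection step, assuming health status is already known for each target and a fail-open policy when every target is down:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Target:
    ip: str
    healthy: bool  # assumed to be kept current by a separate health checker


def dns_answer(targets: List[Target], ttl: int = 60) -> List[Tuple[str, int]]:
    """Return (ip, ttl) pairs for the targets that currently pass health checks.

    If every target is failing, return all of them instead of an empty answer
    (a fail-open policy assumed for this sketch).
    """
    healthy = [t for t in targets if t.healthy]
    chosen = healthy or targets
    return [(t.ip, ttl) for t in chosen]


pool = [Target("203.0.113.42", healthy=False), Target("203.0.113.43", healthy=True)]
print(dns_answer(pool))  # [('203.0.113.43', 60)]
```

The TTL on the answer still bounds recovery: a client that cached the dead IP just before the health check flipped keeps using it for up to that many seconds, which is why this pattern is paired with short TTLs.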

CNAME at zone apex
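A CNAME cannot coexist with any other record at the same name, and the zone apex (example.com) must always carry SOA and NS records, so a CNAME is not allowed there. For the same reason, you cannot have example.com CNAME cdn.example.net alongside example.com MX mail.example.com. Fix: use provider-specific ALIAS/ANAME records (Route 53 alias records, Cloudflare CNAME flattening), which are configured like a CNAME but resolved by the authoritative server, which then returns plain A/AAAA records.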
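
A zone check for this rule is easy to sketch. The function below is hypothetical (not part of any DNS server): it flags names where a CNAME shares the name with other data or with a second CNAME, and it allows RRSIG and NSEC, which DNSSEC-signed zones keep next to a CNAME.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (owner name, type, value)
# DNSSEC signing places these types next to a CNAME, so they are not conflicts.
ALLOWED_WITH_CNAME = {"RRSIG", "NSEC"}


def cname_conflicts(records: List[Record]) -> List[str]:
    """Return owner names where a CNAME shares the name with other data."""
    types_by_name: Dict[str, List[str]] = defaultdict(list)
    for name, rtype, _value in records:
        types_by_name[name.lower().rstrip(".")].append(rtype.upper())
    conflicts = []
    for name, types in sorted(types_by_name.items()):
        cnames = types.count("CNAME")
        others = [t for t in types if t != "CNAME" and t not in ALLOWED_WITH_CNAME]
        # Invalid: a CNAME plus any other data, or more than one CNAME.
        if cnames and (others or cnames > 1):
            conflicts.append(name)
    return conflicts


zone = [
    ("example.com", "NS", "ns1.example.com."),
    ("example.com", "MX", "10 mail.example.com."),
    ("example.com", "CNAME", "cdn.example.net."),  # the apex CNAME from the text
    ("www.example.com", "CNAME", "example.com."),  # fine: alone at its name
]
print(cname_conflicts(zone))  # ['example.com']
```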

DNS cache poisoning
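An attacker injects a forged DNS response into a resolver's cache, redirecting traffic to a malicious IP. Mitigations: DNSSEC (forged records fail signature validation), source port and query ID randomization (the attacker must guess both to match an outstanding query), and DNS-over-HTTPS (DoH) or DNS-over-TLS (DoT), which protect the client-to-resolver hop from on-path tampering but do not secure the resolver's own queries to authoritative servers.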
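
Why randomization helps is plain arithmetic. Under simplifying assumptions (the attacker knows the query name and type and makes distinct uniform guesses), a forged reply must match the 16-bit transaction ID, and with source port randomization also the UDP port. The port range below (1024 to 65535) is a hypothetical choice; actual ranges vary by operating system and resolver.

```python
# Simplifying assumptions: the attacker knows the query name and type, and
# guesses the remaining fields uniformly without repeating a guess.
ID_SPACE = 2 ** 16              # 16-bit DNS transaction ID
PORT_SPACE = 65535 - 1024 + 1   # hypothetical source port range 1024-65535


def hit_probability(forged_replies: int, space: int) -> float:
    """Chance that one of n distinct forged replies matches the real query."""
    return min(forged_replies, space) / space


print(f"ID only:     1 in {ID_SPACE:,}")               # 1 in 65,536
print(f"ID and port: 1 in {ID_SPACE * PORT_SPACE:,}")  # 1 in 4,227,858,432

# Suppose 1,000 forged replies arrive before the real answer:
print(hit_probability(1_000, ID_SPACE))               # ≈ 0.0153 with a fixed port
print(hit_probability(1_000, ID_SPACE * PORT_SPACE))  # ≈ 2.37e-07 with random ports
```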

Split-horizon DNS
Internal and external DNS return different answers for the same name (e.g., internal IP vs public IP). Misconfiguration can cause internal services to route through the public internet or expose internal IPs externally.
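
A split-horizon server picks the view from the client's source address. A minimal sketch, assuming hypothetical internal ranges and addresses; real servers (for example BIND views) configure this declaratively rather than in application code.

```python
import ipaddress

# Hypothetical internal ranges and per-view answers.
INTERNAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]
VIEWS = {
    "internal": {"api.example.com": "10.0.4.17"},
    "external": {"api.example.com": "203.0.113.42"},
}


def split_horizon_answer(qname: str, client_ip: str) -> str:
    """Pick a view from the client's source address, then answer from that view."""
    addr = ipaddress.ip_address(client_ip)
    internal = any(addr in net for net in INTERNAL_NETS)
    return VIEWS["internal" if internal else "external"][qname]


print(split_horizon_answer("api.example.com", "10.1.2.3"))      # 10.0.4.17
print(split_horizon_answer("api.example.com", "198.51.100.7"))  # 203.0.113.42
print(split_horizon_answer("api.example.com", "172.16.5.9"))    # 203.0.113.42
```

The 172.16.5.9 client shows the misconfiguration described above: its subnet was left out of INTERNAL_NETS, so it receives the public address and its traffic to an internal service leaves through the internet edge.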

Tradeoffs

TTL length: short vs long

| Dimension | Short TTL (60–300s) | Long TTL (3600–86400s) |
|---|---|---|
| Failover speed | Fast (minutes) | Slow (hours) |
| Cache hit rate | Low (more resolver queries) | High (fewer queries) |
| DNS query load | Higher | Lower |
| Migration risk | Low | High (stale records persist) |

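Decision rule: use short TTLs for records that may change (load balancer IPs, CDN origins, failover targets). Use long TTLs for stable records (MX, NS, static content). Before a planned migration, lower the TTL at least one old-TTL period in advance (24 hours ahead for an 86400s TTL), so every cached copy has expired and picked up the short TTL; restore the long TTL once the migration is stable.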
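
The migration timing is simple arithmetic: a resolver that cached the record just before you lowered the TTL may hold it for the full old TTL. A small sketch with a hypothetical migration_plan helper, assuming every cache honors TTLs exactly (some resolvers clamp them to their own minimum or maximum):

```python
from datetime import datetime, timedelta


def migration_plan(cutover: datetime, old_ttl_s: int, short_ttl_s: int) -> dict:
    """Hypothetical helper: when to lower the TTL, and how stale clients can be after."""
    return {
        # A cache that fetched the record just before the TTL change keeps it
        # for the full old TTL, so the short TTL must be live that long beforehand.
        "lower_ttl_by": cutover - timedelta(seconds=old_ttl_s),
        # After the cutover, a client may still use the old address for one short TTL.
        "max_stale_after_cutover": timedelta(seconds=short_ttl_s),
    }


plan = migration_plan(datetime(2024, 6, 1, 2, 0), old_ttl_s=86400, short_ttl_s=300)
print(plan["lower_ttl_by"])             # 2024-05-31 02:00:00
print(plan["max_stale_after_cutover"])  # 0:05:00
```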

Recursive vs iterative resolution
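Recursive: the client delegates all the work to a resolver. Simpler for clients, but that resolver becomes a shared dependency and a cache-poisoning target; configuring a secondary resolver avoids a single point of failure.
Iterative: the querier follows referrals itself, from root to TLD to authoritative server. End clients rarely do this; recursive resolvers do it on their behalf, and debugging tools such as dig +trace use it to show each step.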
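
The difference is visible on the wire as the RD (recursion desired) bit in the DNS header: a stub resolver asking 8.8.8.8 sets it, while the iterative queries a resolver or dig +trace sends down the hierarchy normally leave it clear. A sketch that builds, but does not send, a minimal RFC 1035 query message; the build_query helper and its fixed query ID are illustrative assumptions (real clients randomize the ID).

```python
import struct

RD_FLAG = 0x0100  # "recursion desired" bit in the 16-bit header flags field


def build_query(qname: str, qtype: int = 1, recursion_desired: bool = True,
                query_id: int = 0x1234) -> bytes:
    """Build a DNS query: 12-byte header plus one question (qtype 1 = A, class 1 = IN)."""
    flags = RD_FLAG if recursion_desired else 0
    # Header fields: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME as length-prefixed labels ending in a zero byte, then QTYPE and QCLASS.
    question = b""
    for label in qname.rstrip(".").split("."):
        question += bytes([len(label)]) + label.encode("ascii")
    question += b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


recursive = build_query("api.example.com")
iterative = build_query("api.example.com", recursion_desired=False)
print(recursive[2:4].hex(), iterative[2:4].hex())  # 0100 0000
```

Sent over UDP port 53, the first message asks the server to do the whole walk; the second asks only for what that server knows, which may be a referral.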

DNS Debugging Commands

# Full resolution trace (shows each step)
dig +trace api.example.com

# Query specific record type
dig api.example.com A
dig example.com MX

# Query a specific resolver; if the answer is cached there, the TTL column shows the time remaining
dig @8.8.8.8 api.example.com A

# Reverse lookup
dig -x 203.0.113.42

# Request DNSSEC records (RRSIG); an "ad" flag in the reply header means the resolver validated them
dig +dnssec api.example.com