There are few technologies as foundational to the modern internet as DNS. Actually, let me revise that. There are few technologies as foundational to networking as DNS. It forms the foundation for name resolution on basically every network, big and small, even the little network that runs internally on your loopback interface.
A couple weeks ago, Microsoft announced a preview feature for Windows they’re calling Zero Trust DNS, and I thought that presented a great opportunity to dive into what DNS is, the trouble with DNS from a security perspective, and what the ZTDNS technology is trying to do.
I should also say that while I think the idea behind ZTDNS is interesting, it’s also proprietary, fragile, and Windows only. It’s not going to be a silver bullet for DNS security woes across the internet, so don’t get too excited.
Start From The Beginning
To understand the need for DNS, it's useful to trace things back to where it all began: ARPANet. In 1969, the first two hosts were connected, one at UCLA and one at the Stanford Research Institute (SRI). The hosts at each site were connected using interface message processors, or IMPs. Yes, that's right. The gateway device at each ARPANet site was an IMP. Nerd culture has deep roots y'all, and the jokes have never been that good.
The very early IMP implementation was really only meant to handle a single host at each site, but the standard allowed for multiple hosts. That's a crazy thought! Who could imagine having multiple hosts at a site? The luxury! The very first hosts were an SDS Sigma 7 at UCLA and an SDS 940 at SRI. They were incredibly expensive machines and not exactly common across universities, let alone businesses.
ARPANet kept adding more IMPs over the next fifteen years, while also developing new technologies to deal with the challenges of multi-host, multi-hop networking, transmission control between nodes, electronic communication, and more. Standards like TCP/IP, Ethernet, SMTP, and Telnet were all created during this time period.
At the same time, new networks were created for different purposes, like MILNET, CSNET, and NSFNET (NSFWNET if you're nasty). The scale and number of nodes in these networks continued to increase.
What had not been created yet was a distributed system to translate human-readable names into machine addresses. Up until that point, each system had a local file, usually called hosts or hosts.txt, holding a list of computer names and their network addresses. Early versions of ARPANet used a transport-layer protocol, later dubbed NCP, which identified hosts by numeric addresses.
As the number of hosts on ARPANet grew, NCP and its numeric host addressing were replaced by TCP for transport control and IP for host identification. ARPANet officially adopted TCP/IP and deprecated NCP in 1983, which is what most folks point to as the beginning of the modern internet.
From this point on, the number of nodes in your local network and the inter-network, aka internet, absolutely exploded. Computer systems are only too happy to use numbers to identify hosts, but humans aren’t so great with remembering a few hundred host name/number pairs.
The simple solution continued to be a hosts file on your system that had the identifier to hostname translation of all the other hosts on the network. That worked when there were relatively few hosts and new hosts were being added a few times a year. Someone would update the hosts file and pass it around to everyone else.
By June of 1983, CSNET, ARPANet's academic sibling network, had more than 70 sites connected, each with more than one host. Passing around a static hosts file had become untenable, and so we got the establishment of the domain name system in RFC 882 in November of 1983.
DNS Is Created
RFC 882 established a hierarchical structure of domains starting with the root domain and branching outward. A fully qualified hostname would include the name of the host, then a dot, subdomains separated by dots, the top-level domain, and a final dot at the end, e.g. pod.chaoslever.com.
Yes friends, technically the domain portion of all URLs should end with a period, but we decided not to do that because it was confusing and looked stupid. Hooray for us! Humans win this round.
Looking at the URL for this podcast (pod.chaoslever.com): pod is the host, chaoslever is a subdomain, and com is the top-level domain. The implied dot after com is the root domain, which corresponds to the root name servers, but we aren't there yet.
The second big decision by RFC 882 was to make DNS distributed in nature. There was not going to be a single authoritative database of all DNS entries. Honestly, this decision was absolutely the right one, and also led to endless headaches as we will see shortly.
What’s interesting is that even back in 1983, the writers of the RFC realized that DNS could be used for more than just name resolution. To quote:
“The costs of implementing such a facility dictate that it be generally useful, and not restricted to a single application. We should be able to use names to retrieve host addresses, mailbox data, and other as yet undetermined information.”
Another pressing issue of the time was email flow. Even though SMTP was introduced in 1982, the various email hosting applications didn't have a standardized way to find and deliver mail to the recipient. DNS was meant to solve that with special record types and a standard format for email recipients in the form of recipient@address.
DNS could also be used for all kinds of other things, and it is today. We’ll see how that flexibility can be abused to encapsulate commands in DNS in a later post.
The RFC identifies three primary components:
- Domain Name Space - a hierarchical tree and leaf structure of hosts and domains
- Name Servers - servers that hold a portion of the domain name space called a zone
- Resolvers - programs that know at least one name server and can submit queries on behalf of other applications
Name servers that are responsible for a given zone are said to be authoritative. Their authority is derived from name servers higher up in the hierarchy, which will have a record pointing to the authoritative name servers for subdomain zones. If you've managed an Active Directory forest and seen DNS complaining about a missing subdomain delegation record, this is why.
When you try to go to pod.chaoslever.com, your local resolver attempts to resolve that name to an IP address. If it has that value cached locally, then it will use that value until its time-to-live has expired. If it doesn't have the value cached locally, then it will send the DNS query to one of the configured name servers. That name server, in turn, will respond with the answer if it has it, or respond back with a different server to try.
The query logic is recursive in nature, so a resolver will first try to get the name servers for com, then ask those servers for the name servers for chaoslever, and finally ask the chaoslever name servers for the pod record. Some of those records may be cached at varying levels of the hierarchy, which is why DNS records can take time to update. Check out more about TTLs if you're interested.
(Okay, so technically there are two approaches to DNS name resolution: recursive and iterative. Recursive name servers will try to resolve the name for the client by contacting other servers. Iterative name servers will return the next name server in the hierarchy back to the client, and it's up to the client to do the recursion.)
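To make the iterative flavor concrete, here's a rough sketch in Python. It leans on the third-party dnspython library (an assumption on my part, not anything from the RFC) and walks the hierarchy by hand: start at a root server, follow the referrals down to the chaoslever.com name servers, and ask them for pod. This is roughly the work a recursive resolver does for you on every uncached lookup.

```python
import dns.message
import dns.query
import dns.rdatatype

# Start at a root server (198.41.0.4 is a.root-servers.net) and follow
# referrals down the tree: root -> com -> chaoslever.com.
server = "198.41.0.4"
name = "pod.chaoslever.com."

for _ in range(10):  # sanity cap so a broken delegation can't loop forever
    # EDNS with a bigger payload so the referral's glue records fit over UDP.
    query = dns.message.make_query(name, dns.rdatatype.A, use_edns=0, payload=1232)
    response = dns.query.udp(query, server, timeout=5)

    if response.answer:
        for rrset in response.answer:
            print(rrset)  # the record(s) we were actually after
        break

    # No answer yet: the authority section names the next servers down the
    # tree, and the additional section usually carries their addresses (glue).
    glue = [r for r in response.additional if r.rdtype == dns.rdatatype.A]
    if not glue:
        raise RuntimeError("no glue; a real resolver would now resolve the NS name itself")
    server = glue[0][0].address
```

Run it twice and nothing gets faster, because this toy does no caching; a real resolver caches every one of those intermediate answers according to its TTL.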
The RFC also has some core assumptions that are just adorable, and understandable considering it was 1983 and no one had a PC, let alone a supercomputer in their pocket, and an average of 17 devices in their house that need an IP address.
A few fun assumptions:
- The size of the total database will initially be proportional to the number of hosts using the system (they thought mailboxes would change that!)
- Most of the data in the system will change very slowly (rapid is defined as once a minute, k8s looks on and laughs maniacally)
- Clients of the domain system should be able to identify trusted name servers they prefer to use (maybe that should be in the spec!)
Despite a few oversights and faulty assumptions, the RFC is immensely well written and prescient in some key ways. For instance, they knew that the database would be distributed, and correctly identified using iteration or recursion to answer queries.
The RFC also identified resource record types we still use today. Resource records are individual entries in a resource set for a zone. For instance, when I look up pod.chaoslever.com, I get back a resource record of type A, class IN, with a name and address. Type A is for host addresses. Class IN is short for the ARPA Internet system, and the contents of the response depend largely on the type. I get back an address because my query was looking for a type A record. Other types include CNAME, MF and MD records, and SOA, or start of authority.
All this was created back in 1983 and is virtually unchanged.
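If you want to see those 1983-era fields with your own eyes, here's a tiny sketch, again assuming the third-party dnspython library, that does a stub lookup and prints the answer in the classic name / TTL / class / type / data layout:

```python
import dns.resolver

# Ask the system's configured resolver for the A record set.
answer = dns.resolver.resolve("pod.chaoslever.com", "A")

# Printed in zone-file order: name, TTL, class (IN), type (A), then the address.
print(answer.rrset)

for record in answer:
    print(record.address)  # just the IP addresses, if that's all you need
```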
Since DNS is hierarchical, the authority for any given name server is derived from the parent domain's name servers. So if I want to be authoritative for the subdomain pickles.cucumbers.com, then that authority rests with the name servers for cucumbers.com. The name servers for cucumbers.com refer to name servers for com, which refer to the root DNS servers for the internet. There's supposed to be a chain of trust. But that chain can be subverted, because NONE of it is cryptographically signed.
If my client is pointed at a name server that I (in theory) trust, that name server can lie to me and claim to be authoritative for the entirety of the tree from the root label all the way down to pickles.cucumbers.com, and I will believe them. If you’ve ever accidentally broken DNS in an Active Directory forest, you know what I’m talking about.
(Why do I keep mentioning Active Directory? Because most of us will have no experience managing name servers for the internet, but many of us have dealt with internal DNS servers. Active Directory is absolutely dependent on DNS working properly for just about everything. So, if you've worked on AD, you've dealt with DNS in some capacity.)
To add insult to injury, even if I am using the correct name server for a given domain, the requests are sent using UDP over port 53. No encryption and no signing. No session even! It would be trivial for an attacker to intercept responses and give me false resource records.
And the requests are sent in cleartext, which means anyone on the wire can see the contents of every DNS query I send to the name server. This is a system ripe for abuse. That potential was kind of acknowledged back in 1983, but the magnitude of the problem was not really understood. What with there being about 70 sites on the global internet, you quite literally knew everyone involved. If Dave over at CalTech is messing up your DNS resolution, you can just call and tell him to knock it off.
Then the internet exploded and suddenly DNS went from a convenience to an absolute necessity. But nothing had changed about the standard aside from introducing some new resource record types and advice on how to structure your DNS implementations.
There are three main concerns that needed to be addressed:
- Data integrity - how do I know the response I get is genuine and untainted?
- Authentication - how can I trust the server I get a response from?
- Privacy - how can I keep my queries and responses away from eavesdroppers?
Whatever device you're currently using to listen to or read this post likely has a network connection (unless you printed this out on paper, you weirdo), and the DNS servers it is using were probably offered up through DHCP, aka the dynamic host configuration protocol. This makes the bold assumption that whatever network you've obtained an IP address from is giving you DNS servers you can trust.
There are some benefits to this arrangement, especially in a corporate environment. Most organizations will have an internal DNS service that can resolve internal only domains, like megacorp.local. When you need to connect to the file server accounting.megacorp.local, the DNS servers in your corporate network handle the name resolution for the megacorp.local zone. For zones outside of megacorp.local, the internal DNS servers can query external DNS servers to resolve requests for them.
Your home WiFi probably uses your wireless router for DNS services, which in all likelihood doesn’t actually host any zones, but simply acts as a proxy for your ISP’s DNS servers, which it receives through DHCP as well.
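If you're curious what your own machine was handed, here's a quick sketch (still assuming the third-party dnspython library, which reads the operating system's resolver configuration for you):

```python
import dns.resolver

# dnspython picks up the system resolver settings (/etc/resolv.conf on
# Unix-likes, the registry on Windows) -- in other words, whatever DHCP gave you.
resolver = dns.resolver.Resolver()
print(resolver.nameservers)  # e.g. ['192.168.1.1'] for a typical home router
print(resolver.search)       # any search domains, like megacorp.local
```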
When you’re out and about and connect to free internet? You’re now at the mercy of whatever janky DNS server that free WiFi is using. If someone is feeling malicious, not only can they spy on all of your DNS requests, they can also lie to you by answering DNS queries for which they are not authoritative. That’s… not great.
DNSSEC
How can the DNS client on your machine trust that the responses it gets back from a name server are genuine? One proposal is DNSSEC. RFC 2065 from back in 1997 was the beginning of adding DNS security extensions to the standard. It acknowledged that “The Domain Name System (DNS) has become a critical operational part of the Internet infrastructure yet it has no strong security mechanisms to assure data integrity or authentication.”
What the RFC proposed was to use cryptographic keys to generate a new resource record type called SIG, which would accompany a regular resource record to show that it had been signed by a trusted zone. The private key would be used to sign resource records, and a public key would be available for each zone that clients could use for verification.
How does your client get the public key for a given zone? Through DNS of course! The idea was that your client would have preloaded public keys for certain top level domains, and it could query the authoritative servers for those domains to get the public key for subdomains. The hierarchical nature of DNS makes this type of chain of trust possible.
It's similar in nature to how your operating system and browser have a list of trusted certificate authorities, including their public keys. When you need to verify the authenticity of a certificate, you need the public key of the signing CA, which in turn is signed by an intermediate or root CA. By building a chain of trust from the trusted CA, your browser can decide how much to trust the certificate. It's also supposed to check CRLs, but no one really does that.
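Here's roughly what asking for those signatures looks like today, sketched with dnspython. Setting the DNSSEC-OK flag asks the server to return the RRSIG record (the modern descendant of RFC 2065's SIG type) alongside the answer; 1.1.1.1 is just an example of a public validating resolver:

```python
import dns.message
import dns.query
import dns.rdatatype

# Build a query with the DNSSEC-OK bit set so signatures come back too.
query = dns.message.make_query("cloudflare.com", dns.rdatatype.A, want_dnssec=True)

# Signed answers are big, so fall back to TCP if the UDP response gets truncated.
response, _used_tcp = dns.query.udp_with_fallback(query, "1.1.1.1", timeout=5)

for rrset in response.answer:
    print(rrset)  # one RRset of A records, one RRset of RRSIG signatures
```

Actually validating that signature against the zone's public key, and that key against its parent, is the fiddly part; this sketch only shows that the signature exists.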
The proposal came out in 1997, and do you want to guess how widely it's been adopted? Go ahead, guess. Here's a map available on APNIC's website that tracks DNSSEC validation rates by country, and the US stands at 34%. Incidentally, Iceland is at 95%, which tells you how seriously Iceland takes naming. Don't believe me? Just ask Björk.
The most recent RFC for DNSSEC published in 2023 estimates DNSSEC adoption at 10% or less for website domain names. They rightly point out that although DNSSEC is still considered a best practice, in reality most DNS implementations have decided that the juice isn’t worth the squeeze. Why? Probably because of the prevalence of HTTPS.
HTTPS and the certificate system underlying it makes DNSSEC less useful. Because so many websites are secured using a TLS certificate, even if you get an invalid response for a DNS query, the certificate for a bogus website should (in theory) fail validation. In a way, HTTPS supplanted the need for DNSSEC, at least kinda sorta, it’s good enough, leave me alone and let me buy these damn pickles, I’m hungry.
Encrypting DNS
The original draft of the DNSSEC RFC explicitly called out that DNSSEC was not meant to protect the communication channel used for DNS queries. And for the longest time that channel was not secured. Traditionally DNS has used two protocols, UDP and TCP over port 53. That’s why AWS’ DNS service is called Route 53. Cute huh?
Neither UDP nor TCP provides encryption of the data packets; that is up to some other layer in the stack. IPSec is one such option and was called out by the DNSSEC RFC as a possible solution. But as with the DNSSEC implementation issue, enabling encryption of the data needs to be worthwhile for the DNS resolvers out there. A standard needs to be agreed upon and implemented by the major players, or we'll make no progress.
What finally pushed the problem of cleartext DNS communication over the tipping point was probably the incredible amount of abuse being perpetrated by ISPs. You know, the companies that people universally despise? According to Yahoo Finance, Comcast is #7 of the 20 most hated companies, beating out such stinkers as FTX and Equifax. Woo-hoo! Hometown heroes y'all.
Speaking of Yahoo, you know who bought Yahoo in 2017? Verizon, an ISP. Why did Verizon buy Yahoo? In a word? Customer data. ISPs are not content to simply provide fast internet access (although they don't do that very well either); they have bigger dreams. One way they can realize those bigger dreams is by selling your data to the highest bidder and then to everyone else who has a few spare coins jangling around.
And since ISPs can see ALL of your traffic traversing their network, they can inspect every DNS query to see what sites you frequent. Even if you aren't using the ISP-provided DNS servers (which we very much recommend you do not), because it's in cleartext, they can still spy on you. Erm, I mean, optimize your web browsing experience by providing targeted advertisements. Do you remember when Verizon tried to hijack everyone's mis-typed website URLs? Peppridge Farm remembers, and they're fucking furious. (Yes, I misspelled that on purpose, it's a very funny joke, ha ha ha.)
Sometime in the mid-2010s, we sorta became aware of this and found we didn’t like it one bit. Companies like Google and Cloudflare took notice and wanted to make things better. Better for whom? That’s debatable and we’ll come back to it.
While HTTPS/TLS kinda did an end-run around DNSSEC, there was no such option for encrypting DNS queries. Enter two competing standards, DoT and DoH. It's like VHS and Beta all over again! And if you don't get that reference, maybe BluRay vs HDDVD? No? Hmmm… USB-C vs Lightning port? Nevermind.
DNS over TLS, aka DoT was first proposed in RFC 7858 in May of 2016. The core idea was to use TLS based encryption to prevent eavesdropping and tampering of DNS queries and responses. There are a few caveats with this implementation that I think are worth bringing up.
First, traditional DNS uses UDP by default, only falling back to TCP if necessary. UDP is stateless in nature, which makes it super fast for quick communications like DNS queries. TCP requires a three-way handshake and session management, and layering TLS on top adds even more overhead to the session. DNS over TLS is going to be somewhat slower than plain UDP and will require more resources on the client and server. For a DNS server handling a large number of clients, the additional overhead will be noticeable.
The second big thing? The standard port for DNS over TLS is port 853. Traditional DNS uses port 53, as we've discussed. Port 853 is a newcomer: most servers aren't listening on it, and most firewalls don't allow it. Adding a new well-known port requires a lot more work than just upgrading some DNS servers. You've got to get the security team involved, and they famously love opening firewall ports.
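For the curious, here's what a DoT query looks like, again sketched with dnspython. The resolver address and hostname below are just example values for a public DoT service; the only real differences from plain DNS are the TLS session and port 853:

```python
import dns.message
import dns.query
import dns.rdatatype

# The same DNS message as always, just carried inside a TLS session on port 853.
query = dns.message.make_query("pod.chaoslever.com", dns.rdatatype.A)
response = dns.query.tls(query, "1.1.1.1", port=853,
                         server_hostname="cloudflare-dns.com")

for rrset in response.answer:
    print(rrset)
```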
The other solution is DNS over HTTPS. It uses… um, HTTPS for DNS queries. Super confusing I know!
This one was proposed in 2018 by Mozilla and ICANN in RFC 8484. There's not a lot to say here, other than it works exactly like you would expect. The client establishes a TLS session over TCP with the DNS server, then sends an HTTP GET or POST request carrying the query, and the DNS server responds with the necessary information. At first glance, it seems like DoH is adding the overhead of HTTP for no real benefit. For instance, are you really expecting low-level clients to implement an HTTP stack just to do DNS queries?
The primary benefit of DoH over DoT is the use of well-known port 443 for communications. While it does require servers to be upgraded to support DoH, it doesn’t require anything else in the path to change. DoH is ideal for internet-bound queries and traffic, since browsers already have an HTTP stack and web servers are already listening on port 443.
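And the DoH equivalent, again sketched with dnspython (which hands the HTTP work off to a client library like httpx, so that's one more assumed dependency). The query rides over ordinary HTTPS to the resolver's /dns-query endpoint on port 443; Cloudflare's is shown purely as an example:

```python
import dns.message
import dns.query
import dns.rdatatype

# Wrap the DNS message in an HTTPS request to a DoH resolver's /dns-query endpoint.
query = dns.message.make_query("pod.chaoslever.com", dns.rdatatype.A)
response = dns.query.https(query, "https://cloudflare-dns.com/dns-query")

for rrset in response.answer:
    print(rrset)
```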
The need to upgrade and possibly kit out new name servers is not an insignificant outlay of capital and time. So old-school DNS providers (like your ISP) probably won't bother to roll out DoH, at least not at first. But you can bet your sweet bippy Google will happily harvest all that delicious DNS data instead for a comparatively small outlay.
The introduction of QUIC, which uses UDP instead of TCP, adds a new form of DoH called DoH3, or DNS over HTTP/3. The main thing here is the use of UDP to carry a TLS session, which should lower latency and response times. We don't have time to get into QUIC here, and honestly it could be its own post and episode, but suffice it to say that DoH3 gets closer to traditional DNS over UDP response times.
For internal networks and private DNS, snooping might not be as much of an issue, so using traditional DNS could be fine. Or you can implement DoT or DoH internally. Windows Server 2022 supports both, which means your Active Directory can keep working as expected.
If you're a home user and want to start using DNS over HTTPS, then we have some good news! You're probably already using it, depending on your browser. Mozilla was heavily involved in establishing the standard, and Firefox started making DoH the default in 2019; as far as I can tell, Chrome did the same.
I should mention that just because you're using DoH, that doesn't mean no one is harvesting your DNS query data. Your ISP might not be snooping anymore, but you can bet Google sure as hell is (assuming you're using their DNS servers for resolution). There are some privacy-focused DNS providers out there, like Cloudflare, that allegedly do not record your queries. You can manually configure your browser to use those servers.
Which finally brings us to Microsoft’s zero trust DNS concept. And I have to admit, this post is already way too long. So guess what? This is going to be a two-parter!!! Next week we’ll look at Microsoft’s history with DNS, their failed attempt at a DNS alternative (ironically named WINS), and what ZTDNS actually is.