Costs and Benefits of Local DNSSEC Validation
Let’s take a look at the costs and benefits of enabling DNSSEC validation on the forwarding DNS resolver in your home or business.
In our scenario there is a small to medium network that has a forwarding DNS resolver at the edge, such as on the router. The forwarding resolver receives queries from clients on the local network and forwards them to a public recursive resolver, such as Google’s Public DNS, Cloudflare DNS, OpenDNS, etc.
Benefits
DNSSEC provides cryptographic authentication of DNS records. It proves that the core of the query’s response originally came from the source of truth, the authoritative name server.
In our scenario, checking the box to “Use DNSSEC” really means performing local validation of DNSSEC data. The forwarding resolver inside your network gets the data and does the cryptography to verify the DNS records.
Local DNSSEC validation provides protection against these cache poisoning attacks:
1. Your ISP injecting unsolicited DNS responses with false information, such as to send you to a search page with ads for mistyped domains
2. Your public resolver unintentionally giving you bad DNS responses because its cache was poisoned
3. Your public resolver intentionally lying to you, potentially because of a government request
Local DNSSEC validation mitigates these attacks by providing end-to-end authentication, at least for signed domains. But there are other solutions to some of these concerns.
#1 can be prevented by encrypting the connection to the public resolver, as in DNS-over-TLS (DoT) and DNS-over-HTTPS (DoH). For #2, if the public resolver you are using already does DNSSEC validation and you trust it, then doing the validation locally is redundant. I don’t see other widely deployed mitigations for #3 at the DNS level, but SSL/TLS really helps if the lookup is for HTTP traffic.
But for local validation to help, the domain owner has to have already set up DNSSEC on their end by “signing the domain”. So how many domains get this additional security? My estimate is that of the top 1000 DNS records, only 1.2% are signed. Statdns estimates 2.8% of .com domains are signed, and Rick Lamb estimates 5% of all second level domains are signed. Enabling local DNSSEC validation does not currently help for the vast majority of domains.
There’s something ideological to be said for local DNSSEC validation — we should minimize the power over the foundations of the internet which are given to these public resolvers, most of which are run by for-profit companies. Batman once said, “Internet companies either die young or live long enough to make their users the product.” Something like that.
Cost — Performance
The biggest surprise that I found while benchmarking forwarding resolvers was the performance penalty of enabling DNSSEC validation. If you’re DNSSEC-naive like I was, you might think it’d be less than a millisecond to do some math. Unfortunately that is not correct…
I tested the most popular forwarding resolvers:
- dnsmasq, used in many consumer-grade routers, OpenWrt, and Pi-hole
- unbound, used in pfSense
- knot-resolver, used by Cloudflare for their public resolver (in recursive mode)
For the local resolver to validate DNSSEC for both signed and unsigned domains, it must issue a separate DS query to the public resolver in addition to the initial A query. The response to the DS query is either the DS record that is part of DNSSEC or an NSEC/NSEC3 record that indicates the domain is not signed.
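The two possible outcomes of the DS query can be sketched as a small function. This is only a model of the decision described above, not how any of the tested resolvers implement it: the DsAnswer type and its fields are hypothetical, and the sketch skips signature verification entirely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DsAnswer:
    """Hypothetical, simplified view of the reply to a DS query."""
    ds_records: List[str] = field(default_factory=list)
    denial_records: List[str] = field(default_factory=list)  # NSEC or NSEC3


def classify_domain(answer: DsAnswer) -> str:
    """Decide what the DS reply says about the domain."""
    if answer.ds_records:
        # A DS record exists, so the domain is signed and must be validated.
        return "signed"
    if answer.denial_records:
        # NSEC/NSEC3 shows there is no DS record: the domain is not signed.
        return "unsigned"
    # Neither a DS record nor a denial: the resolver cannot tell either way.
    return "undetermined"


if __name__ == "__main__":
    print(classify_domain(DsAnswer(ds_records=["placeholder DS data"])))
    print(classify_domain(DsAnswer(denial_records=["placeholder NSEC3 data"])))
```

The point is that the resolver needs the DS response before it can treat an unsigned domain as acceptable, which is why the extra query is issued even when the domain turns out not to be signed.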
Unfortunately, all the resolvers I tested did these serially, only sending the DS query after the A query received a response. knot-resolver additionally sent an NS query when DNSSEC was enabled, and this appears to be an inefficiency.
Resolution time is determined mostly by latency to the public resolver, and sending the queries serially means that enabling DNSSEC validation doubles the resolution time, even for domains that are not signed.
These results are the average resolution time across 1000 unique top domains. Each forwarding resolver started with a cold cache, but because these domains are so common, they should be warm in the public resolver’s cache. Full project details can be found here: https://github.com/cyounkins/dns-forwarder-benchmark
Wireshark confirms what the response time hints at: the queries are sent serially.
The above tests were done with a cold cache. In the real world, most records can be cached. While the NSEC/NSEC3 records I saw have TTLs of around 1 day, RFC 9077 makes it very clear that the TTL should be reduced to the SOA TTL if it is lower. What I’m seeing on the live internet matches the example in the RFC: these NSEC/NSEC3 records are cached for 15 minutes or less.
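As I read RFC 9077, the rule is that the NSEC/NSEC3 TTL is the lesser of the SOA record’s own TTL and the SOA MINIMUM field. Here is a minimal sketch of that rule. The function name and the numbers are mine and purely illustrative, not taken from a real zone.

```python
def nsec_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """TTL for NSEC/NSEC3 records: the lesser of the SOA TTL and SOA MINIMUM."""
    return min(soa_ttl, soa_minimum)


if __name__ == "__main__":
    # Illustrative values in seconds: a 1 day SOA TTL, a 15 minute MINIMUM.
    print(nsec_ttl(soa_ttl=86400, soa_minimum=900))  # 900
```

With values like these, the proof that a domain is unsigned expires after 15 minutes, so the extra DS lookup comes back fairly often even for domains you visit regularly.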
Going from ~30 milliseconds to ~60 milliseconds may not sound like a lot, and in isolation it isn’t. But today’s web applications are latency bound, and loading up a website like bbc.com creates requests to no fewer than 54 separate domains. 30 milliseconds * 54 = 1.62 seconds! Because many DNS lookups are issued in parallel the impact won’t be quite that large, but I do believe it is perceptible.
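To put rough numbers on that, here is a back-of-the-envelope sketch. The 30 ms and 54 domains come from the paragraph above. The concurrency of 6 is purely an assumption I picked for illustration, and real browsers do not issue lookups in neat batches like this.

```python
import math

EXTRA_MS_PER_LOOKUP = 30  # added latency per lookup with validation enabled
DOMAINS_PER_PAGE = 54     # distinct domains requested by the example page


def added_delay_ms(domains: int, extra_ms: int, concurrency: int = 1) -> int:
    """Extra delay if lookups run in batches of `concurrency` at a time."""
    batches = math.ceil(domains / concurrency)
    return batches * extra_ms


if __name__ == "__main__":
    # Fully serial worst case.
    print(added_delay_ms(DOMAINS_PER_PAGE, EXTRA_MS_PER_LOOKUP))  # 1620
    # Hypothetical: 6 lookups in flight at once.
    print(added_delay_ms(DOMAINS_PER_PAGE, EXTRA_MS_PER_LOOKUP, concurrency=6))  # 270
```

Even under that generous assumption the added delay is still a few hundred milliseconds per page load.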
Cost — Misconfiguration
Enabling DNSSEC on a domain can be tricky, and there have been many significant outages because of incorrect deployment. I personally experienced the cdc.gov misconfiguration that persisted for many months during the height of the pandemic.
If you encounter this and still want to be able to resolve the domain, you have to disable validation, either globally or, if the software allows, per domain.
What if the public resolver validates DNSSEC?
Having the public resolver validate DNSSEC (and not doing it yourself) means concentrating trust in them. When the public resolver validates DNSSEC, no cryptographic proof is provided to you as the client, just a bit that says it happened.
As we said before, DNSSEC deployment can be tricky and if it’s not done properly, resolution of the domain will fail. These public resolvers want to strike a balance between security and ease-of-use, so they will disable validation when they deem necessary.
Cloudflare will do the same.
How Google or Cloudflare decides what is a configuration mistake and what is malice is not detailed. I’d hope they’d use their global positioning to check nameservers from multiple paths to ensure the responses are consistent and not a flood-style cache poisoning attempt.
Where do we go from here?
For me, I’m not excited about doubling the response time for most queries, and having to deal with misconfiguration issues, in exchange for the increase in security that DNSSEC provides for 2–5% of domains.
Other than programming difficulty, I see no reason why the A and DS queries cannot be sent in parallel, minimizing the performance impact.
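Here is a minimal sketch of the difference, using Python’s asyncio with a simulated round trip instead of real DNS traffic. The send_query function is a hypothetical stand-in that just sleeps, and the 50 ms latency is an assumed value. This is not how any of the tested resolvers are implemented.

```python
import asyncio
import time

SIMULATED_RTT = 0.05  # seconds; assumed latency to the public resolver


async def send_query(name: str, rdtype: str) -> str:
    """Hypothetical stand-in for a network query: waits one round trip."""
    await asyncio.sleep(SIMULATED_RTT)
    return f"{name} {rdtype} answer"


async def resolve_serial(name: str) -> list:
    # What I observed: the DS query is sent only after the A query returns.
    a = await send_query(name, "A")
    ds = await send_query(name, "DS")
    return [a, ds]


async def resolve_parallel(name: str) -> list:
    # The alternative: both queries are in flight at the same time.
    answers = await asyncio.gather(
        send_query(name, "A"),
        send_query(name, "DS"),
    )
    return list(answers)


def timed(resolver, name: str):
    """Run one resolver coroutine and return (answers, elapsed seconds)."""
    start = time.perf_counter()
    answers = asyncio.run(resolver(name))
    return answers, time.perf_counter() - start


if __name__ == "__main__":
    for resolver in (resolve_serial, resolve_parallel):
        _, elapsed = timed(resolver, "example.com")
        print(f"{resolver.__name__}: {elapsed * 1000:.0f} ms")
```

In this simulation the serial version takes roughly two round trips and the parallel version roughly one, with identical answers either way.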
I’d like to see NSEC/NSEC3 cached for around 24 hours instead of 15 minutes, but I think there are DNS complexities that make this a bad idea in some cases.
If this helped you decide whether to check (or uncheck!) that checkbox in your router’s UI, let me know by clicking the clapping hands on the left. Cheers!