On some servers, I can use a local caching resolver. However, on others, I'm forced to configure the caching resolver to forward queries to an upstream public DNS server like 1.1.1.1 (via DNS-over-TLS) because of network issues, which can arise from DNS (or perhaps general UDP?) rate limiting imposed at the network level by the server hosting provider itself.
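For reference, a minimal sketch of the forwarding setup I mean, assuming unbound as the local caching resolver (the drop-in path and the upstream choice are just examples):

```
# /etc/unbound/unbound.conf.d/forward-tls.conf  (example path)
server:
    # CA bundle so unbound can verify the upstream's TLS certificate.
    tls-cert-bundle: /etc/ssl/certs/ca-certificates.crt

forward-zone:
    name: "."                      # forward all queries
    forward-tls-upstream: yes      # speak DNS-over-TLS to the upstream
    forward-addr: 1.1.1.1@853#cloudflare-dns.com
```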
In another instance, I noticed that the hosting provider outright drops all fragmented IP packets. I'm concerned this could lead to DNS query failures, since large responses (DNSSEC ones in particular) often don't fit in a single UDP packet, especially if the authoritative DNS server is not reachable over TCP. When contacted, they told me they do this for security reasons and would not be able to disable the filter for me.
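One mitigation I've seen for fragment-dropping networks is to cap the advertised EDNS buffer size, so answers either fit in one packet or come back truncated and get retried over TCP. A sketch, again assuming unbound:

```
server:
    # Advertise a small EDNS buffer (the DNS Flag Day 2020 value); larger
    # answers then come back with TC=1 and the client retries over TCP
    # instead of relying on fragmented UDP.
    edns-buffer-size: 1232
```

This only helps if TCP/53 actually works end-to-end, which you can check with `dig +tcp`.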
Like the OP, on these systems I often encounter the 1.1.1.1 rate limit, especially since I have DNSSEC verification enabled, so I'm considering switching over to 8.8.8.8.
What do you mean by "fix the problem on their end"? As I explained, I am not able to fix the problem on my end, if that's what you mean. In fact, this is the exact reason why public DNS servers are so useful: they are much more reliable than the available alternatives, in many cases.
Sure, if everything were perfect and all ISP/hosting provider DNS servers were modern and well-configured and had great availability and didn't censor or hijack results, then they wouldn't be necessary. But unfortunately we have to live in the real world.
I am not against rate limits, because as you said, sending a billion requests per second (due to misconfiguration, a bug, or malice) is not reasonable. But 10 requests/second is way, way too strict, especially if you use DNSSEC and/or have multiple client machines sharing the same IP address without a common DNS caching resolver in between (which may not be possible to have for several reasons). With DNSSEC, a single uncached lookup can fan out into several upstream queries (DNSKEY/DS records for each zone in the chain), so even a handful of clients resolving a few names at the same moment can blow past 10 queries/second.
> which may not be possible to have for several reasons
Try me. What's the reason? Money can't be one of them if you expect someone else to scale up and handle your load for free; that would be very rude of you.
Sure, here are three off the top of my head that have affected me personally (not to mention the countless other scenarios that might exist):
1. In one case, I have multiple machines on the same network, but none of them are turned on 24/7 (except for the router), so none of them can be configured to be a common caching resolver. The router is a proprietary Unifi gateway device, which cannot be configured to use DNS-over-TLS, either on the local side or when forwarding to the public DNS server.
2. Another common case for me is my mobile devices (laptops and smartphones). I simply cannot configure them to use a common local caching DNS server because there isn't one: these devices frequently connect over many different networks (5G, hotel Wi-Fi, etc), and you cannot just use the network-provided DNS server without exposing yourself to man-in-the-middle attacks, censored and hijacked results, and so on.
3. Another issue is public Wi-Fi networks. In large hotels, stadiums, etc, there might be hundreds or even thousands of devices behind the same public IP address(es), and as a user, you have no control over them or the network.
And obviously, you cannot just use the provided DNS server on these Wi-Fi networks, for multiple reasons: 1) why would you trust the DNS server of a random Wi-Fi network in the first place?, 2) even if you trusted the provider, it is not possible to authenticate these networks, so you could be talking to a man-in-the-middle attacker without realizing it, and 3) even if there isn't a man-in-the-middle attacker and the provider is trustworthy, these DNS servers are often extremely poor: they censor and hijack results, pollute the DNS cache, don't support DNSSEC queries, etc.
And I'm not even mentioning the privacy issues.
To be clear, I always use local caching resolvers. But on these systems, I am forced to configure them to forward queries directly to public DNS servers, which means that they don't have a common caching resolver except for the public DNS server itself, hence the rate limit issue when they are all behind the same IP address.
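For what it's worth, on my laptops the forwarding looks roughly like this; a sketch assuming systemd-resolved as the local stub (values are examples):

```
# /etc/systemd/resolved.conf
[Resolve]
DNS=1.1.1.1#cloudflare-dns.com 1.0.0.1#cloudflare-dns.com
DNSOverTLS=yes    # strict: never fall back to the untrusted network's DNS
DNSSEC=yes        # validate locally as well
```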
I don't know, the easiest solution would be "run your own DNS server that you point the weird servers at". A handful of euros per month for a VPS will get you more DNS queries than you'll ever need. No need to even hit Cloudflare when you can turn on recursion on your own server; this protects you against the possible but unlikely scenario that Cloudflare messes with your DNS, gets its cache poisoned, or goes down.
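A sketch of what I mean, assuming unbound doing full recursion on the VPS (the netblock is a placeholder for your own client IPs):

```
server:
    interface: 0.0.0.0
    # Only answer your own machines; an open resolver will get abused.
    access-control: 0.0.0.0/0 refuse
    access-control: 203.0.113.0/24 allow    # placeholder netblock
    # With no forward-zone configured, unbound recurses from the root
    # servers itself instead of forwarding to Cloudflare/Google.
```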
Nothing wrong with using a public DNS resolver for your own devices, but I think using public services like these for benchmarking/load testing is abusing the generosity of the people running these servers.
It's not a money issue, as I have multiple servers already which could easily provide this service, and it's even less of a configuration issue, as I am perfectly capable of configuring this if I wanted to.
But these servers are in different countries, so they would be much slower than Cloudflare DNS due to network latency.
And even if I had a server in the same country for my client devices, I frequently travel to other countries anyway.
I don't think I am abusing these servers, as I only use them as a normal user; it's not like I am scraping the Internet or anything like that. Unlike the OP, I am not doing any benchmarking or load testing or anything similar.
I can't imagine what use case requires dozens of devices, hooked up to potentially untrusted random public wifi networks, making many DNS requests all at the same instant. For your very odd use case I'd suggest hosting your own recursive resolver someplace like AWS and pointing to that, but I suspect if you checked your priors you probably can change something about the first 3 conditions.
I just told you three reasons why this would happen, and I'm sure there are many more.
It's obvious that these machines can be making DNS queries simultaneously; why wouldn't they? They are completely independent of each other.
And don't forget that the number of DNS queries is also magnified when the machines are configured to do DNSSEC verification.
> For your very odd use case I'd suggest hosting your own recursive resolver someplace like AWS and pointing to that, but I suspect if you checked your priors you probably can change something about the first 3 conditions.
First of all, it's not an odd use case. I think everyone has a laptop and a smartphone, and if they are not following my security practices, it's out of ignorance or complacency, not because they are doing what they should be doing. And I'm not even talking about DNSSEC here, just the basic "don't trust random Wi-Fi networks / DNS servers".
And second, why can't you imagine a hotel Wi-Fi network with hundreds or thousands of devices, all of them hooked to the same public Wi-Fi network (the hotel-provided one), configured to use a public DNS server and making requests at the same instant?
What is so hard to imagine about that?
Hell, some public Wi-Fi networks don't even have a local DNS server; they hand the public Google DNS server IP addresses directly to all clients as part of their DHCP replies!
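To illustrate the mechanism, a hypothetical dnsmasq line on the network's DHCP server doing exactly that:

```
# dnsmasq: give every DHCP client Google's public resolvers (DHCP option 6)
dhcp-option=option:dns-server,8.8.8.8,8.8.4.4
```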
And this matters because, if the Google DNS servers were configured with a low rate limit, the DNS requests of the hotel's other clients would cause my DNS requests to fail, as they would all appear to be coming from the same IP address from Google's perspective.
Fortunately, Google has a high DNS rate limit so this is not usually a problem.
But the whole point of this thread is that Cloudflare DNS limits requests to only 10 queries per second, which is way too low for this scenario.
> I expect you to run a dns cache/resolver at that scale and not freeload. You have plenty of customers/employees and must be making enough money.
Yeah, me too. But as a customer, I also expect DNS servers not to censor results, not to hijack results, to properly resolve DNSSEC queries, to be available over DNS-over-TLS, to resolve queries reliably and to not be down frequently.
Unfortunately these expectations are almost always broken and as a user, there's nothing I can do to change that, except to complain to the provider (which almost always does nothing useful, especially when I'm traveling) or to just use Cloudflare DNS or Google DNS myself.
I feel like I'm missing something. Aren't these requests for your servers? Why are they getting censored and hijacked? Why not use HSTS, or if that's impossible, sign the responses with a public-private key pair / encrypt requests with the same on the device, if you are in such hostile territory?
> I feel like I'm missing something. Aren't these requests for your servers? Why are they getting censored and hijacked?
I didn't mention I was running any servers in this conversation. Perhaps you are confusing me with the great-great...-grandparent?
The scenarios I mentioned are all from the perspective of a user.
They are getting censored because of laws in the countries I frequently visit, and they are hijacked for multiple reasons (captive Wi-Fi portals and NXDOMAIN hijacking, mostly).
Apart from that, I also do have servers (in different countries), but that's beside the point. Note that even these individual personal servers (a couple of which are forced to use a public DNS service due to network issues) hit the Cloudflare DNS rate limit during normal operation.
As I mentioned in another thread, I don't do any benchmarking / load testing, web scraping, or anything of the sort; this is just normal operation for servers that are idle most of the time.
It's especially noticeable for reverse-IP queries, for some reason (perhaps because these requests are bursty, therefore not cached, and cause several other queries to be performed? I'm not sure). As I mentioned before, even though I use caching resolvers, I have all of my machines configured to do DNSSEC verification, which contributes to the problem.
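As an illustration of what I mean by a reverse-IP query (the address here is just a documentation placeholder):

```
# A reverse (PTR) lookup; with +dnssec the validating resolver also has
# to fetch DNSKEY/DS records down the in-addr.arpa delegation chain.
dig -x 203.0.113.7 +dnssec
```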
> hit the Cloudflare DNS rate limit during normal operation.
What I find implausible is that happening to a residential user, or even an apartment building full of them, frequently enough to matter, unless some bit of software somewhere is doing something profoundly silly.
I believe essentially all mainstream DNS lookup functions retry multiple times with exponential backoff, so it's not just ten in a second, it's ten in a second over a series of sliding windows that evaporate when they are satisfied for a time. That's a lot of requests.
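For example, the glibc stub resolver alone retransmits according to these resolv.conf knobs (shown with what I understand to be the defaults):

```
# /etc/resolv.conf — glibc defaults: 5s per try, 2 attempts per server
options timeout:5 attempts:2
```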
> I believe essentially all mainstream DNS lookup functions retry multiple times with exponential backoff, so it's not just ten in a second, it's ten in a second over a series of sliding windows that evaporate when they are satisfied for a time. That's a lot of requests.
I only hit this issue when the machine has been idle for quite some time (hours, perhaps), which indicates that part of the problem is that previously cached answers have expired.
When I hit this issue, I am pretty sure that I am doing on the order of ~10 top-level DNS queries, for about ~4 different domains, a few of them being reverse-IP queries. It's possible that these requests are being amplified due to DNSSEC, so that might be part of the reason.
It's also possible that my caching resolver, when answering a query for e.g. WWW.EXAMPLE.COM, is also doing queries for EXAMPLE.COM and .COM (and their DNSSEC keys), possibly even others, I don't know. I'm not exactly a DNS expert, unfortunately... All I know is that a dnsviz.net visualization seems to indicate that many queries are usually necessary to properly authenticate a domain that uses DNSSEC.
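You can see the fan-out with dig, whose +trace option walks the delegations roughly the way a recursive resolver does (output elided):

```
# Each zone cut (. -> com. -> example.com.) costs extra round trips, and
# +dnssec additionally pulls the DNSKEY/DS records at every step.
dig +trace +dnssec www.example.com
```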
Perhaps another issue is that, since the queries are simultaneous, the retries for these queries are getting synchronized and therefore also happen at the same time?
I can tell you that often some of the DNS requests time out after exactly 5 seconds, even though I have `options timeout:10 edns0` in my `/etc/resolv.conf` (which is being ignored due to a bug in bind's `host` command). Although I'm pretty sure the real problem was that I was getting a SERVFAIL response from Cloudflare DNS; if I used Google DNS instead of Cloudflare DNS, the problem didn't happen.
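In case anyone hits the same bug: `host` takes its wait time on the command line, and dig has equivalent flags, so the resolv.conf setting can be sidestepped:

```
host -W 10 example.com                 # -W sets the wait/timeout in seconds
dig +timeout=10 +edns=0 example.com    # dig equivalents of those options
```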
I don't know, perhaps you are right and some software is doing something silly.
For a single home, 10/sec is probably fine. But remember you are using other people's infrastructure. Be nice. You probably want a local resolver anyway in a case where 10/s is an issue. Even for home use a local caching resolver is not a bad idea. It usually makes day-to-day web surfing feel much snappier. Most home routers take care of it but are rather limited. I usually set up a local one then have my local DHCP just hand out that address so all the clients get the right thing instead of going to some external source. That way I can have a list of resolvers and one IP to hand out to my client computers.
Many home routers default to handing out the router's own address in the DHCP DNS field.
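For the curious, a sketch of that setup with dnsmasq doing both jobs (all addresses are examples):

```
# dnsmasq as the LAN's caching resolver
server=1.1.1.1                 # upstream list; add more server= lines
server=8.8.8.8
cache-size=10000               # answers are cached locally

# DHCP: point every client at this box (192.0.2.1) as its resolver
dhcp-range=192.0.2.50,192.0.2.150,12h
dhcp-option=option:dns-server,192.0.2.1
```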
> For a single home, 10/sec is probably fine. But remember you are using other people's infrastructure. Be nice. You probably want a local resolver anyway in a case where 10/s is an issue.
In one case, I am already using a local caching resolver, which is configured to forward to Cloudflare DNS (as I have no better alternative, except Google DNS perhaps).
It's a single machine which has a public IPv4 address, not used by any other machine.
If the machine has been idle for some hours, it hits the Cloudflare DNS rate limit just by doing 5 or 6 normal queries plus about 3 reverse IP address queries at the same time (I'd have to check the exact numbers).
It is configured to do DNSSEC verification, which I think contributes to the issue. Perhaps the fact that these requests are bursty, therefore probably not cached, is also part of the issue.
> Even for home use a local caching resolver is not a bad idea.
All of my machines have local caching resolvers, but this does not solve the issue for many different scenarios, especially those where there are multiple machines behind the same public IP address.
> I usually set up a local one then have my local DHCP just hand out that address so all the clients get the right thing instead of going to some external source.
Sure, but as I mentioned in another thread, it's not always possible to set up a local resolver on a network, especially when you are just a client of a network that is not yours (e.g. public/hotel Wi-Fi) and whose DNS server is unreliable or untrustworthy for many different reasons.
It's a good thing that it fails in development and not just in prod. The rate limiting itself isn't the best scenario, but it is to be expected with a free product.