They were blocking 1.1.1.1 on some firmware versions long before Cloudflare's DNS service started. From what I've read, the routers use it on some internal interface.
It's likely incompetence, not malice. If they didn't want people using other DNS, and were willing to fuck with IP addresses they don't own to accomplish that, they'd be blackholing Google's and OpenDNS's public caching nameservers too.
It might even have been a conscious decision, horrible as that is (the people involved in developing the firmware still need re-education). The decision probably went like this: we need an internal address for something. We can't use the 10, 172.16, or 192.168 ranges because those might conflict with customers' internal LANs. 1.x is "safe" because we all know nobody uses it. The correct decision, obviously, would have been to get AT&T corporate to commit to never using some tiny corner of their own address space, and use that. Or 127.a.b.c, if that works on the OS. And those options are only needed if they really need an extra IP address at all; with a better firmware design they might not.
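On the 127.a.b.c idea: on Linux the entire 127.0.0.0/8 block is routed to loopback out of the box, so any address in it works with no extra setup (macOS/BSD only bring up 127.0.0.1 by default). A minimal sketch, with an arbitrary address and port chosen purely for illustration:

```python
import socket

# On Linux, all of 127.0.0.0/8 hits the loopback interface, so a firmware
# component could squat on e.g. 127.8.7.6 without touching real address space.
ADDR = ("127.8.7.6", 8053)  # arbitrary loopback address/port for the demo

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(ADDR)
srv.listen(1)

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(ADDR)
conn, _ = srv.accept()
print("listening on", conn.getsockname())  # ('127.8.7.6', 8053)
cli.close(); conn.close(); srv.close()
```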
Whenever I've needed IP ranges for similar purposes (i.e., default IPs for container or VM internal/private networks) I've used ranges from RFC 5737 (192.0.2.0/24, 198.51.100.0/24, and 203.0.113.0/24). These are reserved for documentation purposes, so it is highly unlikely that a customer would have them in use on their own internal network. Not the best solution, but better than tying up a public /24 that we own.
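For what it's worth, a quick sketch of the sanity check I mean, using Python's ipaddress module (the "customer LANs" below are made-up examples):

```python
import ipaddress

# The three RFC 5737 documentation ranges (TEST-NET-1/2/3).
doc_ranges = [ipaddress.ip_network(n) for n in
              ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")]

# Hypothetical customer LANs to check against before picking a default.
customer_lans = [ipaddress.ip_network("192.168.1.0/24"),
                 ipaddress.ip_network("10.0.0.0/8")]

for candidate in doc_ranges:
    if not any(candidate.overlaps(lan) for lan in customer_lans):
        print("using", candidate)  # -> using 192.0.2.0/24
        break
```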
We used to use RFC 1918 addresses (172.16/12, IIRC) for communication between internal nodes in a cluster-in-a-box system I worked on, which worked great until we had a subnet collision on a customer's network. It leaves me wondering whether link-local (169.254/16, fe80::/10) would have been a better option - while technically the customer could give the external (customer-facing) network a link-local interface, the chances of that configuration actually happening are pretty slim.
I'm still not entirely sure what the best option is there. Maybe some clever use of network namespaces, with a named pipe to bridge between the "internal" and "external" universes? Just typing up that idea makes me cringe though.
BTW, for those wondering what this particular failure scenario is:
Let's use Docker's default 172.17.0.0/16 subnet as an example. Your Docker host has iptables DNAT rules that route a given "external" IP address (10.0.1.15) to a given Docker container (172.17.25.92). That works great, unless you have a workstation on a subnet such as 172.17.81.0/24. When that workstation sends a packet to 10.0.1.15, the packet gets routed to the destination container, 172.17.25.92. The container goes to reply, but the reply packet never makes it back to the original workstation, because the container host thinks it is bound for something else on its own version of the 172.17 subnet.
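In other words, the host's own routing table is the problem, because the workstation's subnet sits inside the docker0 subnet. A tiny illustration (addresses are the hypothetical ones from the example above):

```python
import ipaddress

docker_bridge = ipaddress.ip_network("172.17.0.0/16")  # Docker's default bridge subnet
workstation   = ipaddress.ip_address("172.17.81.42")   # a client on the colliding 172.17.81.0/24

# The reply's destination looks local to docker0, so the host tries to ARP
# for it on the bridge instead of sending it back out its default route.
print(workstation in docker_bridge)  # True -> reply never leaves the host
```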
One workaround is to have the container host also add an SNAT rule, so that anything it forwards to a container has its source IP address rewritten to appear to come from the container host's IP, or from the docker0 bridge IP (172.17.0.1).
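Roughly, the nat rules look like the pair below (a sketch only, using the example addresses from above; you'd run these as root, and a real Docker install already manages its own chains):

```python
EXTERNAL_IP  = "10.0.1.15"     # address clients actually connect to
CONTAINER_IP = "172.17.25.92"  # the container behind it
BRIDGE_IP    = "172.17.0.1"    # docker0 bridge address

rules = [
    # Inbound: rewrite the destination to the container.
    ["iptables", "-t", "nat", "-A", "PREROUTING",
     "-d", EXTERNAL_IP, "-j", "DNAT", "--to-destination", CONTAINER_IP],
    # Workaround: also rewrite the source, so the container always replies
    # to the host instead of to a possibly-colliding 172.17.x.x client.
    ["iptables", "-t", "nat", "-A", "POSTROUTING",
     "-d", CONTAINER_IP, "-j", "SNAT", "--to-source", BRIDGE_IP],
]

for rule in rules:
    print(" ".join(rule))  # pipe to a shell, or run via subprocess as root
```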
On a similar note, Docker for Mac assigns (or used to?) the IP address 192.168.99.100 to the VM that runs Docker. One day I was working in a coffee shop and got really confused as to why I couldn’t connect to my application, even though the server was running. Then I realised the coffee shop WiFi was using 192.168.99.0/24 for client IPs.
I can't wait till the world comes around to the true advantages of IPv6. It's not just about adding more global addresses: nodes can participate in multiple first-class networks at once (one of which is often the global internet). I'd be much more comfortable with smart devices in my home if they were on a unique-local network with a public internet federation service for things like software updates. IPv6 makes this possible.
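Concretely, RFC 4193 unique local addresses give every site its own /48 under fd00::/8, picked from a 40-bit random space, so my home's internal network and a vendor's internal network essentially never collide the way RFC 1918 ranges do. A rough sketch of generating one:

```python
import ipaddress
import secrets

# RFC 4193: fd00::/8 prefix + a random 40-bit global ID = a /48 for the site.
global_id = secrets.randbits(40)
addr_int = (0xfd << 120) | (global_id << 80)  # remaining 80 bits left at zero
site_prefix = ipaddress.IPv6Network((addr_int, 48))
print(site_prefix)  # e.g. fd3c:9a2e:71b0::/48 (random each run)
```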
In a cluster-in-a-box scenario, you could modify the OS's network scripts to bring up the cluster-specific private interface only after the general LAN interface is up. Check whether 10/8 or 172.16/12 is in use on the public interface, and use whichever one isn't for the cluster network.
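Something along these lines, say (a sketch that assumes iproute2 is available and just shells out to `ip` to see what the LAN interfaces already have):

```python
import ipaddress
import subprocess

CANDIDATES = [ipaddress.ip_network("10.0.0.0/8"),
              ipaddress.ip_network("172.16.0.0/12")]

def configured_networks():
    """IPv4 networks already present on the box, parsed from `ip -o -4 addr show`."""
    out = subprocess.run(["ip", "-o", "-4", "addr", "show"],
                         capture_output=True, text=True, check=True).stdout
    nets = []
    for line in out.splitlines():
        fields = line.split()  # "<idx>: <ifname> inet <addr>/<len> ..."
        if "inet" in fields:
            cidr = fields[fields.index("inet") + 1]
            nets.append(ipaddress.ip_interface(cidr).network)
    return nets

def pick_cluster_range():
    used = configured_networks()
    for candidate in CANDIDATES:
        if not any(candidate.overlaps(n) for n in used):
            return candidate
    raise RuntimeError("both 10/8 and 172.16/12 are already in use here")

print(pick_cluster_range())  # e.g. 10.0.0.0/8 if the LAN is on 172.16.x.x
```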
Which is the exact problem that we're seeing, here. "Oh, I know, I'll just use a segment allocated to somebody else; it's not like they use it!" Aaand...whoops, they do.
It's allocated to "DLA Systems Automation Center," a branch of the US military. The addresses are probably used on NIPRNet/SIPRNet, but not publicly routed. (Much like 22.0.0.0/8.)
My personal favorites are 44.128.0.0/16, the explicitly unallocated test network for amateur packet radio to internet gateways, and 100.64.0.0/10, the address range for bidirectional carrier-grade NAT.