> DNSSEC can be trivially used with DANE to protect the entire session. The browser vendors quite consciously decided to NOT do that.
100%. The reasons why are explained in some detail here: https://educatedguesswork.org/posts/dns-security-dane/. The TL;DR is that by the time DANE was created the WebPKI already existed and was universal and so adding DANE didn't buy you anything because you still were going to have to have a WebPKI certificate more or less in perpetuity.
> This is the outcome of browser vendors not caring at all about privacy and security.
This is false. The browser vendors care a great deal about privacy and security. Source: it was my job at Mozilla to care about this, amongst other things. It may be the case that they have different priorities than you.
> You're saying that to provide service for anything over the Web, you have to publish all your DNS names in a globally distributed immutable log that will be preserved for all eternity?
Well, back when people were taking DNSSEC and DANE more seriously, there was a lot of talk of doing DNSSEC Transparency.
> And that you can't even have a purely static website anymore because you need to update the TLS cert every 7 days? This is just some crazy talk!
This is hyperbole, because nobody is forcing you to update the TLS cert every 7 days. It's true that lifetimes are eventually going to come down to 47 days and LE offers 6-day certificates, but both of those are optional and non-default.
Moreover, the same basic situation applies to DNSSEC, because your zone also needs to be signed frequently, for the same underlying reason: disabling compromised or misissued credentials.
> The TL;DR is that by the time DANE was created the WebPKI already existed and was universal and so adding DANE didn't buy you anything because you still were going to have to have a WebPKI certificate more or less in perpetuity.
Yet somehow they managed to wrangle hundreds of CAs to use the CT logs and to change the mandated set of algorithms.
> Well, back when people were taking DNSSEC and DANE more seriously, there was a lot of talk of doing DNSSEC Transparency.
And this would have been great. But it only needs to make transparent the changes in delegation (actually, only DS records) from the TLD to my zone. Not anything _within_ my zone.
And tellingly, the efforts to enable delegation in the WebPKI are going nowhere, even though X.509 has supported it from the beginning (via name constraints, a critical extension).
> This is hyperbole, because nobody is forcing you to update the TLS cert every 7 days.
The eventual plan is to have shorter certs. 47 days will be mandated by 2029.
It also doesn't really change my point: I can't have a purely static server anymore and expect it to be accessible.
> Moreover, the same basic situation applies to DNSSEC, because your zone also needs to be signed frequently, for the same underlying reason: disabling compromised or misissued credentials.
That's incorrect. I've been using the same key (inside my HSM) since 2016. And I don't have to update the zone if it's unchanged. DNSSEC is actually _more_ secure than TLS, because zone signing can be done fully offline. With TLS, the key material is often a buggy memcpy() away from the corrosive anonymous Internet environment.
So you can rotate the DNSSEC keys, but it's neither mandated nor necessary. The need for short-lived certs for TLS is because there's no way to check their validity online during the request (OCSP is dead and CRLs are too bulky). But with DNSSEC if at any point my signing key is compromised, I can just change the DS records in the registrar to point to my updated key.
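To make the "change the DS records to point to my updated key" step concrete, here's a rough Python sketch of how a DS digest is derived from a DNSKEY per RFC 4034 (digest type 2, SHA-256). The owner name and key bytes are invented, so treat this as an illustration of the mechanics, not a validator:

```python
import hashlib

def name_to_wire(name: str) -> bytes:
    """Encode a dotted DNS name into uncompressed wire format (lowercased)."""
    wire = b""
    for label in name.rstrip(".").lower().split("."):
        wire += bytes([len(label)]) + label.encode("ascii")
    return wire + b"\x00"

def ds_digest(owner: str, dnskey_rdata: bytes) -> str:
    """DS digest type 2: SHA-256 over owner name + DNSKEY RDATA (RFC 4034)."""
    return hashlib.sha256(name_to_wire(owner) + dnskey_rdata).hexdigest().upper()

# Hypothetical DNSKEY RDATA: flags=257 (KSK), protocol=3, algorithm=8, dummy key.
rdata = (257).to_bytes(2, "big") + bytes([3, 8]) + b"\x00" * 32
print(ds_digest("example.com", rdata))  # 64 hex chars to hand to the registrar
```

The point being that the registrar only ever sees the hash, so the private key never has to leave the HSM.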
> > The TL;DR is that by the time DANE was created the WebPKI already existed and was universal and so adding DANE didn't buy you anything because you still were going to have to have a WebPKI certificate more or less in perpetuity.
>
> Yet somehow they managed to wrangle hundreds of CAs to use the CT logs and to change the mandated set of algorithms.
I'm not sure I see the connection here. What I'm saying is that the benefit for sites to adopt DANE is very low because as long as there are a lot of non-DANE-using clients out there they still need to have a WebPKI cert. This has nothing to do with CT and not much to do with the SHA-1 transition.
Re: your broader point about static sites, I don't think you're correct about the security requirements. Suppose for the sake of argument that your signing key is compromised: sure you can change the DS records but the attacker already has a valid DNSSEC record and that's sufficient to impersonate you for the lifetime of the record (recall that the Internet Threat Model is that the attacker controls the network so they can just send whatever DNS responses they want). What prevents this is that the records expire, so the duration of compromise is the duration of those records, just like with the WebPKI without revocation [0]. The same thing is true for the TLSA records signed by your ZSK.
In the DNSSEC/DANE paradigm, then, there are two signatures that have to happen regularly:
- The signature of the parent over the DS records, attesting to your ZSK.
- The signature of your ZSK over the TLSA records.
In the WebPKI paradigm, the server has to regularly contact the CA to get a new certificate. [1]
I agree with you that one advantage of DNSSEC is that that signing can all be done offline and then the data pushed up to the DNS servers, but it's still the case that something has to happen regularly. You've just pushed that off the TLS server and into the DNS infrastructure.
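To put a number on "regularly": an RRSIG carries explicit inception and expiration timestamps, and a captured signature verifies until it expires regardless of any later DS change. A small sketch, with invented timestamps in the RFC 4034 presentation format:

```python
from datetime import datetime, timezone

def rrsig_window(inception: str, expiration: str) -> float:
    """Return an RRSIG's validity window in days (YYYYMMDDHHMMSS, RFC 4034)."""
    fmt = "%Y%m%d%H%M%S"
    start = datetime.strptime(inception, fmt).replace(tzinfo=timezone.utc)
    end = datetime.strptime(expiration, fmt).replace(tzinfo=timezone.utc)
    return (end - start).total_seconds() / 86400

# A stolen signature stays usable until expiration, so this window is the
# bound on compromise -- directly analogous to certificate lifetime.
print(rrsig_window("20250101000000", "20250115000000"))  # -> 14.0
```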
More generally, I'm not sure what you mean by a "purely static server". TLS servers are inherently non-static because they need to do the TLS handshake, and I think the available evidence is that the ACME exchange isn't that big a deal.
[0] As an aside, all the major browsers now have some compressed online revocation system, but that's not necessarily a generalizable solution.
[1] When we first were designing LE and ACME, I advocated for the CA to proactively issue new certificates over the old key, but things didn't end up that way, and of course you'd still need to download it.
> I'm not sure I see the connection here. What I'm saying is that the benefit for sites to adopt DANE is very low because as long as there are a lot of non-DANE-using clients out there they still need to have a WebPKI cert.
I find this argument laughable. Adding support in just 4 browsers and to iOS/Android would have moved something like 99% of traffic to DANE. The long tail could have been tackled incrementally. A lot of it doesn't even care about validation anyway.
> Re: your broader point about static sites, I don't think you're correct about the security requirements. Suppose for the sake of argument that your signing key is compromised: sure you can change the DS records but the attacker already has a valid DNSSEC record and that's sufficient to impersonate you for the lifetime of the record
I said it before, and let me repeat it: long TTLs for DNS records are operational malpractice at this point, even disregarding DNSSEC. A TTL of more than 15 minutes provides no practical advantage outside the root zone.
> I agree with you that one advantage of DNSSEC is that that signing can all be done offline and then the data pushed up to the DNS servers, but it's still the case that something has to happen regularly. You've just pushed that off the TLS server and into the DNS infrastructure.
Why? What additional security do I gain from periodic ZSK/KSK rotations? Especially if I keep the private key material offline (in an HSM), which is not possible for TLS, btw.
> In the WebPKI paradigm, the server has to regularly contact the CA to get a new certificate. [1]
Except that ACME does not enforce private key rotation. I think most infrastructure now rotates keys, but the old key will still be valid for the duration of the compromised cert. And unlike typical 1-2 hour DNS TTLs, WebPKI certs stay valid for days or weeks.
So yeah, I don't see any reason why WebPKI is _technically_ superior. I can see it being superior because of the browser vendors' support.
> I'd argue that the only difference is that browser vendors care about protecting against MITM on the client side. They're fine with MITM on the server side or with (potentially state-sponsored) BGP prefix hijacks. And I'm not fine with that personally.
Speaking as someone who was formerly responsible for deciding what a browser vendor cared about in this area, I don't think this is quite accurate. What browser vendors care about is that the traffic is securely conveyed to and from the server that the origin wanted it to be conveyed to. So yes, they definitely do care about active attack between the client and the server, but that's not the only thing.
To take the two examples you cite, they do care about BGP prefix hijacks. It's not generally the browser's job to do something about that directly, but misissuance of all stripes is one of the motivations for Certificate Transparency, and of course the BRs now require multi-perspective validation.
I'm not sure precisely what you mean by "MITM on the server side". Perhaps you're referring to CDNs that terminate TLS and then connect to the origin? If so, you're right that browser vendors aren't trying to stop this, because it's not the business of the browser how the origin organizes its infrastructure. I would note that DNSSEC does nothing to stop this either, because the whole concept is that the origin wants it.
> I'm not sure precisely what you mean by "MITM on the server side".
For the vast majority of Let's Encrypt certs, you only need to transiently MITM the plain HTTP traffic between the server and the rest of the net to obtain the certificate for its domain. There will be nothing wrong in the CT logs, just another routine certificate issuance.
It is possible to limit this with, yes, DNS. But then we're back to square one with DNS-based security. Without DNSSEC the attacker can just MITM the DNS traffic along with HTTP.
Google, other browser makers, and large services like Facebook don't really care about this scenario. They police their networks proactively, and it's hard to hijack them invisibly. They also have enough ops capacity to publish CAA records properly, which will likely be visible from at least one of Let's Encrypt's validation vantage points.
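For what it's worth, RFC 8657 lets a CAA record pin the issuing CA, the ACME account, and the validation method, which closes off the transient-HTTP-MITM issuance path described above. A hypothetical zone fragment (the account URI is invented):

```
; Restrict issuance to one CA, one ACME account, and the dns-01 method,
; which a transient HTTP MITM cannot satisfy (RFC 8657 parameters).
example.com.  IN  CAA 0 issue "letsencrypt.org; accounturi=https://acme-v02.api.letsencrypt.org/acme/acct/12345; validationmethods=dns-01"
example.com.  IN  CAA 0 issuewild ";"
```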
To detect the misissuance you would run something that compares the certs requested by the server with the certs actually issued and included in the log. If you don't care (and most people don't) then you don't detect it.
With DNSSEC, the public key is communicated to the top-level domain registry through out-of-band means. Presumably over a secure TLS link that can't be MITM-ed. The hash of the public key ("DS record") is, in turn, signed by the TLD's key. Which in turn is signed by the well-known root zone key.
So the adversary won't be able to fake the DNSSEC signatures, even if they control the full network path. They need to compromise your registry, at the very least.
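The DS record identifies which DNSKEY it covers partly via a key tag, a simple 16-bit checksum over the DNSKEY RDATA. A minimal sketch of the algorithm from RFC 4034, Appendix B:

```python
def key_tag(dnskey_rdata: bytes) -> int:
    """Compute the key tag over DNSKEY RDATA, per RFC 4034 Appendix B."""
    acc = 0
    for i, byte in enumerate(dnskey_rdata):
        acc += byte << 8 if i % 2 == 0 else byte
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF

# The tag is just a hint to pick the right key during validation; it
# appears in both DS and RRSIG records alongside the digest/signature.
print(key_tag(bytes([0x01, 0x02])))  # -> 258
```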
DNS underlies domain authority: the validity of every connection to every domain name ultimately traces back to DNS records. The amount of infra needed to shore up HTTPS is huge, which is why SSH and other protocols fall back to trust-on-first-use (unless you manually hard-code public keys yourself, which doesn't happen). DNS offers a standard, delegable PKI that is available to all clients regardless of the transport protocol.
With DNSSEC, a host with control over a domain's DNS records could use that to issue verifiable public keys without having to contact a third party.
I ran into this while working on decentralized web technologies and building a parallel to WebPKI just wasn't feasible. Whereas we could totally feed clients DNSSEC validated certs, but it wasn't supported.
Thanks for the explanation. It seems like there are two cases here:
1. Things that use TLS and hence the WebPKI
2. Other things.
None of what you've written here applies to the TLS and WebPKI case, so I'm going to take it that you're not arguing that DNSSEC validation by clients provides a security improvement in that case.
That leaves us with the non-WebPKI cases like SSH. I think you've got a somewhat stronger case there, but not much of one, because those cases can also basically go back to the WebPKI, either directly, by using WebPKI-based certificates, or indirectly, by hosting fingerprints on a Web server.
> None of what you've written here applies to the TLS and WebPKI case, so I'm going to take it that you're not arguing that DNSSEC validation by clients provides a security improvement in that case.
It would benefit the likes of Wikileaks. You could do all the crypto in your basement with an HSM without involving anyone else.
> That leaves us with the non-WebPKI cases like SSH. I think you've got a somewhat stronger case there, but not much of one, because those cases can also basically go back to the WebPKI, either directly, by using WebPKI-based certificates, or indirectly, by hosting fingerprints on a Web server.
But do they? That requires adding support for another protocol.
I would like to live in a world where I don't have to copy/paste SSH keys from an AWS console just to have the peace of mind that my SSH connection hasn't been hijacked.
In practice, fleet operators run their own PKIs for SSH, so tying them to the DNSSEC PKI is a strict step backwards for SSH security.
There may be other applications where a global public PKI makes sense; presumably those applications will be characterized by the need to make frequent introductions between unrelated parties, which is distinctly not an attribute of the SSH problem.
And for everyone else that just wants to connect to an SSH session without having to setup PKI themselves? Tying that to the records used to find the domain seems like the obvious place to put that information to me!
DNSSEC lets you delegate a subtree in the namespace to a given public key. You can hardcode your DNSSEC signing key for clients too.
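As a sketch of what "hardcode your DNSSEC signing key for clients" can look like in practice, Unbound accepts a locally configured trust anchor for a subtree; the key tag and digest here are made up:

```
# unbound.conf sketch: pin a locally trusted DNSSEC key for one subtree,
# independent of the chain from the root (key tag and digest are invented).
server:
    trust-anchor: "internal.example.com. DS 12345 8 2 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF"
```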
Don't get me started on how badly VPN PKI is handled....
Yes, modern fleetwide SSH PKIs all do this; what you're describing is table stakes and doesn't involve anybody delegating any part of their security to a global PKI run by other organizations.
The WebPKI and DNSSEC run global PKIs because they routinely introduce untrusting strangers to each other. That's precisely not the SSH problem. Anything you do to bring up a new physical (or virtual) machine involves installing trust anchors on it; if you're in that position already, it actually harms security to have it trust a global public PKI.
The arguments for things like SSHFP and SSH-via-DNSSEC are really telling. It's like arguing that code signing certificates should be in the DNS PKI.
No, we run a fleet with thousands of physicals and hundreds of thousands of virtuals, of course we don't hardcode keys in our SSH configuration. Like presumably every other large fleet operator, we solve this problem with an internal SSH CA.
Further, I haven't "moved on to another argument". Can you answer the question I just asked? If I have an existing internal PKI for my fleet, what security value is a trust relationship with DNSSEC adding? Please try to be specific, because I'm having trouble coming up with any value at all.
We also have thousands of devices accessible over SSH and we maintain our own PKI for this purpose as well. We also use mTLS with a private CA and chain of trust, for what it's worth.
PEM actually gets used? People depend on it? DNSSEC hasn't been a market success, but if the root keys for DNSSEC ended up on Pastebin this evening, almost nobody would need to be paged, and you can't say that about the WebPKI.
Multicast gets used (I think unwisely) in campus/datacenter scenarios. Interdomain multicast was a total failure, but interdomain multicast is more recent than DNSSEC.
Fair enough on Multicast and HIP. I'm less sure about the case for PEM.
S-HTTP was a bigger failure in absolute terms (I should know!) but it was eventually published as Experimental and the IETF never really pushed it, so I don't think you could argue it was a bigger failure overall.
There really has been a 30+ year full-court press to make DNSSEC happen, including high-effort coordination with both operators and developers. I think the only comparable effort might be IPv6. But IPv6 is succeeding (slowly), and DNSSEC seems to have finally failed.
(I hate to IETFsplain anything to you so think of this as me baiting you into correcting me.)
To really nerd out about it, it seems to me there are two metrics.
1. How much it failed (i.e., how low adoption was).
2. How much effort the IETF and others put into selling it.
From that perspective, I think DNSSEC is the clear winner. There are other IETF protocols that have less usage, but none that have had anywhere near the amount of thrust applied as DNSSEC.
It's actually not safe for clients to perform local validation because a quite significant fraction of middleboxes strip out RRSIG records or otherwise tamper with responses in such a way that the signatures don't validate.
> People stopped caring about ultra-low latency first connect times back in the 90s.
They did? That's certainly going to be news to the people at Google, Mozilla, Cloudflare, etc. who put enormous amounts of effort into building 0-RTT into TLS 1.3 and QUIC.
I did a large data analysis of DNS caching times across the web. Hyperscalers are the only ones who care and they fix that with insanely long DNS caching.
I'm not trying to just nitpick you here, but the message I was responding to said "People stopped caring about ultra-low latency first connect times back in the 90s."
It seems to me that you're saying here that (1) the hyperscalers do care but (2) it's under control. I'm not necessarily arguing with (2) but as far as the hyperscalers go: (1) they drive a lot of traffic on their own (2) in many cases they care so their users don't have to.
Sorry, the point I was trying to make is that this isn't a problem operationally.
Hyperscalers go to crazy lengths because they can measure monetary losses due to milliseconds of less view time and it's much easier when they have distributed cloud infrastructure anyway. But it's not really solving a problem for their customers. At least when I worked in DNS land ... latency micro-benchmarking was something of a joke. Like, sure, you can shave off a few tens of milliseconds, but it's super expensive. If you want to reduce latency, just up your TTL times and/or enable pre-fetching.
As a blocker for DNSSEC ... people made arguments about HTTPS overhead back in the day too. DoH also introduces latency, yet people aren't worried about that being a deal killer.
> As a blocker for DNSSEC ... people made arguments about HTTPS overhead back in the day too.
They did, and then we spent an enormous amount of time to shave off a few round trip times in TLS 1.3 and QUIC. So I'm not sure this is as strong an argument as you seem to think it is.
> DoH also introduces latency, yet people aren't worried about that being a deal killer.
The engineering effort went in anyway! ECC addresses the theoretical latency concerns (smaller keys and signatures), yet we still have people arguing it shouldn't be done. But if it was worth making HTTPS faster in order to secure HTTP, why not DNS?
You're not going to find this answer satisfying, I suspect, but there are two main reasons browsers and big sites (that's what we're talking about) didn't bother to try to make DNSSEC faster:
1. They didn't think that DNSSEC did much in terms of security. I recognize you don't agree with this, but I'm just telling you what the thinking was.
2. Because there is substantial deployment of middleboxes which break DNSSEC, DNSSEC hard-fail by default is infeasible.
As a consequence, the easiest thing to do was just ignore DNSSEC.
You'll notice that they did think that encrypting DNS requests was important, as was protecting them from the local network, and so they put effort into DoH, which also had the benefit of being something you could do quickly and unilaterally.
I'm not unaware of this and I agree that WebPKI has greatly reduced global risk. New DNS tech takes a lot longer to implement but that doesn't mean we should kill DNSSEC support like the trolls insist upon!
Why would Let's Encrypt not also be interested in safeguarding DNS, SSH, BGP, and all the others? Those middleboxes will have to get replaced someday, and we could push for regulation requiring that their replacements support DNSSEC. These long-term societal investments are worth making, and it would enable decentralized DNS.
I'm also concerned that none of this will happen if haters won't stop screaming, "DNSSEC doesn't do anything but ackchyually harms security!".
(@tptacek: please stay out of this comment thread)
HTTPS solved a bunch of real world threat models that were causing massive security issues. So we collectively put a bunch of engineering time into making it performant so that we could deploy it everywhere with minimal impact on UX and performance.
Somehow they cause these massive security issues without impacting the 95%+ of sites that haven't used the protocol since it became viable to adopt a decade and a half ago.
It's just a very difficult statistic to get around! Whenever you make a claim like this, you're going to have to address the fact that basically ~every high-security organization on the Internet has chosen not to adopt the protocol, and there are basically zero stories about how this has bitten any of them.
I run a bunch of websites personally. I have ACME-issued TLS certificates from LetsEncrypt. I monitor the Certificate Transparency logs, and have CAA records set.
What's the threat model that should worry me, where DNSSEC is the right improvement?
Huh? They really don't. It's actually kind of unfortunate that browsers don't have uniform policies about what certificates they accept, but for obvious reasons each browser wants to make their own decision.
They do have uniform policies, those policies come from the aforementioned CA/Browser Forum, which has been issuing its Baseline Requirements for over a decade.
It's not really free, though; the costs are distributed rather than centralized. Running DNSSEC and keeping it working incurs new operational costs for the domain holders, who need to manage keys, DNSSEC signing, etc. And of course there are additional marginal costs to the registrars for managing customer DNSSEC, both building automation and providing customer service when it fails.
It's of course possible that the total numbers are lower than the costs of the WebPKI -- I haven't run them -- but I don't think free is the right word.
I mean, I guess the costs are paid for by the domain name fee. But at least it doesn't have to be a charitable activity covered by non-profits. The early HTTPS certs were especially worthless and price-gouging.
> But at least it doesn't have to be a charitable activity covered by non-profits.
LE isn't primarily funded by non-profits, as you can see from the sponsor list here: https://isrg.org/sponsors/
Anyway, I think there's a reasonable case that it would be better to have the costs distributed the way DNSSEC does, but my point is just that it's not free. Rather, you're moving the costs around. Like I said, it may be cheaper in aggregate, but I think you'd need to make that case.
> LE isn't primarily funded by non-profits, as you can see from the sponsor list here: https://isrg.org/sponsors/
I mean, Mozilla got the ball rolling and it's still run on donations (even if they come from private actors).
> Like I said, it may be cheaper in aggregate, but I think you'd need to make that case.
The PKI is already there: we have 7 people who can do a multisig for new root keys. There is a signing ceremony in a secure bunker somewhere that gets live streamed. The HSMs and servers are already paid for. Cert transparency/monitoring is nice but now it's hard-coded to HTTPS instead of being done more generically. There's a lot of duplicated effort.
> > LE isn't primarily funded by non-profits, as you can see from the sponsor list here: https://isrg.org/sponsors/
>
> I mean, Mozilla got the ball rolling
Among others:
Let’s Encrypt was created through the merging of two simultaneous efforts to build a fully automated certificate authority. In 2012, a group led by Alex Halderman at the University of Michigan and Peter Eckersley at EFF was developing a protocol for automatically issuing and renewing certificates. Simultaneously, a team at Mozilla led by Josh Aas and Eric Rescorla was working on creating a free and automated certificate authority. The groups learned of each other’s efforts and joined forces in May 2013.

...

Initially, ISRG was funded almost entirely through large donations from technology companies. In late 2014, it secured financial commitments from Akamai, Cisco, EFF, and Mozilla, allowing the organization to purchase equipment, secure hosting contracts, and pay initial staff. Today, ISRG has more diverse funding sources; in 2018 it received 83% of its funding from corporate sponsors, 14% from grants and major gifts, and 3% from individual giving.
Except for the period before the launch when Mozilla and EFF were paying people's salaries, including mine, it was never really the case that Let's Encrypt was primarily funded by non-profits.

> and it's still run on donations (even if they come from private actors).

I agree, but I think it's important to be precise about what's happening here, and like I said, it's never been the case that LE was really funded by non-profits.
> > Like I said, it may be cheaper in aggregate, but I think you'd need to make that case.
>
> The PKI is already there: we have 7 people who can do a multisig for new root keys. There is a signing ceremony in a secure bunker somewhere that gets live streamed. The HSMs and servers are already paid for. Cert transparency/monitoring is nice but now it's hard-coded to HTTPS instead of being done more generically. There's a lot of duplicated effort.
I think this is a category error. The main operational cost for DNSSEC is not really the root, which is comparatively low load, but rather the distributed operations for every registry/registrar and server to register keys, sign domains, etc.

One way to think about this is that running a TLD with DNSSEC is conceptually similar to operating a CA, in that you have to take in everyone's keys and sign them. It's true you don't need to validate their domains, but that's not the expensive part. Operating this machinery isn't free, especially when you have to handle exceptional cases like people who screw up their domains and need manual help to recover. Now, it's possible that it's a marginal incremental cost, but I doubt it's zero. Upthread, you suggested that people are already paying for this in their domain registrations, but that just means that the TLD operator is going to have to absorb the incremental cost.
That's fair! My primary gripe was about the need for non-profits to step in to begin with. Sorry if I didn't communicate that well.
However, I don't feel sorry for registrars or TLDs. Verisign selling HTTPS certs while running the root TLDs is a conflict of interest, and I believe the perverse incentives are a big part of the reason why DNSSEC and DANE have stalled out. TLDs are a monopoly business, and ICANN is a quasi-commercial entity that should never have been run as a for-profit business.
I certainly think it is fair to ask them to pay for all this.
I actually agree with you that in an abstract architectural sense a DNSSEC-style solution for authenticating the keys of endpoints is better. The problem from my perspective is that for a number of reasons that we've explored elsewhere in this thread, there is no practical way to get there from here.
To put this more sharply: in the world as it presently is with ubiquitous WebPKI deployment, the marginal benefit of DNSSEC strikes me as quite modest, even if it were universally deployed. Worse yet, the incremental benefit to any specific actor of deploying DNSSEC is even lower, which makes it very hard to get to universal deployment.
> However, I don't feel sorry for registrars or TLDs. Verisign selling HTTPS certs while running the root TLDs is a conflict of interest, and I believe the perverse incentives are a big part of the reason why DNSSEC and DANE have stalled out. TLDs are a monopoly business, and ICANN is a quasi-commercial entity that should never have been run as a for-profit business.
>
> I certainly think it is fair to ask them to pay for all this.
I also do not feel sorry for registrars. However, it's also not clear to me that if somehow they were forced to incur incremental cost X per domain name, they would not find a way to pass it on to us. With that said, I also don't think that's really why DNSSEC and DANE are stalled out; rather, I think it's the deployment incentives I mentioned above.
Note that despite the confusing naming and the fact that VeriSign was once a CA, they no longer are and have not been since 2010, as described in the second paragraph of their Wikipedia page. https://en.wikipedia.org/wiki/Verisign. In fact, in my experience VeriSign is very pro-DNSSEC.
From my perspective, the challenge with DNSSEC is that it just doesn't have a very good cost/benefit ratio. Once the WebPKI exists, "critical path" use of DNSSEC only offers modest value. Now, obviously, this article is about requiring CAs to check DNSSEC, which is out of the critical path and of some value, but it's not clear to me it's of enough value to get people to actually roll out DNSSEC.
Can you elaborate a bit more on what you think the unnecessary complexity is here?
A basic source of concern here is whether it's safe for the server to use an initial congestion window large enough to handle the entire PQ certificate chain without an unacceptable risk of congestion collapse or other negative consequences. This is a fairly complicated question of network dynamics and the interaction of a bunch of different machines potentially sharing the same network resources, and is largely independent of the network protocol in use (QUIC versus TCP). It's possible that IW20 (or whatever) is fine, but it may well not be.
There are two secondary issues:
1. Whether the certificate chain is consuming an unacceptable fraction of total bandwidth. I agree that this is less likely for many network flows, but as noted above, there are some flows where it is a large fraction of the total.
2. Potential additional latency introduced by packet loss and the recovery round trip it necessitates. Every additional packet increases the chance of one of them being lost, and you need the entire certificate chain before the handshake can complete.
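As a back-of-envelope illustration of both concerns, here's a sketch that counts the slow-start flights needed to deliver a certificate chain, assuming the classic doubling congestion window, no loss, and illustrative IW/MSS values (none of these numbers come from the thread):

```python
def flights_to_deliver(total_bytes: int, iw: int = 10, mss: int = 1200) -> int:
    """Round trips needed to push total_bytes, doubling cwnd each flight."""
    cwnd, sent, flights = iw * mss, 0, 0
    while sent < total_bytes:
        sent += cwnd   # deliver one congestion window's worth
        cwnd *= 2      # classic slow start: double per RTT
        flights += 1
    return flights

# A 4 KB chain fits in the first flight; a 160 KB PQ chain needs several.
print(flights_to_deliver(4 * 1024))    # -> 1
print(flights_to_deliver(160 * 1024))  # -> 4
```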
It seems you disagree about the importance of these issues, which is an understandable position, but where you're losing me is that you seem to be attributing this to the design of the protocols we're using. Can you explain further how you think (for instance) QUIC could be different in a way that would ameliorate these issues?
For point 1, as I noted here [1], total bandwidth and resources are dominated by large flows. Endpoints are powerful enough to handle these large flows. The primary problems would lie with poor intervening networks and setup overhead.
For point 2, that is a valid concern of any case where you have just plain old more data. This dovetails into my actual point.
The problem of going from a 4 KB certificate chain to a 16 KB certificate chain, a 160 KB certificate chain, or an arbitrarily sized certificate chain should be equivalent to the problem of "server sends N-byte response like normal". To simplify the problem a little, it is just: the client sends an R-byte request message, the server responds with the Q-byte response message (which happens to be a certificate chain), the client sends the P-byte actual request, and the server responds with a K-byte response message. So, at the risk of over-simplification, the problem should only be marginally harder than any generic "time to Q + K bytes".
Of course, if you previously had a 4 KB actual response and a 4 KB certificate chain and now it is a 160 KB certificate chain, you are going from "time to 8 KB" to "time to 164 KB". That is the essential complexity of the problem. But as I noted in my response to your point 1, the amount of server and client resources actually being expended on "small" requests is small; the only real problem is poor networks where you are now consuming significantly increased bandwidth.
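To make the "time to 8 KB" versus "time to 164 KB" gap concrete, here is a sketch under simple slow-start assumptions (my own parameters: IW10, ~1200-byte packets, 100 ms RTT, no loss, window doubling each round trip):

```python
# Completion time for delivering total_bytes under idealized slow start:
# the window starts at IW packets and doubles every round trip.
MSS = 1200   # bytes per packet, assumed
RTT = 0.1    # seconds, assumed
IW = 10      # initial congestion window in packets, assumed

def completion_time(total_bytes):
    sent, cwnd, t = 0, IW, 0.0
    while sent < total_bytes:
        sent += cwnd * MSS
        cwnd *= 2          # slow start: window doubles per RTT
        t += RTT           # each flight costs one round trip
    return t

print(completion_time(8 * 1024))    # fits in the first flight: 0.1 s
print(completion_time(164 * 1024))  # four flights: 0.4 s
```

So under these assumptions the large chain costs a few extra round trips rather than an order-of-magnitude slowdown, which is the point about where the real cost lies.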
This then leads into the question of why "time to 8 KB" versus "time to 164 KB" is viewed as such a dramatic difference. This is an artifact of poor protocol design.
From a network perspective, the things that mostly matter are end-to-end bandwidth, end-to-end latency, endpoint receive buffer size, and per-hop bandwidth/buffering. You have a transport channel with unknown, dynamic bandwidth and unknown latency, and your protocol attempts to discover the true transport channel parameters. Furthermore, excessive usage degrades overall network performance, so you want to avoid over-saturating the network during your discovery. In an ideal world, you would infer the transport parameters of every hop along your path to determine your holistic end-to-end transport channel parameters. This is problematic due to paths shifting or just plain dynamic throttling, so you will probably limit yourself to "client to common bottleneck (e.g. your router) path" and "common bottleneck to server path". The "client to common bottleneck path" is likely client controlled and can be safely divided and allocated by the client. The "common bottleneck to server path" is not efficiently controllable by the client, so it requires safe discovery/inference.
The "initial congestion window" is an initial bandwidth-delay product to avoid over-saturating the network. This does not directly map to the transport parameters that matter. What you actually want is an initial safe "end-to-end bandwidth" which you refine via the discovery process. The latency of your roundtrip then only matters if the endpoint receive buffer size is too small, and it only affects how quickly you can refine/increase the computed safe "end-to-end" bandwidth.
Under the assumption that a 16 KB "initial congestion window" is fine, and assuming a default RTT of ~100 ms (somewhat reasonable for geographically distributed servers that want to minimize latency), that is actually an initial safe "end-to-end bandwidth" assumption of (16 KB / 0.1 s * 8 b/B) = ~1.3 Mb/s. Assuming the client advertises a receive buffer large enough for the entire certificate chain (which it absolutely should) and there are no packet losses, the client would get the entire certificate chain in ~(1 s + RTT) in the worst case. Note how that has only a minor dependency on the end-to-end latency. Of course it could get the data sooner if the bandwidth gets refined to a higher number, and a lower RTT gives more opportunities for it to get refined, but that bounds our worst case (assuming no packet loss) to something that is not really that bad, especially for the poor network throughput that we are assuming.
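The arithmetic in the previous paragraph can be checked directly (same assumptions: 16 KB window, 100 ms RTT, 160 KB chain, no loss or window growth):

```python
# Initial safe bandwidth implied by a 16 KB window drained once per 100 ms RTT.
INITIAL_WINDOW = 16 * 1024   # bytes, assumed
RTT = 0.1                    # seconds, assumed

bandwidth_bps = INITIAL_WINDOW / RTT * 8   # bits per second
print(bandwidth_bps / 1e6)                 # ~1.31 Mb/s

# Worst case with no window growth: time to push a 160 KB chain at that rate,
# plus one RTT for the request to reach the server.
chain_bytes = 160 * 1024
transfer_time = chain_bytes * 8 / bandwidth_bps
print(transfer_time + RTT)                 # ~1.1 s
```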
This then makes it obvious how to improve this scheme: choose better initial estimates of "end-to-end" bandwidth, or actively communicate that information back and forth. The "client to common bottleneck path" can be "controlled" by the client, so it can allocate bandwidth amongst all of its connections and set aside bandwidth on that leg for receiving. This allows higher initial "end-to-end" bandwidth assumptions that can be safely clipped when the client realizes it is in bad network conditions such as plane wifi. If the server determines "I have set aside N b/s to the 'internet' for this client" and the client determines "I have set aside M b/s from the 'internet' for this server", then your only problem is a bottleneck in the broader backbone connections between the server and client. You would almost certainly be able to support better initial bandwidth assumptions, or at least faster convergence after the first RTT, if you communicated that information both ways. This is just an example of what could be improved, and how, with fairly minimal changes.
And this all assumes that we are even trying to tackle this fairly fundamental root issue, rather than the heaps of accidental complexity like middleboxes just giving up if the certificates are too large, or whatever other nonsense is out there. I am pretty sure that is the real impetus: the networking equivalent of wanting the 737 MAX to handle the same as a 737.