I want to point out that E2E encryption on the Web doesn't make your conversation more secure than regular HTTPS without E2E encryption. The core of the argument is the threat model. The whole point of using E2E for any conversation is to make sure that even the service provider cannot read it, so the threat model is clearly aimed at the service provider. However, that very same service provider also serves the web application that performs the encryption. That means that if the service provider wants to act evil, it can always sneak you an application that steals your conversation, either by simply not applying E2E encryption or by eavesdropping before encrypting.
The root of the problem is that web applications don't have a root of trust. As long as this problem is not addressed, E2E encryption on the Web will always be meaningless.
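To make the "eavesdropping before encrypting" case concrete, here is a minimal sketch. Everything in it is hypothetical: `honest_send`, `backdoored_send`, and `toy_encrypt` are illustrative names, and `toy_encrypt` is a placeholder, not real encryption. The point is only that both clients put identical bytes on the wire, so nobody watching the traffic can tell which one was served.

```python
from typing import Callable, List

Encrypt = Callable[[bytes], bytes]


def toy_encrypt(data: bytes) -> bytes:
    # Placeholder standing in for the E2E cipher. NOT real encryption.
    return bytes(b ^ 0x5A for b in data)


def honest_send(plaintext: bytes, encrypt: Encrypt, wire: List[bytes]) -> None:
    # Only ciphertext ever leaves the client.
    wire.append(encrypt(plaintext))


def backdoored_send(
    plaintext: bytes,
    encrypt: Encrypt,
    wire: List[bytes],
    provider_log: List[bytes],
) -> None:
    # A copy is taken before encryption; the E2E path is otherwise untouched.
    provider_log.append(plaintext)
    wire.append(encrypt(plaintext))


if __name__ == "__main__":
    wire_a: List[bytes] = []
    wire_b: List[bytes] = []
    log: List[bytes] = []
    honest_send(b"hello", toy_encrypt, wire_a)
    backdoored_send(b"hello", toy_encrypt, wire_b, log)
    print(wire_a == wire_b)  # the wire traffic is indistinguishable
    print(log)               # but the provider holds the plaintext
```

In a real deployment the copy would go out over some side channel rather than into a list, but the structure of the problem is the same: the code doing the encryption comes from the party you are trying to hide from.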
Yes, your ISP could hack the binaries, break the HTTPS trust model somehow, and alter what you download, but practically speaking that isn't going to happen. Trying to build your system to defeat an adversary who has that kind of power ends up making it too cumbersome for the 99.999% use case.
What I don't understand is why the server needs to decrypt the traffic at all. Why not just have it rebroadcast the encrypted streams? The endpoints could exchange crypto keys with public key crypto and to the video server it would just be a bunch of bytes. Clients would turn on and off video streams from different clients based on network conditions and how much screen real-estate they have. Audio could even be encoded on a different channel so it could always be forwarded even if the video is not.
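A rough sketch of what this could look like, assuming a two-party call. The names `Relay`, `keypair`, `shared_key`, `seal`, and `unseal` are all made up for illustration, and the crypto is a toy: a tiny Diffie-Hellman group and an unauthenticated SHA-256 keystream, stdlib only, not safe for real use. What it is meant to show is that the relay only ever handles opaque bytes and that each receiver decides which streams it wants forwarded.

```python
import hashlib
import secrets
from typing import Dict, List, Set, Tuple

# Toy Diffie-Hellman parameters: illustrative only, far too small for real use.
P = 2**127 - 1  # a Mersenne prime
G = 3


def keypair() -> Tuple[int, int]:
    private = secrets.randbelow(P - 3) + 2  # in [2, P - 2]
    return private, pow(G, private, P)


def shared_key(private: int, peer_public: int) -> bytes:
    secret = pow(peer_public, private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        stream.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(8)
    return nonce + _keystream_xor(key, nonce, plaintext)


def unseal(key: bytes, packet: bytes) -> bytes:
    nonce, body = packet[:8], packet[8:]
    return _keystream_xor(key, nonce, body)


class Relay:
    """Forwards packets without any key material; payloads are opaque bytes."""

    def __init__(self) -> None:
        self.subscriptions: Dict[str, Set[Tuple[str, str]]] = {}
        self.inboxes: Dict[str, List[Tuple[str, str, bytes]]] = {}

    def subscribe(self, receiver: str, sender: str, kind: str) -> None:
        self.subscriptions.setdefault(receiver, set()).add((sender, kind))

    def unsubscribe(self, receiver: str, sender: str, kind: str) -> None:
        self.subscriptions.get(receiver, set()).discard((sender, kind))

    def publish(self, sender: str, kind: str, payload: bytes) -> None:
        for receiver, wanted in self.subscriptions.items():
            if (sender, kind) in wanted:
                self.inboxes.setdefault(receiver, []).append((sender, kind, payload))


if __name__ == "__main__":
    a_priv, a_pub = keypair()
    b_priv, b_pub = keypair()
    key = shared_key(a_priv, b_pub)  # Bob derives the same key from (b_priv, a_pub)

    relay = Relay()
    relay.subscribe("bob", "alice", "audio")  # Bob is bandwidth-limited: audio only
    relay.publish("alice", "audio", seal(key, b"audio frame"))
    relay.publish("alice", "video", seal(key, b"video frame"))  # not forwarded

    for sender, kind, packet in relay.inboxes["bob"]:
        print(sender, kind, unseal(shared_key(b_priv, a_pub), packet))
```

Audio and video are published as separate kinds here, so audio keeps flowing even when a receiver has dropped the video subscription. A group call would need a key-distribution scheme beyond pairwise keys, which this sketch leaves out.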
>What I don't understand is why the server needs to decrypt the traffic at all
In short, to make it scale to many (tens to hundreds of) participants. Clever techniques like Simulcast[0] and SVC[1] allow that, but the routing server must support them to meet the different requirements of individual participants.
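As a rough illustration of the simulcast idea: the sender uploads the same video at several bitrates, and the routing server picks one per receiver instead of decoding and re-encoding anything. The layer names and bitrates below are invented for the example, and `pick_layer` is a hypothetical helper, not any real server's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    name: str
    kbps: int


# Hypothetical encodings that one sender uploads in parallel.
LAYERS = [Layer("low", 150), Layer("mid", 500), Layer("high", 1500)]


def pick_layer(layers: List[Layer], available_kbps: int) -> Optional[Layer]:
    """Return the highest-bitrate layer that fits, or None if none fits."""
    fitting = [layer for layer in layers if layer.kbps <= available_kbps]
    return max(fitting, key=lambda layer: layer.kbps) if fitting else None


if __name__ == "__main__":
    for receiver, kbps in [("phone", 200), ("laptop", 600), ("desktop", 5000)]:
        chosen = pick_layer(LAYERS, kbps)
        print(receiver, chosen.name if chosen else "audio only")
```

The selection itself never touches the media payload, which is why it can work on encrypted streams as long as the server can still tell the layers apart. The price is that the sender has to encode and upload every layer.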
With media streams, the provider typically wants to be able to re-encode the data at a lower quality for clients that have less bandwidth available or are hitting congestion.
There are ways around this but they are quite complex and place additional CPU overhead on each sender.