As a complete novice in this area, I don't understand the advantage of using a p...

gfodor · on April 7, 2020

If you have a centralized server, you have an SFU (Selective Forwarding Unit). SFUs typically expose a range of UDP (and/or TCP) ports for communication, and peer connections are allocated on a per-port basis. So if a user is connected to your SFU, they take up a port, and since that port is assigned randomly, they need to be able to egress over a large UDP/TCP port range to connect.
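A toy model of the per-port allocation described above, showing why a restrictive firewall breaks it. Everything here is hypothetical: the `PortPerPeerSFU` class, the 10000-20000 range, and the `client_can_reach` check are illustrations, not any real SFU's API (real SFUs make the range configurable).

```python
import random


class PortPerPeerSFU:
    """Toy model (hypothetical): one randomly chosen UDP port per peer connection."""

    def __init__(self, port_min: int = 10000, port_max: int = 20000, seed: int = 0):
        self.port_min = port_min
        self.port_max = port_max
        self.rng = random.Random(seed)
        self.ports = {}  # peer id -> allocated port

    def connect(self, peer_id: str) -> int:
        """Allocate a random, currently unused port for a new peer connection."""
        used = set(self.ports.values())
        if len(used) > self.port_max - self.port_min:
            raise RuntimeError("port range exhausted")
        while True:
            port = self.rng.randint(self.port_min, self.port_max)  # inclusive range
            if port not in used:
                self.ports[peer_id] = port
                return port


def client_can_reach(port: int, allowed_egress: set) -> bool:
    """True if the client's firewall lets traffic out to this destination port."""
    return port in allowed_egress


sfu = PortPerPeerSFU()
alice_port = sfu.connect("alice")
bob_port = sfu.connect("bob")
# A firewall that only allows egress on port 443 blocks both connections.
print(client_can_reach(alice_port, {443}), client_can_reach(bob_port, {443}))  # → False False
```

The client cannot know in advance which port it will be handed, so the whole range has to be open on its side.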

However, many firewalls block port ranges, or even UDP entirely. What you really want is a way to let people speak WebRTC over a common port (TCP port 443 is almost never blocked). TURN facilitates this. Sometimes TURN is built into the SFU; sometimes it isn't, and you need to run coturn in front of it. In Slack's case (and in the project I work on as well), they run Janus, which does not have TURN built in, and hence they run coturn to provide TURN.

Slack's approach is particularly interesting because they always push people through TURN instead of allowing direct connectivity to their SFU. It's hard to say exactly why, but it's probably a mix of reasons: locking the SFU down onto the private network, being able to push TURN out to the edge while keeping the SFU on a private LAN, etc. I don't think this is typical. Usually you run TURN and the SFU both with public IPs, and the client connects to one or the other depending on which ICE candidates win. That is a function of the client's firewall rules: the browser tries to pick the 'best' candidate it can reach, ideally one over UDP without a TURN hop.
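A sketch of why "the best candidate" usually means direct UDP: ICE ranks candidates with the RFC 8445 priority formula, where host candidates get the highest type preference and relayed (TURN) ones the lowest. The `reachable` callback stands in for real connectivity checks, the TCP local-preference value is a hypothetical choice, and a relay candidate's `transport` here describes only the client-to-TURN leg (a simplification). `relay_only` mimics WebRTC's `iceTransportPolicy: "relay"`; the thread doesn't say whether Slack uses that setting or enforces relaying some other way.

```python
from dataclasses import dataclass

# Recommended type preferences from RFC 8445.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


@dataclass
class Candidate:
    address: str
    port: int
    cand_type: str      # "host", "prflx", "srflx" or "relay" (TURN)
    transport: str      # "udp" or "tcp"
    component: int = 1  # 1 = RTP


def priority(c: Candidate) -> int:
    """RFC 8445 candidate priority formula.

    The UDP-over-TCP local preference split is a hypothetical choice;
    real browsers apply their own local-preference rules.
    """
    local_pref = 65535 if c.transport == "udp" else 32767
    return (TYPE_PREFERENCE[c.cand_type] << 24) + (local_pref << 8) + (256 - c.component)


def best_reachable(cands, reachable, relay_only=False):
    """Highest-priority candidate that `reachable` says works, or None.

    `reachable` stands in for real ICE connectivity checks; `relay_only`
    mimics a relay-only policy that forces every connection through TURN.
    """
    usable = [c for c in cands
              if reachable(c) and (not relay_only or c.cand_type == "relay")]
    return max(usable, key=priority, default=None)


cands = [
    Candidate("10.0.0.5", 50000, "host", "udp"),
    Candidate("203.0.113.7", 61000, "srflx", "udp"),
    Candidate("198.51.100.9", 443, "relay", "tcp"),
]


def only_443(c: Candidate) -> bool:
    # A restrictive firewall: only TCP to port 443 gets through.
    return c.transport == "tcp" and c.port == 443


print(best_reachable(cands, only_443).cand_type)        # → relay
print(best_reachable(cands, lambda c: True).cand_type)  # → host
```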

Sean-Der · on April 7, 2020

There is no reason an SFU couldn't run everything over one port though! Then you can just use the 3-tuple to route stuff to the proper connection.

Someone is doing this right now for Pion, and I'm really excited to see it, especially for what it means for deploys. Right now, asking people to expose port ranges adds so much overhead compared with exposing one UDP and one TCP port for media.

0az · on April 7, 2020

Can you elaborate on what this means?

Sean-Der · on April 7, 2020

Right now most SFUs start up an ICE Agent [0] and listen on a random port. ICE is used to establish the connection between two peers. Basically, both sides exchange a list of candidate addresses and try to find the best path.
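"Finding the best path" in ICE means pairing every local candidate with every remote one and running connectivity checks in pair-priority order. A minimal sketch of that ordering using the RFC 8445 pair-priority formula; it leaves out pruning, check states, and per-component/transport matching, and the `checklist` helper and the sample priorities are illustrative.

```python
from itertools import product


def pair_priority(g: int, d: int) -> int:
    """RFC 8445 pair priority.

    g: the controlling agent's candidate priority,
    d: the controlled agent's candidate priority.
    """
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def checklist(local, remote, controlling=True):
    """Pair every local candidate with every remote one, best pair first.

    Candidates are (label, priority) tuples; `checklist` is a hypothetical helper.
    """
    pairs = []
    for (l_name, l_prio), (r_name, r_prio) in product(local, remote):
        g, d = (l_prio, r_prio) if controlling else (r_prio, l_prio)
        pairs.append((pair_priority(g, d), l_name, r_name))
    pairs.sort(reverse=True)
    return [(l_name, r_name) for _, l_name, r_name in pairs]


# Priorities as produced by the RFC 8445 formula for UDP host/srflx/relay candidates.
local = [("host", 2130706431), ("relay", 16777215)]
remote = [("host", 2130706431), ("srflx", 1694498815)]
print(checklist(local, remote))
# → [('host', 'host'), ('host', 'srflx'), ('relay', 'host'), ('relay', 'srflx')]
```

Because the formula is dominated by the smaller of the two priorities, any pair involving a relayed candidate sorts last, which is why TURN is only used when the direct pairs fail.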

With an SFU you end up having thousands of remote peers, each with their own port on your server. However, you could easily listen on a single port and then handle each inbound packet depending on its remote 3-tuple (the client's IP, port, and protocol). Effectively, you would be running all your ICE Agents on one port, with one additional step of processing.
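A minimal sketch of single-port demultiplexing, assuming the approach works roughly like this (this is not Pion's actual code or API; `SinglePortMux` and its methods are hypothetical). Packets from a known 3-tuple go straight to their agent. A new 3-tuple can only be introduced by a STUN connectivity check, whose USERNAME attribute is `recipientUfrag:senderUfrag` per RFC 8445, so the part before the colon identifies which local ICE agent it belongs to. RFC 7983's first-byte rule (0-3 means STUN) separates STUN from DTLS/RTP.

```python
import struct
from typing import Optional

STUN_MAGIC_COOKIE = 0x2112A442
STUN_ATTR_USERNAME = 0x0006


def stun_username(packet: bytes) -> Optional[str]:
    """Return the USERNAME attribute of a STUN message, or None."""
    if len(packet) < 20 or packet[0] > 3:  # RFC 7983: first byte 0-3 means STUN
        return None
    _msg_type, msg_len, cookie = struct.unpack_from("!HHI", packet, 0)
    if cookie != STUN_MAGIC_COOKIE:
        return None
    offset, end = 20, min(20 + msg_len, len(packet))
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", packet, offset)
        if attr_type == STUN_ATTR_USERNAME:
            value = packet[offset + 4: offset + 4 + attr_len]
            return value.decode("utf-8", errors="replace")
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


class SinglePortMux:
    """Hypothetical sketch: many ICE agents sharing one socket."""

    def __init__(self):
        self.agents_by_ufrag = {}  # our local ufrag -> agent
        self.agents_by_tuple = {}  # (ip, port, proto) -> agent

    def register(self, local_ufrag, agent):
        self.agents_by_ufrag[local_ufrag] = agent

    def route(self, remote_tuple, packet: bytes):
        """Return the agent that should handle `packet`, or None to drop it."""
        agent = self.agents_by_tuple.get(remote_tuple)
        if agent is not None:
            return agent  # known 3-tuple: STUN, DTLS and SRTP all go to this agent
        username = stun_username(packet)
        if username is None:
            return None  # not STUN and from an unknown source: drop
        local_ufrag = username.split(":", 1)[0]  # USERNAME is "recipient:sender"
        agent = self.agents_by_ufrag.get(local_ufrag)
        if agent is not None:
            # A real implementation should verify MESSAGE-INTEGRITY first.
            self.agents_by_tuple[remote_tuple] = agent
        return agent
```

The extra step the comment mentions is the `route` lookup: one dictionary hit per packet, plus a STUN parse the first time a new client address shows up.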

I need to fill out [1] more to fully explain the idea, but I think it could make a huge difference when making it easier to deploy WebRTC SFUs.

[0] https://github.com/pion/ice

[1] https://github.com/pion/webrtc/wiki/SinglePortMode

gfodor · on April 7, 2020

Yup, that's a great point. I'd love to see this approach explored further. Is there any risk of tuple collisions in some bizarro NAT situation? I'd guess not, since the remote tuple needs to route to a single destination, but there's some weird stuff out there. For example, one could imagine a router abusing the IP protocol to somehow route packets to different destinations despite them having the same return IP/port combo. I'm no networking wizard, but in general I assume that if it's possible, someone is doing it :)

e12e · on April 7, 2020

gfodor · on April 7, 2020

https://webrtcglossary.com/sfu/

annoyingnoob · on April 7, 2020

Pushing everything through a proxy does not seem ideal. It seems like the easy road to adding VoIP everywhere Slack already works.

aclavelle · on April 7, 2020

My knowledge is about 2 years old on this, but I can try to explain: TURN and STUN exist to help users communicate from behind NAT and firewalls. TURN routes all traffic through a central server, which relays it to clients it already has established connections with, thus getting around the NAT/firewall. STUN is more lightweight: it just helps each user discover their public address so the two can negotiate a normal P2P connection, after which they send messages directly to each other.
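To make the STUN half concrete: a client sends a 20-byte Binding Request, and the server replies with the source address it observed, encoded in an XOR-MAPPED-ADDRESS attribute (port XORed with the top 16 bits of the magic cookie, IPv4 address XORed with the full cookie, per RFC 8489). A minimal IPv4-only sketch; the function names are hypothetical and nothing here touches the network.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
XOR_MAPPED_ADDRESS = 0x0020


def binding_request(transaction_id=None) -> bytes:
    """A STUN Binding Request with no attributes (header only)."""
    tid = transaction_id or os.urandom(12)  # 96-bit transaction ID
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def public_address(response: bytes):
    """Return the (ip, port) the STUN server saw, from XOR-MAPPED-ADDRESS."""
    _msg_type, length, cookie = struct.unpack_from("!HHI", response, 0)
    if cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN message")
    offset = 20
    while offset + 4 <= 20 + length:
        attr_type, attr_len = struct.unpack_from("!HH", response, offset)
        if attr_type == XOR_MAPPED_ADDRESS:
            # Value layout: reserved byte, family byte, X-Port, X-Address.
            family, xport = struct.unpack_from("!xBH", response, offset + 4)
            if family != 0x01:
                raise ValueError("only IPv4 is handled in this sketch")
            (xaddr,) = struct.unpack_from("!I", response, offset + 8)
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        offset += 4 + attr_len + (-attr_len % 4)
    return None
```

That discovered address becomes the client's server-reflexive candidate. TURN goes further: the server allocates a relay address and forwards every packet, which is why it costs the operator bandwidth while STUN barely does.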

wrkronmiller · on April 7, 2020

Thanks! That's in-line with what I thought was going on. It sounds like TURN is very close to being an open proxy.

Rather than falling back from p2p to STUN to TURN, why not replace TURN with something more application/protocol-specific?

Perhaps a WebRTC-only proxy that performs authentication and can perform authorization along the lines of: user A is (only) allowed to connect to user B using WebRTC.
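For comparison, TURN deployments are usually not open: a common convention (the "TURN REST API" scheme, which coturn supports via its shared-secret options) has the application server mint short-lived credentials that the TURN server can verify without a database. The username is `expiry:user_id` and the password is base64(HMAC-SHA1(secret, username)). A minimal sketch; the helper names are hypothetical, and comparing passwords directly is a simplification of how a TURN server actually uses the derived key in STUN long-term-credential checks. It authenticates users but does not express the per-pair "A may reach B" authorization suggested above.

```python
import base64
import hashlib
import hmac
import time


def _derive_password(shared_secret: str, username: str) -> str:
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def turn_credentials(shared_secret: str, user_id: str, ttl_seconds: int = 3600, now=None):
    """Mint time-limited TURN credentials on the application server.

    The client only receives the derived username/password, never the secret.
    """
    current = now if now is not None else time.time()
    username = f"{int(current) + ttl_seconds}:{user_id}"
    return username, _derive_password(shared_secret, username)


def turn_server_accepts(shared_secret: str, username: str, password: str, now=None) -> bool:
    """Conceptual TURN-side check: not expired and the password matches."""
    current = now if now is not None else time.time()
    expiry, _, _user = username.partition(":")
    if not expiry.isdigit() or int(expiry) < current:
        return False
    return hmac.compare_digest(_derive_password(shared_secret, username), password)


user, pw = turn_credentials("s3cret", "alice", ttl_seconds=60, now=1000)
print(user)  # → 1060:alice
```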

jlokier · on April 7, 2020

A TURN server has to do much less computation, and it also doesn't need to decrypt the payload. It's more or less a fancy packet forwarder.
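To illustrate "a fancy packet forwarder": once a TURN client has bound a channel to a peer, media travels in ChannelData messages (RFC 8656), which are just a 4-byte header (channel number, length) in front of the payload. The relay strips or adds that header and forwards the bytes; the SRTP/DTLS payload is never decrypted or even parsed. A heavily simplified sketch: `TurnRelay` is hypothetical, and allocations, permissions, authentication, and refresh timers are all omitted.

```python
import struct


class TurnRelay:
    """Toy model of a TURN allocation's data path (hypothetical, simplified)."""

    def __init__(self):
        self.channels = {}  # (client addr, channel number) -> peer addr

    def bind_channel(self, client, channel, peer):
        # In real TURN this happens via an authenticated ChannelBind request.
        if not 0x4000 <= channel <= 0x4FFF:  # channel range in RFC 8656
            raise ValueError("channel number out of range")
        self.channels[(client, channel)] = peer

    def from_client(self, client, datagram: bytes):
        """Strip the ChannelData header and return (peer, opaque payload)."""
        if len(datagram) < 4:
            return None
        channel, length = struct.unpack_from("!HH", datagram, 0)
        peer = self.channels.get((client, channel))
        if peer is None or len(datagram) < 4 + length:
            return None
        return peer, datagram[4:4 + length]  # forwarded as-is, never decrypted

    def from_peer(self, peer, payload: bytes):
        """Wrap the peer's datagram in a ChannelData header for the client."""
        for (client, channel), bound_peer in self.channels.items():
            if bound_peer == peer:
                return client, struct.pack("!HH", channel, len(payload)) + payload
        return None  # no binding for this peer: drop
```

Compare this with an SFU or application-level proxy, which has to terminate DTLS and handle SRTP keys before it can do anything with the media.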

In addition, only a fraction of users will need TURN; the rest can use direct peer connections with the aid of NAT traversal; the two kinds of connection are more or less the same to higher layers. Conversely, if the application depended on an application server to process data, chances are you wouldn't implement a second version of the same protocol that works without the server.

So a single TURN server can handle a lot more traffic than an application server, is potentially more secure, and is more easily shared between different applications, and even different owners.

If you want it geo-distributed for latency, the ability to share the same TURN servers between different applications and owners gives you cost-latency advantages too.

ackbar03 · on April 7, 2020

> In addition, only a fraction of users will need TURN; the rest can use direct peer connections with the aid of NAT traversal;

Is there actually any data on this, or is it mostly anecdotal? I've done my own hole-punching experiments before, and it's almost impossible with today's routers, almost all of which seem to implement symmetric NAT (you can't match the ports after the initial contact with the STUN server, because a new port is assigned randomly for each destination). Compound this with the fact that some ISPs have more than one layer of NAT, and I have trouble believing that the majority of Slack users either have a direct public IP or a convenient way to do NAT traversal successfully.
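A toy model of why symmetric NAT defeats STUN-based hole punching. It contrasts what RFC 4787 calls endpoint-independent mapping (one public port per internal socket, whatever the destination) with address-and-port-dependent ("symmetric") mapping (a new public port per destination). The `Nat` class, the addresses, and the sequential port numbering are hypothetical, and NAT filtering behavior is ignored.

```python
import itertools


class Nat:
    """Toy NAT (hypothetical): only the mapping rule differs between modes."""

    def __init__(self, public_ip: str, endpoint_independent: bool, first_port: int = 40000):
        self.public_ip = public_ip
        self.endpoint_independent = endpoint_independent
        self.mappings = {}
        self._ports = itertools.count(first_port)  # sequential here; many NATs randomize

    def outbound(self, internal, destination):
        """Public (ip, port) used for a packet from `internal` to `destination`."""
        # Endpoint-independent: one mapping per internal socket.
        # Symmetric: a fresh mapping for every distinct destination.
        key = internal if self.endpoint_independent else (internal, destination)
        if key not in self.mappings:
            self.mappings[key] = (self.public_ip, next(self._ports))
        return self.mappings[key]


def stun_prediction_holds(nat: Nat) -> bool:
    """Does the address the STUN server saw match the one the peer will see?"""
    me = ("192.168.1.10", 5000)
    seen_by_stun = nat.outbound(me, ("stun.example", 3478))
    seen_by_peer = nat.outbound(me, ("203.0.113.50", 6000))
    return seen_by_stun == seen_by_peer


print(stun_prediction_holds(Nat("198.51.100.1", endpoint_independent=True)))   # → True
print(stun_prediction_holds(Nat("198.51.100.1", endpoint_independent=False)))  # → False
```

When the prediction fails, the address advertised to the peer is wrong, so direct connectivity checks fail and ICE falls back to a TURN relay.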

hatefulmoron · on April 8, 2020

> I have trouble believing that the majority of Slack users either have a direct public IP or a convenient way to conduct NAT traversal successfully.

Google's libjingle documentation[1] alludes to a statistic that says that "8% of connection attempts require an intermediary relay server".

This will obviously depend on the user demographic; I would guess that users on corporate connections are probably less likely to form successful P2P connections.

[1]: https://developers.google.com/talk/libjingle/important_conce...

mypalmike · on April 7, 2020

A custom intermediary needs to perform some expensive operations, such as decryption and re-encryption of the DTLS and SRTP going through it. It's much simpler and cheaper to just forward packets.

cjbprime · on April 7, 2020

I think many TURN servers do this, but Slack's didn't.

detaro · on April 7, 2020

Edit: nvm, confused STUN and TURN

wrkronmiller · on April 7, 2020

It sounds like the TURN server is effectively acting like an (open) proxy. Wouldn't that mean the operator still has to have the infrastructure to handle the connections + traffic?

I'm assuming, perhaps incorrectly, that most of these RTC connections cross a NAT and therefore usually go over TURN rather than connecting directly. Even if that's not the case, why not try a direct P2P connection first, then fall back to routing through an application-specific proxy, which can have tighter controls on who connects to whom and what payloads they send?

detaro · on April 7, 2020

Sorry, my bad, I indeed confused STUN and TURN.