It was unlikely to be a fiber cut or a router failure, because there's enough redundancy at every level (usually N+2 or better). Unless, that is, some nation state had been cutting multiple fibers at once.
This had the hallmarks of some system blowing up, as you said. When it comes to QoS, it gets tricky. Gmail's frontend traffic should be at the highest priority, of course. But what about the replication traffic between your mailbox homes? What if a top-level layer stalls or chokes when replication lags too far behind?
QoS is easier to get right for stateless or less stateful systems like web search.
https://news.ycombinator.com/item?id=20078433