Temporal is really neat, but I think it's marketed at too many use cases. After a ...

claytonjy · on May 9, 2024

Would love to hear more about the scale issues you saw. How many workflows or actions was too many? Which components started breaking down, and what were their failure modes?

ub-volta-toss · on May 16, 2024

See above. It's not so straightforward. You need enough headroom on each component that a bad feedback loop can start, eat resources, and still have enough time and resources to calm itself down before it hits some limit or degrades the component further.
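To make the headroom point concrete, here's a toy model. It is not how Temporal works internally; the function name, the parameters, and every number are made up for illustration. Each step, whatever doesn't fit in capacity comes back next step multiplied by a retry factor, and a one-off spike is injected at step 5.

```python
def simulate(capacity, base_load, spike, retry_factor, steps=50):
    """Toy feedback-loop model (hypothetical, not Temporal's behavior).

    Work that does not fit in `capacity` during a step is carried over to
    the next step, amplified by `retry_factor`. Returns the backlog after
    each step.
    """
    backlog = 0.0
    history = []
    for t in range(steps):
        # Steady load, plus a single spike at step 5.
        incoming = base_load + (spike if t == 5 else 0.0)
        # Unserved work comes back amplified (retries, reloads, and so on).
        demand = incoming + backlog * retry_factor
        served = min(demand, capacity)
        backlog = demand - served
        history.append(backlog)
    return history


# Same steady load and same spike; only the headroom differs.
tight = simulate(capacity=100, base_load=80, spike=100, retry_factor=1.5)
roomy = simulate(capacity=150, base_load=80, spike=100, retry_factor=1.5)
print("tight final backlog:", tight[-1])
print("roomy final backlog:", roomy[-1])
```

In this sketch the steady load is well under capacity in both runs. With more headroom the backlog from the spike drains within a step; with less, the amplified backlog outgrows the spare capacity and never comes back down, which is why there's no clean "X/sec" limit to quote.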

wojcikstefan · on May 15, 2024

Can you tell us more about your scaling issues with Temporal?

I haven't yet used it in production, but I would've expected that a system which evolved out of Uber's Cadence [0] (and which I believe is used at Uber extensively) would've scaled very well.

[0]: https://stackoverflow.com/a/61281435/1579058

ub-volta-toss · on May 16, 2024

I'm not sure how Uber does it, but it might be because they're using Cadence instead.

The Temporal team has acknowledged that Cassandra-backed Temporal hits scaling limits pretty fast.

The limitations aren't a clean "X actions/sec"; they're sneakier. You can run X/sec for days, and then the memory on the history service will spike, or any tiny slowdown in the DB will cause looping degradation. There are nasty feedback loops hidden in Temporal that turn small problems into very large ones.

I think the core problem with Temporal is the way it's sharded. This affects the history service and its caches. If anything tells them to reload or restart, or signals that any of the nodes are unreachable, you get a retry storm on the DB.
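For anyone who hasn't watched a retry storm happen, here's a generic sketch of the mechanism. It says nothing about what Temporal's services actually do when they retry; `peak_retry_load` and all its parameters are hypothetical. It counts how many retries land in the same 100 ms window when many clients fail at the same instant, with and without random jitter on an exponential backoff.

```python
import random
from collections import Counter


def peak_retry_load(n_clients, attempts, base_delay, jitter,
                    buckets_per_second=10, seed=0):
    """Return the largest number of retries landing in one time bucket.

    All clients are assumed to fail at t=0 and retry `attempts` times with
    exponential backoff. With jitter=True each delay is drawn uniformly
    from [0, cap); otherwise every client waits exactly `cap`.
    """
    rng = random.Random(seed)  # seeded so the result is repeatable
    buckets = Counter()
    for _ in range(n_clients):
        t = 0.0
        for attempt in range(attempts):
            cap = base_delay * (2 ** attempt)
            delay = rng.uniform(0, cap) if jitter else cap
            t += delay
            buckets[int(t * buckets_per_second)] += 1
    return max(buckets.values())


synced = peak_retry_load(1000, attempts=5, base_delay=1.0, jitter=False)
spread = peak_retry_load(1000, attempts=5, base_delay=1.0, jitter=True)
print("peak without jitter:", synced)
print("peak with jitter:", spread)
```

Without jitter every client retries in lockstep, so the DB sees all 1000 requests in a single window on each round. Jitter only spreads the load out; it doesn't remove it, so it helps only if there's headroom left to absorb the spread-out retries.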

In addition to these issues, Temporal can create feedback loops within itself. I've seen cases where it would not return to health even with zero workers requesting work for tens of minutes.

We could have kept using and scaling Temporal, but it required 10-30x the resources of building something else. And it was scary to administer. You really need an entire team. You can't have somebody who isn't a dedicated engineer take on-call for it.

claytonjy · on May 17, 2024

What did you move to instead?

NoThisIsMe · on May 10, 2024

Invented their own database? They use Cassandra IIRC

ub-volta-toss · on May 16, 2024

Nope. They hit the scale limit with Cassandra and now have an in-house storage layer.