I'm a bit skeptical of TLA - while I'm not so full of myself that I believe ther...

michaelt · on Sept 28, 2022

I used it to debug the design of a bidirectional message-passing system with checksums, acknowledgements, retransmissions and message-order-maintenance guarantees.

Here is the good:

* A high level of confidence in the design - useful for systems you need to be reliable, or where a bug could create hard-to-diagnose problems.

* The satisfaction of having done a really good job for once, like the great programmers of yore who couldn't patch after release and had to get it right the first time.

* Interesting in an academic sense.

Here is the bad:

* You can still make a mistake when translating the design into code.

* The tools are sometimes baffling, both in their design decisions and their performance. Be prepared for some frustrations.

* With the tools complex and the design proven, your colleagues might not take over the TLA portion of your work and carry it forward.

hwayne · on Sept 28, 2022

There was a great talk at TLAConf this year by a PE at Nike, where they used it to find bugs in their in-store payment apps. He said it's a 40/60 chance legal will give us permission to share the recording, though.

lukeasrodgers · on Sept 28, 2022

I have used it in production at small companies (i.e. not Amazon, as some other commenters mention) for things that needed very high reliability. It helped us find a few race conditions that took > 20 steps to reproduce, that I would probably never have found otherwise, and also helped us be very confident in our fixes. I strongly recommend it if you work in scenarios where race conditions can have very bad consequences. OTOH, I also had 2-3 failed efforts to learn the language and tooling before it really stuck, and I still mostly just use the pluscal syntax.

jwolfe · on Sept 28, 2022

Maybe give this a scan, to understand the impact it had had on AWS as of 2015: https://lamport.azurewebsites.net/tla/amazon-excerpt.html

hintymad · on Sept 28, 2022

Formal verification is not new. It's been around for at least 40 years. The founders of model checking even got Turing award back in 2017. All major chip makers use formal verification to verify their circuit designs. Microsoft published papers more than 10 years ago on how they use model checking to verify Windows kernel. It was a pretty big thing when people formally verified the IEEE cache coherence protocol. Universities have been teaching formal verification and model checking for over 20 years. The list goes on. What TLA+ offers, though, is amazing usability to engineers who had no interesting studying temporal logic in depth or all kinds of mathematical logic in general. Previous generation of engineers who want to use model checking had to deal with atrocities like this: https://matthewbdwyer.github.io/psp/patterns/ctl.html#Univer.... Yeah, I'm not kidding, the simplest spec would take days if not weeks for engineers to master, if they can master them at all.

> but this extra non-debuggable step actually seems like it would be worse. Not really. Some types of bugs are just too hard to be spotted by mere mortal, or too expensive to catch in production. Case in point, do you know there's a subtle bug or at least ambiguity in the description of Paxos Made Simple? I don't know how many hundreds of people have read the paper, but I doubt if more than 100 of them spotted the bug. Similarly, Amazon hired about 20 experts in formal verification to help them catch elusive flaws in specifications. After, if S3 corrupts customer data, the consequence to the S3 team can be devastating, no?

EddySchauHai · on Sept 28, 2022

I've heard it is used at AWS. I've seen it personally used in crypto on the Cardano blockchain. Leslie Lamport gave a talk at my old workplace and mentioned several big companies in tech and finance that use it but I'm not sure if that is public info.

csb6 · on Sept 28, 2022

The FAQ [0] clarifies some of what TLA+ is good for

[0] https://learntla.com/intro/faq.html