
> It will fail and be rolled back and you will lose the user's data that was written in it,

The point of relational integrity is ensuring all your data is in a valid state. If you don't want to lose a client's data because you deleted a city, there are multiple engineering approaches open to you.

1) Don't delete cities, just don't. The semantics of doing so don't make sense in the first place.

2) Further, deleting entities from an RDBMS is pretty damn final. What if a business analyst wants to do some historical analysis? Well, that city is gone now, so it won't be there. This is where "soft delete" comes in (and no, soft delete doesn't destroy referential integrity): you mark the entity as no longer valid for use, but you still retain the data surrounding it. A concrete sketch appears after these points.

3) Don't respond with a 2xx until the data store acknowledges successful insertion.

4) Use Debezium or similar to broadcast entity changes to a Kafka topic keyed on entity id. If you want to retain all changes (e.g., for event streaming), then use an appropriate retention strategy. If you want to retain only the latest state, use log compaction - it will retain the latest record with a given key.

This makes it easy for an app being spun up to obtain the entity state without touching the DB, and, by continuing to subscribe to the topic, to keep its cache in sync with the source of truth (see the sketch below).
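A rough sketch of that consumer side, assuming a compacted topic called "city-changes" and the confluent-kafka Python client (both are my assumptions for illustration, not something Debezium dictates):

    # Sketch: rebuild a local entity cache from a compacted Kafka topic,
    # then keep it in sync by continuing to consume. Assumes a topic
    # "city-changes" keyed by entity id; names are illustrative only.
    import json
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "city-cache-rebuilder",
        "auto.offset.reset": "earliest",  # start from the beginning of the log
    })
    consumer.subscribe(["city-changes"])

    cache = {}  # entity id -> latest known state

    try:
        while True:
            msg = consumer.poll(timeout=1.0)
            if msg is None:
                continue  # nothing yet; keep polling
            if msg.error():
                raise RuntimeError(msg.error())
            key = msg.key().decode("utf-8")
            if msg.value() is None:
                cache.pop(key, None)  # tombstone: entity removed upstream
            else:
                cache[key] = json.loads(msg.value())
    finally:
        consumer.close()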

Of course, you don't have to use Kafka - it's just one approach, and many other systems will let you do the same.
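And to make point 2 concrete, a minimal soft-delete sketch (sqlite3 from the Python standard library, with made-up table and column names):

    # Sketch of "soft delete": mark rows invalid instead of removing them,
    # so foreign keys keep pointing at real rows and history survives.
    # Table/column names are invented for the example.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("""
        CREATE TABLE city (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            deleted_at TEXT  -- NULL means the city is still active
        )
    """)
    conn.execute("""
        CREATE TABLE client (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            city_id INTEGER NOT NULL REFERENCES city(id)
        )
    """)
    conn.execute("INSERT INTO city (id, name) VALUES (1, 'Atlantis')")
    conn.execute("INSERT INTO client (id, name, city_id) VALUES (1, 'Plato', 1)")

    # "Delete" the city: flag it rather than removing the row.
    conn.execute("UPDATE city SET deleted_at = datetime('now') WHERE id = 1")

    # Live queries exclude soft-deleted rows...
    active = conn.execute(
        "SELECT name FROM city WHERE deleted_at IS NULL"
    ).fetchall()

    # ...but the analyst's historical join still works.
    history = conn.execute("""
        SELECT client.name, city.name
        FROM client JOIN city ON city.id = client.city_id
    """).fetchall()
    print(active, history)  # [] [('Plato', 'Atlantis')]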

Basically, your objections say nothing about the usefulness of a relational datastore; rather, they point to the need for careful engineering when building overly distributed systems.

And through all this, I'm wondering how you imagine a non-relational datastore solving these problems any better.



> The point of relational integrity is ensuring all your data is in a valid state.

Right, and doing that at the lowest level of your datastore is a fundamentally misguided approach, because it means you can't ever store data that's in an invalid state (almost by definition). So when data does become invalid, you're effectively forced to destroy it.

> And no, soft delete doesn't destroy referential integrity

Yes it does. It destroys the property you described above - that you ensure all your data is in a valid state.

> Don't respond with a 2xx until the data store acknowledges successful insertion.

So the user sees an error. That doesn't actually help much, because what can they do with that error?

> Use Debezium or similar to broadcast entity changes to a Kafka topic keyed on entity id. If you want to retain all changes (e.g., for event streaming), then use an appropriate retention strategy. If you want to retain only the latest state, use log compaction - it will retain the latest record with a given key.

Yes, now keep tugging on the thread of that thought. If you try to write code to reconstruct the state of your relational database based on those changes, you've got two copies of your logic that will inevitably get out of sync, and you'll have bugs that you only discover when you try to actually do it. (And if you only record differences between writes that actually made it into the relational database, you haven't solved the original problem of data being lost because writes are rejected). What you want to do is instead make those Kafka events the primary "source of truth" and construct the "current state of the world" based on that, i.e. event sourcing.
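To make that concrete, a toy sketch in plain Python (the event types and fields are invented for illustration):

    # Toy event-sourcing sketch: the event log is the source of truth,
    # and "current state" is just a fold over the log.
    from dataclasses import dataclass, field

    @dataclass
    class WorldState:
        cities: dict = field(default_factory=dict)   # id -> name
        clients: dict = field(default_factory=dict)  # id -> (name, city_id)

    def apply(state: WorldState, event: dict) -> WorldState:
        """Pure transition function: one event in, updated state out."""
        kind = event["type"]
        if kind == "CityAdded":
            state.cities[event["id"]] = event["name"]
        elif kind == "CityRetired":
            # The event stays in the log; only the projection forgets.
            state.cities.pop(event["id"], None)
        elif kind == "ClientRegistered":
            state.clients[event["id"]] = (event["name"], event["city_id"])
        return state

    log = [
        {"type": "CityAdded", "id": 1, "name": "Atlantis"},
        {"type": "ClientRegistered", "id": 7, "name": "Plato", "city_id": 1},
        {"type": "CityRetired", "id": 1},
    ]

    state = WorldState()
    for event in log:  # replaying the log *is* recovering the state
        state = apply(state, event)
    print(state)  # the client still references city 1 even though it's retired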

You don't, and can't, make use of transactions with that approach - you get (eventual) consistency because each log is ordered, which gives you the properties you want with fewer of the downsides (deadlocks), but writing and then reading the eventual results of that write is a fundamentally async process (which is good in the long term - it forces you to think about your dataflow and avoid loops - though it might involve more work upfront).

And you don't really have relationality - if you need a relation in your live dataflow, you'll generally join at the event pipeline level and make the joined thing its own stream (much like a materialized view in SQL-land; see the sketch below). You can build secondary indices etc., but those are explicitly a secondary thing layered on top of your primary datastore.
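A crude sketch of that kind of pipeline-level join, with plain Python standing in for a real stream processor (Kafka Streams, Flink, etc.); the stream shapes are hypothetical:

    # Sketch of a stream join producing a derived stream - the
    # event-pipeline analogue of a materialized view.
    clients_by_city = {}   # buffered client events, keyed by city_id
    cities = {}            # latest city events, keyed by id
    joined = []            # the derived "clients_with_city" stream

    def on_city(event):
        cities[event["id"]] = event
        # Re-emit any clients that were waiting on this city.
        for client in clients_by_city.get(event["id"], []):
            joined.append({**client, "city_name": event["name"]})

    def on_client(event):
        clients_by_city.setdefault(event["city_id"], []).append(event)
        city = cities.get(event["city_id"])
        if city:  # both sides present: emit a joined record downstream
            joined.append({**event, "city_name": city["name"]})

    on_client({"id": 7, "name": "Plato", "city_id": 1})
    on_city({"id": 1, "name": "Atlantis"})
    print(joined)
    # [{'id': 7, 'name': 'Plato', 'city_id': 1, 'city_name': 'Atlantis'}]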



