> But they don't think about what happens when two people try to make changes in parallel, because the whole point of ACID is that each query runs in its own transaction.
You do know that the C in ACID stands for consistency? That is, in case a transaction tries to modify something that was already changed by a parallel transaction that has already committed, it will fail and be rolled back (given the proper isolation levels).
So I really don’t understand your point. Also, you can hold a transaction open for a client.
> That is, in case a transaction tries to modify something that was already changed by a parallel transaction that has already committed, it will fail and be rolled back (given the proper isolation levels).
It will fail and be rolled back and you will lose the user's data that was written in it, unless you effectively build your own datastore on top of what the database offers.
> Also, you can hold a transaction open for a client..
For a client that's potentially gone offline for an indefinite period? Certainly not. Even if you're heartbeating that the client still has the form open somehow, that seems like a very suspect architecture.
> It will fail and be rolled back and you will lose the user's data that was written in it,
The point of relational integrity is ensuring all your data is in a valid state. If you don't want to lose a client's data because you deleted a city, there are multiple engineering approaches open to you.
1) Don't delete cities, just don't. The semantics of doing so don't make any sense in the first instance.
2) Further, deleting entities from an RDBMS is pretty damn final. What if a business analyst wanted to do some historical analysis? Well, that city is gone now, so it won't be there. This is where "soft delete" comes in (And no, soft delete doesn't destroy referential integrity) - you've marked the entity as no longer valid for use, but you still retain your data surrounding it.
3) Don't respond with a 2xx until the data store acknowledges successful insertion.
4) Use Debezium or similar to broadcast entity changes to a Kafka topic keyed on entity id. If you want to retain all changes (e.g., for event streaming), then use an appropriate retention strategy. If you want to retain only the latest state, use log compaction - it will retain the latest record with a given key.
This makes it easy for an app being spun up to obtain the entity state without touching the DB, and by continuing to subscribe to the topic, update its cache to match the source of truth.
Of course, you don't have to use Kafka, it's just one approach, but many other systems will allow you to do the same.
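The compaction semantics in point 4 are easy to sketch without a broker. This hypothetical snippet models a keyed topic as a plain list of (key, value) records: compaction keeps only the latest record per key, and a `None` value acts as a tombstone that removes the key entirely.

```python
# A keyed changelog as Debezium might emit it (records in arrival order).
log = [
    ("city:1", {"name": "Berlin"}),
    ("city:2", {"name": "Paris"}),
    ("city:1", {"name": "Berlin", "population": 3_700_000}),  # update wins
    ("city:2", None),  # tombstone: entity removed from the compacted topic
]

def compact(records):
    """Keep only the latest value per key; drop tombstoned keys."""
    latest = {}
    for key, value in records:   # later records overwrite earlier ones
        latest[key] = value
    return {k: v for k, v in latest.items() if v is not None}

snapshot = compact(log)
print(snapshot)   # {'city:1': {'name': 'Berlin', 'population': 3700000}}
```

A newly spun-up app bootstraps its cache from `snapshot` and then keeps consuming the topic to stay in sync with the source of truth.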
Basically, your objections say nothing about the usefulness of a relational datastore, but rather, the need for careful engineering when building overly distributed systems.
And through all this, I'm wondering how you imagine a non-relational datastore solving these problems any better.
> The point of relational integrity is ensuring all your data is in a valid state.
Right, and doing that at the lowest level of your datastore is a fundamentally misguided approach, because it means you can't ever store data that's in an invalid state (almost by definition). So when data does become invalid, you're effectively forced to destroy it.
> And no, soft delete doesn't destroy referential integrity
Yes it does. It destroys the property you described above - that you ensure all your data is in a valid state.
> Don't respond with a 2xx until the data store acknowledges successful insertion.
So the user sees an error. That doesn't actually help much, because what can they do with that error?
> Use Debezium or similar to broadcast entity changes to a Kafka topic keyed on entity id. If you want to retain all changes (e.g., for event streaming), then use an appropriate retention strategy. If you want to retain only the latest state, use log compaction - it will retain the latest record with a given key.
Yes, now keep tugging on the thread of that thought. If you try to write code to reconstruct the state of your relational database based on those changes, you've got two copies of your logic that will inevitably get out of sync, and you'll have bugs that you only discover when you try to actually do it. (And if you only record differences between writes that actually made it into the relational database, you haven't solved the original problem of data being lost because writes are rejected). What you want to do is instead make those Kafka events the primary "source of truth" and construct the "current state of the world" based on that, i.e. event sourcing.
You don't, and can't, make use of transactions with that approach - you get (eventual) consistency because each log is ordered, which gives you the properties that you want with fewer of the downsides (deadlocks), but writing and reading the eventual results of that write is a fundamentally async process (which is good in the long term - it forces you to think about your dataflow and avoid loops - though it might involve more work upfront). And you don't really have relationality - if you need a relation in your live dataflow, you'll generally join at the event pipeline level and make the joined thing its own stream (much like a materialized view in SQL-land). You can build secondary indices etc. but those are explicitly a secondary thing layered on top of your primary datastore.
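To make the "log as source of truth" idea concrete, here's a minimal event-sourcing sketch in plain Python (the event shapes are hypothetical, and there's no broker): current state is just a fold over the ordered event log, and deleting a city invalidates the city without destroying the user's data.

```python
# The ordered event log is the primary source of truth.
events = [
    {"type": "city_created", "id": 1, "name": "Berlin"},
    {"type": "user_created", "id": 7, "city_id": 1},
    {"type": "city_deleted", "id": 1},   # marks the city invalid...
]                                        # ...but the user record survives

def apply(state, event):
    """Fold one event into the derived 'current state of the world'."""
    cities, users = state
    if event["type"] == "city_created":
        cities[event["id"]] = {"name": event["name"], "deleted": False}
    elif event["type"] == "city_deleted":
        cities[event["id"]]["deleted"] = True
    elif event["type"] == "user_created":
        users[event["id"]] = {"city_id": event["city_id"]}
    return cities, users

cities, users = {}, {}
for e in events:
    cities, users = apply((cities, users), e)

print(users)   # {7: {'city_id': 1}}  -- still there, still referencing city 1
```

Replaying the same log always yields the same derived state, which is why there's only one copy of the reconstruction logic to keep correct.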
What does the form have to do with anything? That’s just part of an application, nothing special. What about storing the results in memory until you can make sure it is successfully stored? Like, this has nothing to do with databases, that’s just application architecture. Also, what language/framework have you been using, because these things are routinely handled by any that’s worth anything.
> What does the form have to do with anything? That’s just part of an application, nothing special.
To be able to actually use database transactions you'd need them to extend across the form's lifetime. You can do that in a local application, but not in a webapp, given the web request cycle.
> What about storing the results in memory until you can make sure it is successfully stored?
That has all the same problems as storing your data in memory and not using a datastore. In this scenario the data can never be stored because it now violates referential integrity.
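That dead end is straightforward to demonstrate with Python's built-in sqlite3 (a sketch with made-up tables): once the referenced row is gone, the buffered write is permanently invalid, so retrying from memory fails the same way every time.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")             # SQLite requires this opt-in
db.execute("CREATE TABLE city (id INTEGER PRIMARY KEY)")
db.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, "
           "city_id INTEGER REFERENCES city(id))")
db.execute("INSERT INTO city VALUES (1)")

pending = (1, 1)                            # user's form data, buffered in app memory
db.execute("DELETE FROM city WHERE id = 1") # city deleted while the form was open

# Retrying the buffered write never helps: it now violates
# referential integrity, so it is rejected on every attempt.
rejections = 0
for _ in range(3):
    try:
        db.execute("INSERT INTO person VALUES (?, ?)", pending)
    except sqlite3.IntegrityError:
        rejections += 1
print("rejections:", rejections)            # rejections: 3
```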
> Also, what language/framework have you been using, because these things are routinely handled by any that’s worth anything.
Many of the big-name Python or JVM-based ones, but it's a fundamental problem that can't be solved at that level. Which language/framework are you claiming handles this well?