
> But you have a DB that everyone knows. 3rd party tools work. Developer on-boarding is easy.

Not reliably. All the different DBs have subtle misbehaviours and mismatches with the tools.

> Performance is actually good and consistent.

Lol no. Indexed joins and unindexed joins look exactly the same but have vastly different performance characteristics. And even if you have a query that currently performs well, you never know when or why the query planner will change its mind.
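
Concretely, a minimal sqlite3 sketch (table and column names made up): the SQL text is identical before and after the index, but the plan - and therefore the cost - is not.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
    """)

    query = """
        SELECT customers.name FROM cities
        JOIN customers ON customers.city_id = cities.id
        WHERE cities.name = 'Oslo'
    """

    # Without an index on customers.city_id, one side of the join is a full SCAN.
    for row in conn.execute("EXPLAIN QUERY PLAN " + query):
        print(row)

    conn.execute("CREATE INDEX idx_customers_city ON customers(city_id)")

    # Same SQL text, but the plan typically switches to an indexed SEARCH -
    # nothing in the query itself tells you which one you'll get.
    for row in conn.execute("EXPLAIN QUERY PLAN " + query):
        print(row)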

> Your data is also consistent and respects acid principles.

Only because you drop data on the floor if it's inconsistent, which is rarely a good enough solution in the real world. E.g. someone tries to add a customer but the city has been deleted in the meantime -> they get an error, and the system throws away everything they input about the new customer. You end up having to implement your own dead letter queues etc. and do all the work you would with a NoSQL datastore.
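
A hypothetical sketch of the machinery you end up hand-rolling (sqlite3; names made up): catch the integrity failure yourself and park the input in your own dead-letter table.

    import json
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY, name TEXT,
            city_id INTEGER NOT NULL REFERENCES cities(id));
        -- the "dead letter queue" you end up building yourself
        CREATE TABLE failed_writes (payload TEXT, error TEXT);
    """)

    def add_customer(name, city_id):
        try:
            with conn:
                conn.execute("INSERT INTO customers (name, city_id) VALUES (?, ?)",
                             (name, city_id))
        except sqlite3.IntegrityError as exc:
            # The city was deleted in the meantime: park the input instead of losing it.
            with conn:
                conn.execute("INSERT INTO failed_writes VALUES (?, ?)",
                             (json.dumps({"name": name, "city_id": city_id}), str(exc)))

    add_customer("Alice", city_id=42)  # no such city -> lands in failed_writes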

> With proper caching, SQL is just as performant when done right, and the profiling tools are better.

Caching makes your performance even more arbitrary and incomprehensible - it works great right until it doesn't. Profiling is great but it's a poor substitute for being able to reason about what the performance of your queries will be.

> Screw no-SQL. They are just Object Oriented DBs (which were a disaster in the 90s) in new clothes.

On the contrary - the problems with SQL databases are much the same as the problems with object databases.



If you're throwing away everything the customer has input, then you have a poorly written application.

Not inserting bad data is much better than trying to make sense of inconsistent data, where you don't know what is true and what isn't.


Validating consumer input is good. Throwing it on the floor when it fails validation is bad. You want to separate "accept the consumer's input" from "validate it", which is something that's very hard to do in RDBMSes.
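
For illustration, a hypothetical sqlite3 sketch of that split (names made up): an unconstrained inbox that always accepts, and a separate promote step that validates - none of which the RDBMS gives you for free.

    import json
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                                city_id INTEGER NOT NULL REFERENCES cities(id));
        CREATE TABLE inbox (id INTEGER PRIMARY KEY, payload TEXT, status TEXT);
    """)

    def accept(payload):
        # Step 1: never rejects, never loses the consumer's input.
        with conn:
            conn.execute("INSERT INTO inbox (payload, status) VALUES (?, 'pending')",
                         (json.dumps(payload),))

    def promote():
        # Step 2: validate the pending rows and move the valid ones over.
        rows = conn.execute(
            "SELECT id, payload FROM inbox WHERE status = 'pending'").fetchall()
        for row_id, raw in rows:
            data = json.loads(raw)
            try:
                with conn:
                    conn.execute(
                        "INSERT INTO customers (name, city_id) VALUES (?, ?)",
                        (data["name"], data["city_id"]))
                    conn.execute("UPDATE inbox SET status = 'done' WHERE id = ?",
                                 (row_id,))
            except sqlite3.IntegrityError:
                with conn:
                    conn.execute("UPDATE inbox SET status = 'invalid' WHERE id = ?",
                                 (row_id,))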


> Only because you drop data on the floor if it's inconsistent, which is rarely a good enough solution in the real world. E.g. someone tries to add a customer but the city has been deleted in the meantime -> they get an error, and the system throws away everything they input about the new customer. You end up having to implement your own dead letter queues etc. and do all the work you would with a NoSQL datastore.

What. Who's letting people delete random shit out of the DB? Why is the origin of this data not validating it against the actual data in the DB before sending it?

"Hey we have to ensure that customers are selecting valid cities.

We have a list of cities in the DB. Let's ensure we cache the cities in the api server so that we're not hitting the DB for every page load for data that will very rarely, if ever, change.

Especially since we revoked Dave's DELETE permissions after that last time he deleted stuff from a production database."
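
For what it's worth, that cache can be as simple as this hypothetical sketch (Python; names made up):

    import time

    CACHE_TTL_SECONDS = 3600  # cities "very rarely, if ever, change"
    _cache = {"cities": None, "fetched_at": 0.0}

    def get_cities(db_conn):
        # Serve reads from memory; refresh from the DB at most once per TTL.
        now = time.time()
        if _cache["cities"] is None or now - _cache["fetched_at"] > CACHE_TTL_SECONDS:
            rows = db_conn.execute("SELECT id, name FROM cities").fetchall()
            _cache["cities"] = dict(rows)
            _cache["fetched_at"] = now
        return _cache["cities"]

    def is_valid_city(db_conn, city_id):
        return city_id in get_cities(db_conn)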


> Why is the origin of this data not validating it against the actual data in the DB before sending it?

It's on the client side, so it can't hold an open transaction with the server. It can validate against a previous version of the list of cities, but there's always the possibility of a delete happening in parallel.

(And if your answer is to use a delete flag rather than actually deleting a row, then you're undermining the referential integrity constraints that were supposed to be the whole point of using a relational database)


>It can validate against a previous version of the list of cities, but there's always the possibility of a delete happening in parallel.

If a "city" is a) required attribute of an entity and b) required to exist as an entity itself, then deleting a city would require deleting all other entities that refer to that city.

If city is a nullable attribute, then the validation issue you raised earlier is moot - an invalid "city" can just be replaced with null.
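
Both cases map directly onto foreign-key actions. A hypothetical sqlite3 sketch of the usual choices (schema made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT,
            -- pick one, per the argument above:
            --   required city, deleting a city deletes its customers:
            --     city_id INTEGER NOT NULL REFERENCES cities(id) ON DELETE CASCADE
            --   required city, deleting a referenced city is refused:
            --     city_id INTEGER NOT NULL REFERENCES cities(id) ON DELETE RESTRICT
            -- nullable city, deleting a city just nulls it out:
            city_id INTEGER REFERENCES cities(id) ON DELETE SET NULL
        );
    """)

    conn.execute("INSERT INTO cities VALUES (1, 'Oslo')")
    conn.execute("INSERT INTO customers VALUES (1, 'Alice', 1)")
    conn.execute("DELETE FROM cities WHERE id = 1")  # with SET NULL this succeeds
    print(conn.execute("SELECT city_id FROM customers").fetchone())  # prints (None,)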

Assuming you do want to delete cities, your client side shouldn't be firing off data and assuming success.

> It can validate against a previous version of the list of cities, but there's always the possibility of a delete happening in parallel.

Yep, caching is hard, but your argument that "RDBMS aren't that great" is predicated on a very... convoluted system.


> If a "city" is a) required attribute of an entity and b) required to exist as an entity itself, then deleting a city would require deleting all other entities that refer to that city.

Right, so usually what people implement (and what the database enforces by default) is that you can't delete a city if there are any customers in that city. But they don't think about what happens when two people try to make changes in parallel, because the whole point of ACID is that each query runs in its own transaction.

> Yep, caching is hard, but your argument that "RDBMS aren't that great" is predicated on a very... convoluted system.

All the use cases for referential integrity - and transactionality - are like this though. You can't actually use them for anything, not with a web app where the client can't hold a transaction open.


> But they don't think about what happens when two people try to make changes in parallel, because the whole point of ACID is that each query runs in its own transaction.

You do know that the C in ACID stands for consistency? That is, if a transaction tries to modify something that was already changed by a parallel transaction that has committed, it will fail and be rolled back (given proper isolation levels).

So I really don’t understand your point. Also, you can hold a transaction open for a client.
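
And the usual client-side counterpart of that rollback is a retry loop, not data loss. A hypothetical psycopg2 sketch (connection details and names made up):

    import psycopg2
    import psycopg2.extensions

    def move_customer(dsn, customer_id, new_city_id, max_retries=3):
        conn = psycopg2.connect(dsn)  # dsn is a placeholder
        conn.set_isolation_level(
            psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)
        for attempt in range(max_retries):
            try:
                with conn, conn.cursor() as cur:
                    cur.execute("UPDATE customers SET city_id = %s WHERE id = %s",
                                (new_city_id, customer_id))
                return True  # committed
            except psycopg2.extensions.TransactionRollbackError:
                continue  # a parallel commit conflicted; rolled back, try again
        return False  # the caller still has the input and can decide what to do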


> That is, if a transaction tries to modify something that was already changed by a parallel transaction that has committed, it will fail and be rolled back (given proper isolation levels).

It will fail and be rolled back and you will lose the user's data that was written in it, unless you effectively build your own datastore on top of what the database offers.

> Also, you can hold a transaction open for a client..

For a client that's potentially gone offline for an indefinite period? Certainly not. Even if you're heartbeating that the client still has the form open somehow, that seems like a very suspect architecture.


> It will fail and be rolled back and you will lose the user's data that was written in it,

The point of relational integrity is ensuring all your data is in a valid state. If you don't want to lose a client's data because you deleted a city, there are multiple engineering approaches open to you.

1) Don't delete cities, just don't. The semantics of doing so don't make any sense in the first instance.

2) Further, deleting entities from an RDBMS is pretty damn final. What if a business analyst wanted to do some historical analysis? Well, that city is gone now, so it won't be there. This is where "soft delete" comes in (and no, soft delete doesn't destroy referential integrity) - you've marked the entity as no longer valid for use, but you still retain your data surrounding it.

3) Don't respond with a 2xx until the data store acknowledges successful insertion.

4) Use Debezium or similar to broadcast entity changes to a Kafka topic keyed on entity id. If you want to retain all changes (e.g., for event streaming), then use an appropriate retention strategy. If you want to retain only the latest state, use log compaction - it will retain the latest record with a given key.

This makes it easy for an app being spun up to obtain the entity state without touching the DB and, by continuing to subscribe to the topic, to keep its cache in sync with the source of truth.

Of course, you don't have to use Kafka, it's just one approach, but many other systems will allow you to do the same.
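
As a hypothetical sketch of the consumer side (kafka-python; topic and broker names made up):

    import json
    from kafka import KafkaConsumer

    cities = {}  # local cache: entity id -> latest state

    consumer = KafkaConsumer(
        "cities",                      # compacted topic keyed on entity id
        bootstrap_servers="broker:9092",
        auto_offset_reset="earliest",  # replay from the start to build state
    )

    for msg in consumer:
        key = msg.key.decode()
        if msg.value is None:          # tombstone: the entity was deleted
            cities.pop(key, None)
        else:
            cities[key] = json.loads(msg.value)
        # After catching up, the app just keeps consuming to stay in sync.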

Basically, your objections say nothing about the usefulness of a relational datastore, but rather, the need for careful engineering when building overly distributed systems.

And through all this, I'm wondering how you imagine a non-relational datastore solving these problems any better.


> The point of relational integrity is ensuring all your data is in a valid state.

Right, and doing that at the lowest level of your datastore is a fundamentally misguided approach, because it means you can't ever store data that's in an invalid state (almost by definition). So when data does become invalid, you're effectively forced to destroy it.

> And no, soft delete doesn't destroy referential integrity

Yes it does. It destroys the property you described above - that you ensure all your data is in a valid state.

> Don't respond with a 2xx until the data store acknowledges successful insertion.

So the user sees an error. That doesn't actually help much, because what can they do with that error?

> Use Debezium or similar to broadcast entity changes to a Kafka topic keyed on entity id. If you want to retain all changes (e.g., for event streaming), then use an appropriate retention strategy. If you want to retain only the latest state, use log compaction - it will retain the latest record with a given key.

Yes, now keep tugging on the thread of that thought. If you try to write code to reconstruct the state of your relational database based on those changes, you've got two copies of your logic that will inevitably get out of sync, and you'll have bugs that you only discover when you try to actually do it. (And if you only record differences between writes that actually made it into the relational database, you haven't solved the original problem of data being lost because writes are rejected). What you want to do is instead make those Kafka events the primary "source of truth" and construct the "current state of the world" based on that, i.e. event sourcing.

You don't, and can't, make use of transactions with that approach - you get (eventual) consistency because each log is ordered, which gives you the properties you want with fewer of the downsides (deadlocks), but writing and then reading the eventual results of that write is a fundamentally async process (which is good in the long term - it forces you to think about your dataflow and avoid loops - though it might involve more work upfront). And you don't really have relationality - if you need a relation in your live dataflow, you'll generally join at the event pipeline level and make the joined thing its own stream (much like a materialized view in SQL-land). You can build secondary indices etc., but those are explicitly a secondary thing layered on top of your primary datastore.
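
In miniature, the fold over the log looks like this (pure-Python sketch; event shapes made up):

    # The log is the source of truth; "current state of the world" is a fold over it.
    events = [
        {"type": "CityAdded",     "id": "oslo", "name": "Oslo"},
        {"type": "CustomerAdded", "id": "c1", "name": "Alice", "city": "oslo"},
        {"type": "CityDeleted",   "id": "oslo"},
    ]

    def apply(state, event):
        cities, customers = state
        if event["type"] == "CityAdded":
            cities[event["id"]] = event["name"]
        elif event["type"] == "CityDeleted":
            cities.pop(event["id"], None)   # the customer's input is never destroyed
        elif event["type"] == "CustomerAdded":
            customers[event["id"]] = event  # accepted even if the city later vanishes;
                                            # downstream views decide what that means
        return cities, customers

    state = ({}, {})
    for e in events:
        state = apply(state, e)
    print(state)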


What does the form have to do with anything? That’s just part of an application, nothing special. What about storing the results in memory until you can make sure they’re successfully stored? Like, this has nothing to do with databases; that’s just application architecture. Also, what language/framework have you been using? These things are routinely handled by any that’s worth anything.


> What does the form have to do with anything? That’s just part of an application, nothing special.

To be able to actually use database transactions you'd need them to extend across the form's lifetime. You can do that with a local application, but not with a webapp and its request/response cycle.

> What about storing the results in memory until you can make sure they’re successfully stored?

That has all the same problems as storing your data in memory and not using a datastore. In this scenario the data can never be stored because it now violates referential integrity.

> Also, what language/framework have you been using? These things are routinely handled by any that’s worth anything.

Many of the big-name Python or JVM-based ones, but it's a fundamental problem that can't be solved at that level. Which language/framework are you claiming handles this well?


> All the use cases for referential integrity - and transactionality - are like this though. You can't actually use them for anything, not with a web app where the client can't hold a transaction open.

You should probably tell the millions of companies actually using them for things that they're wrong. I'm sure they'll feel very silly.


The actual businesses generally don't. It's funny how RDBMS fans always say "businesses x, y and z were built on MySQL" and forget to mention that they were actually using MyISAM-based versions of MySQL that had no ACID and barely any referential integrity constraints. Most of the "mongodb is web scale" crowd is coming from the same ideology as the people who were hating on MySQL 10 years ago.



