
> It will fail and be rolled back and you will lose the user's data that was written in it,

The point of relational integrity is ensuring all your data is in a valid state. If you don't want to lose a client's data because you deleted a city, there are multiple engineering approaches open to you.

1) Don't delete cities, just don't. The semantics of doing so don't make sense in the first place.

2) Further, deleting entities from an RDBMS is pretty damn final. What if a business analyst wants to do some historical analysis? Well, that city is gone now, so it won't be there. This is where "soft delete" comes in (and no, soft delete doesn't destroy referential integrity): you mark the entity as no longer valid for use, but you still retain the data surrounding it. A concrete sketch appears after these points.

3) Don't respond with a 2xx until the data store acknowledges successful insertion.

4) Use Debezium or similar to broadcast entity changes to a Kafka topic keyed on entity id. If you want to retain all changes (e.g., for event streaming), then use an appropriate retention strategy. If you want to retain only the latest state, use log compaction - it will retain the latest record with a given key.

This makes it easy for an app being spun up to obtain the entity state without touching the DB, and, by continuing to subscribe to the topic, to keep its cache in sync with the source of truth (see the sketch below).
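A rough sketch of that consumer side, assuming a compacted topic called "city-changes" and the confluent-kafka Python client (both are my assumptions for illustration, not something Debezium dictates):

    # Sketch: rebuild a local entity cache from a compacted Kafka topic,
    # then keep it in sync by continuing to consume. Assumes a topic
    # "city-changes" keyed by entity id; names are illustrative only.
    import json
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "city-cache-rebuilder",
        "auto.offset.reset": "earliest",  # start from the beginning of the log
    })
    consumer.subscribe(["city-changes"])

    cache = {}  # entity id -> latest known state

    try:
        while True:
            msg = consumer.poll(timeout=1.0)
            if msg is None:
                continue  # nothing yet; keep polling
            if msg.error():
                raise RuntimeError(msg.error())
            key = msg.key().decode("utf-8")
            if msg.value() is None:
                cache.pop(key, None)  # tombstone: entity removed upstream
            else:
                cache[key] = json.loads(msg.value())
    finally:
        consumer.close()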

Of course, you don't have to use Kafka - it's just one approach, and many other systems will let you do the same.
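And to make point 2 concrete, a minimal soft-delete sketch (sqlite3 from the Python standard library, with made-up table and column names):

    # Sketch of "soft delete": mark rows invalid instead of removing them,
    # so foreign keys keep pointing at real rows and history survives.
    # Table/column names are invented for the example.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("""
        CREATE TABLE city (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            deleted_at TEXT  -- NULL means the city is still active
        )
    """)
    conn.execute("""
        CREATE TABLE client (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            city_id INTEGER NOT NULL REFERENCES city(id)
        )
    """)
    conn.execute("INSERT INTO city (id, name) VALUES (1, 'Atlantis')")
    conn.execute("INSERT INTO client (id, name, city_id) VALUES (1, 'Plato', 1)")

    # "Delete" the city: flag it rather than removing the row.
    conn.execute("UPDATE city SET deleted_at = datetime('now') WHERE id = 1")

    # Live queries exclude soft-deleted rows...
    active = conn.execute(
        "SELECT name FROM city WHERE deleted_at IS NULL"
    ).fetchall()

    # ...but the analyst's historical join still works.
    history = conn.execute("""
        SELECT client.name, city.name
        FROM client JOIN city ON city.id = client.city_id
    """).fetchall()
    print(active, history)  # [] [('Plato', 'Atlantis')]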

Basically, your objections say nothing about the usefulness of a relational datastore; rather, they point to the need for careful engineering when building overly distributed systems.

And through all this, I'm wondering how you imagine a non-relational datastore solving these problems any better.



> The point of relational integrity is ensuring all your data is in a valid state.

Right, and doing that at the lowest level of your datastore is a fundamentally misguided approach, because it means you can't ever store data that's in an invalid state (almost by definition). So when data does become invalid, you're effectively forced to destroy it.

> And no, soft delete doesn't destroy referential integrity

Yes it does. It destroys the property you described above - that you ensure all your data is in a valid state.

> Don't respond with a 2xx until the data store acknowledges successful insertion.

So the user sees an error. That doesn't actually help much, because what can they do with that error?

> Use Debezium or similar to broadcast entity changes to a Kafka topic keyed on entity id. If you want to retain all changes (e.g., for event streaming), then use an appropriate retention strategy. If you want to retain only the latest state, use log compaction - it will retain the latest record with a given key.

Yes, now keep tugging on the thread of that thought. If you try to write code to reconstruct the state of your relational database based on those changes, you've got two copies of your logic that will inevitably get out of sync, and you'll have bugs that you only discover when you try to actually do it. (And if you only record differences between writes that actually made it into the relational database, you haven't solved the original problem of data being lost because writes are rejected). What you want to do is instead make those Kafka events the primary "source of truth" and construct the "current state of the world" based on that, i.e. event sourcing.
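To make that concrete, a toy sketch in plain Python (the event types and fields are invented for illustration):

    # Toy event-sourcing sketch: the event log is the source of truth,
    # and "current state" is just a fold over the log.
    from dataclasses import dataclass, field

    @dataclass
    class WorldState:
        cities: dict = field(default_factory=dict)   # id -> name
        clients: dict = field(default_factory=dict)  # id -> (name, city_id)

    def apply(state: WorldState, event: dict) -> WorldState:
        """Pure transition function: one event in, updated state out."""
        kind = event["type"]
        if kind == "CityAdded":
            state.cities[event["id"]] = event["name"]
        elif kind == "CityRetired":
            # The event stays in the log; only the projection forgets.
            state.cities.pop(event["id"], None)
        elif kind == "ClientRegistered":
            state.clients[event["id"]] = (event["name"], event["city_id"])
        return state

    log = [
        {"type": "CityAdded", "id": 1, "name": "Atlantis"},
        {"type": "ClientRegistered", "id": 7, "name": "Plato", "city_id": 1},
        {"type": "CityRetired", "id": 1},
    ]

    state = WorldState()
    for event in log:  # replaying the log *is* recovering the state
        state = apply(state, event)
    print(state)  # the client still references city 1 even though it's retired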

You don't, and can't, make use of transactions with that approach - you get (eventual) consistency because each log is ordered, which gives you the properties you want with fewer of the downsides (deadlocks), but writing and then reading the eventual results of that write is a fundamentally async process (which is good in the long term - it forces you to think about your dataflow and avoid loops - though it might involve more work upfront).

And you don't really have relationality - if you need a relation in your live dataflow, you'll generally join at the event pipeline level and make the joined thing its own stream (much like a materialized view in SQL-land; see the sketch below). You can build secondary indices etc., but those are explicitly a secondary thing layered on top of your primary datastore.
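A crude sketch of that kind of pipeline-level join, with plain Python standing in for a real stream processor (Kafka Streams, Flink, etc.); the stream shapes are hypothetical:

    # Sketch of a stream join producing a derived stream - the
    # event-pipeline analogue of a materialized view.
    clients_by_city = {}   # buffered client events, keyed by city_id
    cities = {}            # latest city events, keyed by id
    joined = []            # the derived "clients_with_city" stream

    def on_city(event):
        cities[event["id"]] = event
        # Re-emit any clients that were waiting on this city.
        for client in clients_by_city.get(event["id"], []):
            joined.append({**client, "city_name": event["name"]})

    def on_client(event):
        clients_by_city.setdefault(event["city_id"], []).append(event)
        city = cities.get(event["city_id"])
        if city:  # both sides present: emit a joined record downstream
            joined.append({**event, "city_name": city["name"]})

    on_client({"id": 7, "name": "Plato", "city_id": 1})
    on_city({"id": 1, "name": "Atlantis"})
    print(joined)
    # [{'id': 7, 'name': 'Plato', 'city_id': 1, 'city_name': 'Atlantis'}]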



