There's at least one factor which I don't see consideration for here.
Under GDPR, CCPA/CCRA, and a number of other privacy laws, if you're going to retain data related to people you need to provide those people some way of retrieving that data or requesting that it be erased. Putting the data into a "deleted_record" table doesn't remove that obligation.
So, if you're going to have a bunch of "deleted" records hanging around, you need some way to figure out who they belong to. And I don't see any way to do that with this schema, short of rehydrating all the rows and following the original foreign key relations.
That assumes that all of a user's data is a single foreign key relationship away from the user, which often isn't the case.
It also requires you to scan the entire table and decode JSON for every row you've ever deleted, which may become prohibitively expensive for large databases.
PostgreSQL can create indexes on the results of expressions, so it's possible to accelerate queries that select against a nested value inside a JSON blob.
To give an example for your 1st point: If you have the schema users->messages->message_attachments and message_attachments does not store the user_id, you'll have trouble finding all attachments from a specific user.
This is an area where technologists and lawyers will end up disagreeing and fighting about boundaries etc.
Does it still count as your data if there is no normal way to retrieve/access it in the software?
If you say "yes", here's what this implies: if you have deleted the data, but it's still on the disk because the drive heads haven't wiped it yet (it's just been deallocated), then it's still accessible.
So, whenever there's a GDPR request, you should run disk recovery software? (The answer is no; you'd have to butt up against pretty thick lawyers and judges to be fined for this)
If you have an audit table that is automatically deleted after a while, and the audit table cannot be accessed as part of normal operations, then IMO you will be able to argue that it's not part of data that should be "reasonably accessed" via GDPR.
When you look at the spirit of the law, it also does make sense (disclaimer: I am a HUGE proponent of GDPR). What matters is that users have access to the data that the company has access to, and is able to correct and delete it. If the data is not normally accessible, and will soon be deleted, then it doesn't matter.
Here's a simple test: would the data be turned up in legal discovery? Nobody is doing a sector level disk scan but you would be expected to turn over relevant audit data if you had retained it. So it needs to be accurate and GDPR applies.
Not really. You actually may have other legal requirements to keep the data but you shouldn't use it for daily business because of GDPR. Imagine banks and long closed accounts.
Moreover, technical backup solutions, where only a very limited set of people have access, are fine. If you store DB backups, you don't have to rewrite or delete them because a customer that used your service for a week decided to ask for deletion under GDPR.
That's a different part of GDPR: purpose limits. Deleted or not, you can only use data for an approved consented purpose that is mapped to that data item.
Something I’ve been meaning to play with is setting a “PII” comment on a table/field, then scanning over the schema if I need to find places to check for encryption etc
Perhaps something here that could mark it with a uuid so if it needed to be fully deleted, it could be found easily and removed or overwritten
Under GDPR, CCPA/CCRA, and a number of other privacy laws, if you're going to retain data related to people you need to provide those people some way of retrieving that data or requesting that it be erased. Putting the data into a "deleted_record" table doesn't remove that obligation.
So, if you're going to have a bunch of "deleted" records hanging around, you need some way to figure out who they belong to. And I don't see any way to do that with this schema, short of rehydrating all the rows and following the original foreign key relations.