The problem we have is that we're trying to store JSON provided by the user, which means we need to persist any empty strings they provide. The only way I can see around this issue is to replace all empty strings with a special string token on the way in, and replace all those special string tokens with empty strings on the way out.
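A minimal sketch of that in-and-out replacement, assuming the JSON has already been parsed into plain Python values. The `EMPTY_TOKEN` value and the `encode`/`decode` names are hypothetical, and only values are handled here, not object keys:

```python
import json

# Hypothetical placeholder; any string the user is unlikely to send would do.
EMPTY_TOKEN = "__EMPTY_STRING__"

def encode(value):
    """Replace every empty string in a parsed JSON value with EMPTY_TOKEN."""
    if value == "":
        return EMPTY_TOKEN
    if isinstance(value, list):
        return [encode(v) for v in value]
    if isinstance(value, dict):
        return {k: encode(v) for k, v in value.items()}
    return value

def decode(value):
    """Reverse encode(): turn EMPTY_TOKEN back into an empty string."""
    if value == EMPTY_TOKEN:
        return ""
    if isinstance(value, list):
        return [decode(v) for v in value]
    if isinstance(value, dict):
        return {k: decode(v) for k, v in value.items()}
    return value

user_json = json.loads('{"name": "", "tags": ["a", ""], "note": null}')
stored = encode(user_json)          # no empty strings left in the stored form
assert decode(stored) == user_json  # the user gets back exactly what they sent
```

The obvious weakness is that a user who happens to send the token itself will get an empty string back, so the token needs to be something they would never plausibly supply.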
See, that's the best use case I've seen in this entire discussion. If you don't get to control whether blank vs. null is significant, then I can see how you'd have a real issue on your hands.
I'm guessing you have to keep it in a form you can search (so you can't just GZIP it or something like that)?
Using a printable character (or sequence of characters) seems like it would be a lot better. It'll ugly up your database, but at least it'll be obvious that there's a difference. Having all your strings possess two forms which are visually identical but are not actually the same sounds unpleasant.
I think it would be better to use something both printable and commonly used for placeholders, e.g., an underscore, so it's obvious if you forget to remove it and unlikely to seriously confuse anybody.
What if the user has a string ending with a meaningful zero width space already stored? For example, the string could be checksummed somewhere. It would corrupt their data.
If you want a kludge for this, it's better to generate a longish random string (e.g., a UUID) to indicate an empty value.
When you get a string from the client you prepend a single zero width space. When you send a string back you strip the single leading space you added. The client will always have the exact same data back that they sent originally.
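A minimal sketch of that prepend-and-strip scheme, with hypothetical `escape`/`unescape` helpers. Because the prefix is added to every string unconditionally, a user string that already starts with a zero width space still round-trips:

```python
ZWSP = "\u200b"  # zero width space

def escape(user_string: str) -> str:
    """Prepend one zero width space so the stored value is never empty."""
    return ZWSP + user_string

def unescape(stored_string: str) -> str:
    """Strip exactly the one leading zero width space that escape() added."""
    if not stored_string.startswith(ZWSP):
        raise ValueError("stored value is missing the escape prefix")
    return stored_string[len(ZWSP):]

for original in ["", "hello", ZWSP, ZWSP + "already prefixed"]:
    stored = escape(original)
    assert stored != ""                   # safe to store
    assert unescape(stored) == original   # client gets the same data back
```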
You're right, of course. Sorry, I wasn't clear. I meant that a user might have stored a string with a zero width space at the end by the time you introduce this escaping mechanism. (I've already edited the comment to indicate this.) The same goes double if you append a common printable character. You'd have to rely on some additional indicator, such as the date and time the record with the string was stored, to know whether to unescape a string, and you'd also have to be sure nothing changed that date and time without escaping the data.
Oh, yeah, in that case I'd either if-case it by timestamp or I'd prepend a zero width space to all historical data as well. I would prefer doing the latter and would only do the former if there was some reason I couldn't do the latter, for example if I had too much historical data to be able to process it (though I have a hard time imagining that happening for something so trivial as prepending a zero width space, unlike, say, converting thousands of hours of video, which might actually be too time-consuming or computationally expensive).
One issue that might arise with altering historical data that I can imagine would be if it was ever necessary to restore from backup and your backup was made before you later added the zero width space, and then you forget to add the zero width space again when you restore from backup a few months down the road. But with proper documentation and procedures that shouldn't happen.
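The two options discussed above (if-casing by timestamp versus backfilling historical data) might look roughly like this. The cutover date, the `read_value` and `backfill` names, and the idea that each record carries a stored-at timestamp are all assumptions for illustration:

```python
from datetime import datetime, timezone

ZWSP = "\u200b"  # zero width space

# Hypothetical cutover: the moment the escaping mechanism was deployed.
ESCAPING_ENABLED_AT = datetime(2020, 1, 1, tzinfo=timezone.utc)

def read_value(stored_string: str, stored_at: datetime) -> str:
    """Option 1: only unescape records written after the cutover."""
    if stored_at >= ESCAPING_ENABLED_AT:
        return stored_string[len(ZWSP):]
    return stored_string  # legacy record, stored without the prefix

def backfill(legacy_values: list[str]) -> list[str]:
    """Option 2: prefix every historical value, then always unescape."""
    return [ZWSP + value for value in legacy_values]
```

Note that `backfill` has to run exactly once per dataset: running it again, or forgetting to run it after restoring an old backup, leaves the data out of step with the reader, which is the risk described above.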
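EXTRA_SPACE = "\u200b"  # the extra space, e.g. a zero width space
string_to_store = user_string + EXTRA_SPACE
dynamo.store(key, string_to_store)
...
stored_string = dynamo.retrieve(key)
user_string = stored_string[:-len(EXTRA_SPACE)]  # drop the extra space again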
That way the user puts a string in and gets the same string out. No problem.
While not ideal, you can create a new AttributeTransformer that sets a placeholder when storing into DynamoDB and removes it when pulling out of DynamoDB, and pass it in as part of your DynamoDBMapper instantiation.
I did this to convert some String Sets (SS) in my database to String Lists (L). I almost did this same thing to fix the empty String issue but didn't have the time to implement it yet.
Unfortunately, I'm working in a Node environment. The Java SDK for DynamoDB seems much nicer to work with. For example, I believe that you can do transactions with the DynamoDB interface for Java.
Sorry, I saw the link was referencing the Java SDK, so I thought you were using the same. The DynamoDBMapper in the Java SDK has been an easy-to-use and readable ORM for me. Adding annotations to define keys and attributes has been great. That said, I'm not sure if JavaScript has an equivalent to the AttributeTransformer I mentioned.