
If you aren't normalizing, how are you ensuring that you avoid anomalies?


Proper NoSQL design takes a different perspective: it asks you, "Well, so what if you have an anomaly?" Take movies as an example. Actors, producers, genres, etc. could all live in separate tables in a relational database, or each movie could be a single document in a document database. Now imagine that an actor changes their name a few years after a movie is released. Is it important to go back and update the actor's name in each of their movies? Maybe, maybe not. You certainly can't change the credits inside the movie itself. Maybe it's sufficient to just link to the actor's page, where their up-to-date name lives. Or maybe that's insufficient, because the actor will be upset that their name wasn't updated across their old movies.
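To make the trade-off concrete, here is a minimal Python sketch of the movie example stored as documents. Everything in it (the `actors` and `movies` dicts, the `credited_as` field, and the helper functions) is hypothetical, a plain in-memory stand-in rather than any particular database's API. Each movie embeds a copy of the name the actor was credited under plus a link (`actor_id`) to the actor's page, so a rename touches one place and the embedded copies simply go stale, which may or may not matter.

```python
# Hypothetical in-memory "document store": each movie is one self-contained document.
# The actor's name is copied (denormalized) into every movie that credits them.
actors = {"a1": {"name": "Old Name"}}  # the actor's own page: the up-to-date source

movies = [
    {"title": "Movie A", "year": 2001,
     "cast": [{"actor_id": "a1", "credited_as": "Old Name"}]},
    {"title": "Movie B", "year": 2003,
     "cast": [{"actor_id": "a1", "credited_as": "Old Name"}]},
]


def rename_actor(actor_id, new_name):
    """Update only the actor's page; the copies embedded in movies are left alone."""
    actors[actor_id]["name"] = new_name


def stale_credits(actor_id):
    """Titles whose embedded credit no longer matches the actor's current name."""
    current = actors[actor_id]["name"]
    return [m["title"] for m in movies
            for c in m["cast"]
            if c["actor_id"] == actor_id and c["credited_as"] != current]


def display_name(credit):
    """Follow the link to the actor's page to show the up-to-date name."""
    return actors[credit["actor_id"]]["name"]


rename_actor("a1", "New Name")
print(stale_credits("a1"))                 # ['Movie A', 'Movie B']
print(display_name(movies[0]["cast"][0]))  # New Name
```

Whether those two stale entries are a bug or a feature ("this is the name on the credits") is exactly the "so what?" question.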

You choose relational models when anomalies are unacceptable and non-relational models when anomalies are acceptable.
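For contrast, here is a minimal normalized version of the same data, sketched with Python's built-in `sqlite3` module. The table and column names are made up for illustration. Because the name is stored once and joined in at query time, the rename is a single UPDATE and there are no other copies left to drift.

```python
import sqlite3

# A minimal normalized schema (hypothetical table/column names): the actor's
# name is stored exactly once, and movies refer to actors by key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE movie_cast (
        movie_id INTEGER NOT NULL REFERENCES movie(id),
        actor_id INTEGER NOT NULL REFERENCES actor(id),
        PRIMARY KEY (movie_id, actor_id)
    );
    INSERT INTO actor VALUES (1, 'Old Name');
    INSERT INTO movie VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO movie_cast VALUES (1, 1), (2, 1);
""")

# One UPDATE in one place; every movie sees the new name through the join.
conn.execute("UPDATE actor SET name = ? WHERE id = ?", ("New Name", 1))

rows = conn.execute("""
    SELECT m.title, a.name
    FROM movie AS m
    JOIN movie_cast AS mc ON mc.movie_id = m.id
    JOIN actor AS a ON a.id = mc.actor_id
    ORDER BY m.title
""").fetchall()
print(rows)  # [('Movie A', 'New Name'), ('Movie B', 'New Name')]
```

If you did want to keep the name as originally credited, you would add it as an explicit column on `movie_cast`: a deliberate copy rather than an accidental one.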


That makes sense. I tend to steer away from non-relational models for this reason, but it's definitely a matter of risk management.


Thanks! That is a very useful example.


Not sure if this is exactly what you're referring to, but my understanding is that picking the "right" schema for a document database (so you don't end up with the slower queries mentioned elsewhere in the thread) benefits from thinking at a finer level of granularity than you'd probably need with a relational database. Instead of just identifying "one to one", "one to many", and "many to many", it's useful to ask questions like "how 'many' is many?" As a simple example, if the "many" in "one to many" is on the order of 10, it may make sense to embed those items in an array rather than use a separate collection for them.

It can also help to start by thinking about the queries you're going to want and then design the schema around them, rather than deciding on the schema first and fitting the queries to it. If certain data is going to be accessed at the same time, you'll probably want to find some way to store it together.
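Here's a rough sketch of the "how many is many" question written down as a rule of thumb, in Python. The `Relationship` fields, the `EMBED_THRESHOLD` of 100, and the order of the checks are all assumptions for illustration, not guidance from any particular database; real cut-offs depend on things like document size limits and on your actual query patterns.

```python
from dataclasses import dataclass

# Hypothetical cut-off for a "many" that is still small enough to embed.
EMBED_THRESHOLD = 100


@dataclass
class Relationship:
    name: str
    typical_count: int       # how many is "many", in practice
    read_with_parent: bool   # is it usually fetched together with the parent?
    grows_unbounded: bool    # e.g. reviews or log entries that keep piling up


def storage_choice(rel: Relationship) -> str:
    """Rule of thumb: embed small, bounded data that is read with its parent."""
    if rel.grows_unbounded or rel.typical_count > EMBED_THRESHOLD:
        return "separate collection"
    if rel.read_with_parent:
        return "embed as array"
    return "separate collection"


for rel in [
    Relationship("movie -> main cast", 10, True, False),
    Relationship("movie -> user reviews", 50_000, False, True),
    Relationship("actor -> fan mail", 20, False, False),
]:
    print(f"{rel.name}: {storage_choice(rel)}")
```

The `read_with_parent` flag is the query-first part: the same small cardinality can still point to a separate collection if the data is never read alongside its parent.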



