Ok, old man yelling at clouds moment finally coming for me. Now that we've been through the document-database heyday and are out the other end, what have we learned about where document databases are a good fit?
At the time I looked at them like a fad. "These script kiddies want to write javascript and ignore schemas. Let's see how well that works out for them." As expected, most of what I ever hear is regret.
Today, MongoDB has grown out of the reliability issues it had in the past, and Postgres has json features for the occasional times it's useful to store some loosely structured data along with otherwise relational data. Question is, what applications is a document-first database good for, outside of prototyping?
Edit: and to make sure I understand, FerretDB is a layer reimplementing MongoDB on top of Postgres and its json features?
Yes, FerretDB is a layer which implements the MongoDB wire protocol on top of Postgres. Right now we are using JSONB, but this affects performance and we need to depart from this strategy in the long run.
We have an article which explains the concept [1].
I wouldn't go into the document vs. relational argument; arguments on both sides have merit. There are valid use cases for document databases (take e-commerce, for example), and we should not discount the fact that using a relational database is just more complicated. Using vanilla Postgres for a MongoDB use case will not be feasible for someone whose focus is, let's say, mobile application development. There is a reason behind MongoDB's popularity - it just provides a great developer experience. This is what we are aiming to recreate on top of Postgres.
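To make the "great developer experience" point concrete, here is a toy in-memory sketch of the schemaless insert-and-query workflow a document store offers. This is purely illustrative: the `Collection` class and its methods are made up for this example and are not FerretDB or MongoDB API.

```python
# Toy in-memory "collection" illustrating the schemaless developer
# experience: no table definitions, no migrations, just insert and query.
# Hypothetical sketch only -- not a real database client.

class Collection:
    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        # No schema to declare up front: any dict is accepted as-is.
        self.docs.append(doc)

    def find(self, query):
        # Return documents matching every key/value pair in the query.
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in query.items())]

users = Collection()
users.insert_one({"name": "Ada", "platform": "ios"})
users.insert_one({"name": "Grace", "platform": "android", "beta": True})

print(users.find({"platform": "android"}))
```

The equivalent in vanilla Postgres means designing tables, writing DDL, and running a migration every time a field like `beta` appears; that gap is exactly what a Mongo-compatible layer papers over.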
If we use Ferret for prototyping and eventually land on a schema, would it be possible to convert the FerretDB rows into more structured Postgres tables? It would be so cool if it could just analyze the data and create a schema for you, you double-check it, and it just works.
I think being able to convert back and forth would make it so worthwhile!
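The "analyze and create a schema for you" step could plausibly work by sampling documents and proposing a column type per field. Below is a rough sketch of that idea; `infer_schema` is a hypothetical helper, not a FerretDB feature, and a real tool would also have to handle nesting, arrays, and NULLability.

```python
# Hypothetical schema-inference sketch: scan flat documents and propose a
# SQL type for each field. Fields with conflicting or unmapped Python
# types fall back to jsonb. Illustrative only.

def infer_schema(docs):
    type_map = {bool: "boolean", int: "bigint",
                float: "double precision", str: "text"}
    columns = {}
    for doc in docs:
        for field, value in doc.items():
            sql_type = type_map.get(type(value), "jsonb")
            # If two documents disagree on a field's type, keep it jsonb.
            if columns.get(field, sql_type) != sql_type:
                sql_type = "jsonb"
            columns[field] = sql_type
    return columns

docs = [{"sku": "TS-01", "price": 19.99, "in_stock": True},
        {"sku": "MU-02", "price": 12.50, "tags": ["ceramic"]}]
print(infer_schema(docs))
# {'sku': 'text', 'price': 'double precision', 'in_stock': 'boolean', 'tags': 'jsonb'}
```

The "double check it" step matters precisely because inference from a sample can't see fields that only appear in rare documents.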
1. Denormalized data is often a boon. One of your key use cases is "give me all the information about this Product so I can show it to the user." Forget joining a ton of tables, just do 1 read of 1 product and go on your way.
2. A certain amount of more freeform data is expected. Different product categories will have different sets of information. T-shirts and drinkware both have sizes, but these sizes have nothing to do with each other. You can model all this in a traditional SQL database, but you have to stop and think really hard about it, and potentially end up with a plethora of tables. In a document database, it's much easier to just add the data and go on your way.
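Both points above can be shown with a small sketch (illustrative product data, not a real schema): each self-contained document holds everything needed to render its product page, including category-specific fields that would otherwise need per-category tables.

```python
# One document per product: embedded images and category-specific size
# fields, so rendering needs a single read instead of several joins.
# Hypothetical example data and render_product helper.

tshirt = {
    "_id": "ts-001",
    "name": "Conference T-Shirt",
    "price": 19.99,
    "sizes": ["S", "M", "L", "XL"],       # garment sizes
    "images": ["front.jpg", "back.jpg"],  # embedded, no join needed
}

mug = {
    "_id": "mug-001",
    "name": "Camp Mug",
    "price": 12.50,
    "sizes_oz": [12, 16],                 # volumes, unrelated to S/M/L
}

def render_product(doc):
    # In a normalized SQL schema this might join products, product_images,
    # and a separate sizes table per category.
    return {"title": doc["name"], "price": doc["price"],
            "options": doc.get("sizes") or doc.get("sizes_oz", [])}

print(render_product(tshirt)["options"])
```

The trade-off, of course, is that denormalized documents push consistency work (e.g. renaming a shared attribute across all products) back onto the application.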
Probably a custom PostgreSQL extension for BSON support.
The biggest issue right now is that MongoDB compares and sorts values differently than PostgreSQL/jsonb. For that reason, we have to do a lot of filtering on the FerretDB side, and that can't be great for performance. Pushing more of that work down to the database side should make FerretDB perform much better.
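A small sketch of the ordering mismatch: when sorting mixed types, MongoDB's BSON comparison order puts numbers before strings, while PostgreSQL's jsonb ordering puts strings before numbers. The rank functions below are deliberately simplified from both systems' documented type orders (real code must also handle dates, ObjectIds, arrays, nested documents, and more).

```python
# Simplified cross-type sort ranks. bson_rank approximates MongoDB's
# comparison order (Null < Numbers < Strings < Booleans, for the types
# shown); jsonb_rank approximates PostgreSQL jsonb (Null < Strings <
# Numbers < Booleans). Illustrative only.

def bson_rank(v):
    if v is None:
        return 0
    if isinstance(v, bool):       # must check bool before int in Python
        return 3
    if isinstance(v, (int, float)):
        return 1
    if isinstance(v, str):
        return 2
    raise TypeError(f"unhandled type: {type(v)}")

def jsonb_rank(v):
    if v is None:
        return 0
    if isinstance(v, bool):
        return 3
    if isinstance(v, str):
        return 1
    if isinstance(v, (int, float)):
        return 2
    raise TypeError(f"unhandled type: {type(v)}")

mixed = [True, "a", 1, None]
print(sorted(mixed, key=bson_rank))   # [None, 1, 'a', True]
print(sorted(mixed, key=jsonb_rank))  # [None, 'a', 1, True]
```

Because the two orders disagree, a proxy that relies on jsonb's native `ORDER BY` would return results in the wrong order for Mongo clients, which is why the sorting currently happens on the FerretDB side.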
> Question is, what applications is a document-first database good for, outside of prototyping?
Some criteria I’d use:
* Data is already naturally segregated and won’t be shared/joined much during normal usage
* App already has transactions modeled in the application layer (or I guess if consistency really doesn’t matter)
* App would benefit from being geographically distributed.
* App is written in a language with a strong type system, has high code quality, test coverage, etc.
In my case, my company makes a distributed project management application. All changes to projects are canonically ordered and applied in the application layer.
Data is stored in Cloudflare Durable Objects and R2. Hard to classify DOs but they’re a lot closer to a document store than an SQL Db.
There are some nice benefits that would be hard to replicate with a traditional SQL Db setup.
A use case I have for FerretDB is migrating existing apps off MongoDB without needing to change the code. I find it funny that you could now call these "legacy" apps.
Exactly. Let's say a major company has a few applications which require MongoDB, and the rest of their applications are running on Postgres.
With FerretDB, they can migrate the app off MongoDB as you said, and keep Postgres only. Therefore they don't need to maintain internal knowledge on how to run MongoDB, or pay for MongoDB to run it for them (in which case they are not in control of their data, because it is all under MongoDB's account...).
I'm a contributor to the Sandstorm.io project, and Mongo continues to present a bit of a pickle for us. We can arguably write our way out of one upgrade, but upgrading to a later Mongo just punts the issue again. We'd much rather just leave Mongo behind.
The core of the issue is that Mongo does not seem intended to be upgraded reliably without intervention. Sandstorm is running on thousands of servers where the admins aren't equipped to handle Mongo upgrade issues, as well as within some Sandstorm apps which also use Mongo inside containers not intended to be user-serviceable.
One of the issues we hit is here: https://github.com/meteor/meteor/issues/11666 in which if you happened to have a Mongo database over eight years old (many Sandstorm servers have been deployed for that long!), you needed manual intervention to correct it, even if you had done intermediate version updates in between.
Meteor patched around this issue... but only after dropping support for several releases of Mongo. So we essentially need to build our own automation which understands and can export old Mongo databases and import new ones, while shipping a Meteor app that can only run on one or the other. All of that has to auto-update smoothly and recover from failures, such as running out of disk space partway through the process.
And then we also need to implement that within app sandboxes which can also arbitrarily terminate so that also has to recover well and we need to ship the logic to do this with every Mongo-backed app package until the end of time.
Looks that way to me... I really liked a lot about MongoDB, but I don't think they had a good story for administration of scale + redundancy. RethinkDB, Cassandra and others have a ring + redundancy model which I think is easier to deal with, whereas Mongo was limited to replication or sharding, and was a serious pain to deal with from the admin side imo.
Understanding the advantages and shortfalls is sometimes harder too. Mongo having multiple secondary indexes is pretty nice as well. Haven't dug into FerretDB, my first question is if it will run over the top of CockroachDB, as that's how I would probably want it configured. Then, I'm not sure if I wouldn't just use the JSONB surface in PostgreSQL/CockroachDB directly.
I think it just really depends on how/what you need to accomplish.