SQL vs. NoSQL Databases: What’s the Difference?

_w3e8 · on May 12, 2016

I have issues with his "Reasons to use NoSQL"

* Making the most of cloud computing and storage

There are plenty of "cloud" hosting options for SQL databases, enough that a simple google search should make my point for me.

* Rapid development

This is a really common reason that I have heard for wanting to use NoSQL databases. It is flat out wrong.

While NoSQL databases may not force you into a schema for your database, you will pay a toll for this mutability in your application layer.

I have worked on several NoSQL backed projects, and nearly without fail development speed drops like a rock once you make changes to the way your data is persisted.

SQL migrations may seem tedious, but the process of planning your data model and having rigid definitions for your data types saves you huge amounts of time in the long run.

Data is the foundation of most applications, and if you can't rely on your data then you are building your application on sand.

luhn · on May 12, 2016

Another point on rapid development: SQL databases have some amazing libraries and tooling. I've been working a lot with SQLAlchemy in Python lately and have been blown away by it.

> While NoSQL databases may not force you into a schema for your database, you will pay a toll for this mutability in your application layer.

A while ago I read a really good point on this: There's always a schema. NoSQL / "schemaless" databases don't get rid of the schema, they just move it out of the database layer. It's still there, whether in the application or the documentation or just in your head.

wvenable · on May 12, 2016

NoSQL databases have as many schemas as there have been changes to the model. Everytime you make a change, and are keeping old data, you have another schema. Your code then has to be aware of every schema that has ever existed. Over time this becomes more and more painful and error-prone.

In a relational database, you have one schema and you have to convert/update your data to match that schema whenever it changes.

I've done one big project many years ago that used NoSQL storage of heterogeneous "document" types -- a perfectly reasonable use-case for the technology -- and I wouldn't do it again. Although at the time it seemed crazy to create different tables to hold this data it would have made that code so much simpler now.

_w3e8 · on May 12, 2016

Imagine handling data like that with javascript.

Mutable data types, with mutable data, and inconsistent error handling.

Burned me out in a matter of months.

jxcl · on May 12, 2016

I also love the "Reasons to use SQL":

> If your business is not experiencing massive growth that would require more servers and you’re only working with data that’s consistent, then there may be no reason to use a system designed to support a variety of data types and high traffic volume.

This whole article seems heavily NoSQL biased. I don't know what qualifies as high traffic volume, but Postgres has been known to handle 500,000 transactions per second, and there is stuff in the pipeline that is only going to make that faster.

http://akorotkov.github.io/blog/2016/05/09/scalability-towar...

_w3e8 · on May 12, 2016

Yeah, the title should be "SQL vs NoSQL: I only partially understand one"

asimuvPR · on May 12, 2016

Adding to your very good points:

There are very good ORMs and frameworks that make working with SQL simpler. With baked in type checking and migrations, there is not a lot of need for NoSQL. Plus PostgreSQL and it's JSON data type basically made a lot of the popular NoSQL options obsolete.

djsumdog · on May 12, 2016

Articles like this should have, in the opening paragraphs, a statement like, "This is for the general case, there are many outliers." For instance, not all non-relational databases are NoSQL. What about hierarchical databases (LDAP)?

The other thing this article totally fails to address: Why NoSQL was so problematic early on. How Mongo by default would let errors fail silent in Java and other drivers.

She doesn't address the huge limitation in many NoSQL solutions: consistency. She briefly mentions ACID, but neglects the primary use case for document based NoSQL: if your data doesn't need to be current on ever read. So a blog is a great example, or any type of social network really. It really doesn't matter if you're reading a version of the comment thread from 20 seconds ago or 5 seconds ago (as long as it's not like 2 minutes ago).

The author also classifies all NoSQL as document stores, when they're not. Mongo uses a document model/concept, but others may use a model that's more like a distributed memcache or key/value store. There are things like Apache Geode which uses a distributed object store with client based schema versioning. The data model space is pretty big and they're pretty different fundamentally.

The author mentions how a lot of solutions use both, but in other posts I've looked at and other companies I've worked for, I've found that you often have to require both. In a lot of implementations, people use both a NoSQL and a relational store; one more as a fast read cache and the other for long term persistence. Then you get into all the issues with insuring consistency between the two and it's a miss.

Overall, this is a pretty basic article and not one I'd recommend. It can lead to the wrong decision for a project.

ktamura · on May 12, 2016

>I have issues with his "Reasons to use NoSQL"

Looking at the author bio at the bottom, I believe it is "her".

_w3e8 · on May 12, 2016

[flagged]

dang · on May 12, 2016

Please don't post unsubstantive comments.

_w3e8 · on May 12, 2016

Pointing out a gender discrepancy in a thread that doesn't discuss gender at all is "unsubstantive".

Adds nothing to the conversation and is actually incorrect, as male is the default gender for pronouns.

Scarbutt · on May 12, 2016

SQL migrations may seem tedious, but the process of planning your data model and having rigid definitions for your data types saves you huge amounts of time in the long run.

Your reasoning seems too black and white, most the of the time, you can't know your data model ahead of time, that's why prototypes are so useful, so just using your language data structures or some less rigid datastore lets you get quicker to your data model.

paulmd · on May 12, 2016

Having rigid definitions for data types doesn't necessarily mean the datastore is rigid. It just means that you know what type of data goes in each column. Like it or not, if you change your schema that change has to be handled somewhere - either you update the old data to the new schema (do this), or you handle old versions of the schema in the application (fundamentally non-scalable IMO). It doesn't matter whether they're values in a JSON object or columns in a relational DB.

Flyway is a fantastic tool for making changes to relational schemas. Your code is versioned, and you can just keep the schema migration scripts right next to it. If you need to upgrade an old schema, it just walks through the migration scripts until it's up-to-date.

If there is some portion of your model that really, truly, honestly consists of unstructured data, you can just put that in a JSON column.

Scarbutt · on May 12, 2016

That is why I mentioned prototypes, they are a quick messy hack that is not meant to scale, its purpose its to validate your ideas and decisions, what is going to become your MVP, which IMO is far more important than getting the perfect stack/tooling from the start.

_w3e8 · on May 12, 2016

We obviously value very different things in regards to software development.

You really seem to favor initial velocity and rapid iteration. Fair, I can totally understand why.

I favor planning, stability, and limiting thrash due to bad/uninformed decision making.

In my experience the rush to market only pans out when you plan on having a large influx of funding to rebuild your stack. Otherwise you are stuck holding a mess of code and trying to scale ideas that never should have gone to production in the first place.

There is a whole other side to the industry that demands provably correct solutions, and will not tolerate the "hacker culture". Not because of any petty reasons like work hours or perks, but because some applications require a level of accuracy that your average growth hacker can't meet.

_w3e8 · on May 12, 2016

I also really like the sqitch project for migration management.

http://sqitch.org/

wvenable · on May 12, 2016

This sounds so logical, is repeated often, and is fundamentally incorrect. Just because you are defining a data model doesn't mean you can't change it. There's nothing easier. Undefined schemas are actually worse under change than defined ones.

prodigal_erik · on May 12, 2016

Like I've argued before, you have a schema. Your app cares about the location and format of the data. You're only deciding whether or not you want to write down your schema and use it to catch mistakes.

wvenable · on May 12, 2016

Like I've argued before, with NoSQL you have many schemas -- one for every change (unless you delete all the data after every schema change). In SQL, you have one schema -- which is much easier to work with even when you're just doing early spec development.

manigandham · on May 12, 2016

Why is this even on HN? There's no reason to debate this, it's a content marketing fluff piece for a company that has no technical pedigree, written by a freelance copywriter rather than an engineer.

It's also about 50% wrong. Let's just move on.

zer00eyz · on May 12, 2016

On some level I'm glad it is:

Things your boss or client will read, quickly turn into things your boss or client will say. I would rather be prepared for the wacky misinformed conversation that will happen, it helps keep down the laughter and frustration.

beachy · on May 12, 2016

And on that theme, then get ready for when your boss gets to the part about how NoSQL gives you:

"Rapid development. If you’re developing within two-week Agile sprints, cranking out quick iterations, or needing to make frequent updates to the data structure without a lot of downtime between versions, a relational database will slow you down."

Reminds me of this classic https://www.youtube.com/watch?v=b2F-DItXtZs

callesgg · on May 12, 2016

The article read like it is written by someone who has never understood how to use a SQL database.

To quote Mr Torvalds himself:

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

Something that i struggled with before i learned how to use a SQL/relational database was to actualy use the database relations in my code. I first specified the relations in the database then i basicly redid them in the code.

The way i tend to program these days is to have basically all my relations restrictions and relational rules in the database. Whenever i want to change something you just alter stuff in the database GUI. Most of the application code will just follow along.

wvenable · on May 12, 2016

In my experience, if you have the relational database model fully defined and correct then the rest of the application is pretty much designed. All the screens and elements you need are obvious.

hire_charts · on May 12, 2016

This matches my experience. That's not to say that your application doesn't need to do any work on top of the data. Sometimes you need to interpret the data in different ways for different use cases, and often that means applying transformations at the application layer. But if you find yourself fighting against your own data model, it might be time to consider restructuring the data to better support your use cases.

beachy · on May 12, 2016

Indeed, but if you are building in microservices, a new level is introduced.

If you can get your service boundaries and APIs fully defined and correct, then the relational database models (underpinning each of those services) just fall from the tree for you.

And then as you say the screens and elements are obvious.

wvenable · on May 12, 2016

Honestly I'd still prefer to go other way -- from the relational database models you get your APIs defined and correct and then the screens and elements are obvious.

I find if you start at the database model layer you much less likely to screw up the API design. The database model forces a bit of rigor that you can't get going the other direction.

beachy · on May 12, 2016

IMO it's dead easy to create APIs from a database schema, where that schema underpins your monolith and you're opening said monolith up to the outside world, and you don't really care too much what goes on out there.

But if you're building a system which is not a monolith, and instead is made of up of, say, 20 services, then putting good boxes around those services and defining the lines that connect them (the APIs) is an essential prereq to designing the databases that sit inside the boxes.

(Though really what's happening, as always, is you're doing it all in parallel, just with more emphasis on the higher level stuff).

wvenable · on May 12, 2016

I can't really disagree with that. I usually design micro-services the same as do monoliths; it's just the client is different (another service).

bpicolo · on May 12, 2016

I mean, it was written by a writer, not a developer/software engineer...

easuter · on May 13, 2016

On this topic I'd like to plug a book that really drove this home for me: "SQL Antipatterns" by Bill Karwin.

hobs · on May 12, 2016

Chuckling at: Your data is structured and unchanging. If your business is not experiencing massive growth that would require more servers and you’re only working with data that’s consistent, then there may be no reason to use a system designed to support a variety of data types and high traffic volume.

SQL Databases support a variety of data types and high traffic volume, they just dont allow you to change things without thinking a little more deeply on the problem.

This article is just fluff.

hardwaresofton · on May 12, 2016

Postgres is the last SQL DB thing mentioned?

No mention of RethinkDB (https://rethinkdb.com/) ?

SQL is good if your data is "unchanging"? Schema migrations happen, SQL can be sharded and scaled to multiple servers, and the fact is, unless you're facebook or google, you probably won't even need it. Didn't reddit run postgres since it began?

Glad I skimmed this.

overcast · on May 12, 2016

Yeh seriously RethinkDB is the perfect hybrid of "NOSQL" document storage, with SQL like relationships. Been using it for all my projects the past year, and I love it.

hardwaresofton · on May 12, 2016

Same here, I have no idea why it doesn't get more press, as far as I'm concerned it's the best document store out there.

Mongo has gotten away with some pretty substantial misses, yet still get considered every time NoSQL comes up (maybe first-to-market is really just that powerful), but RethinkDB rarely gets one, despite being so stable and having so many good features.

And when I say mongo has gotten away with substantial misses, I don't mean the documentation miss that lead to documents getting dropped, I mean core issues with their engine that necessitated the wiredtiger contributions/rewrite. Of course, mongo works very well generally, and they've definitely done their part when it comes to documentation and growing the community, and carving the path for what NoSQL document stores (in this day and age) look like.

hiphipjorge · on May 12, 2016

Gotta love the subqueries and the join command! Makes everything nicer!

ktamura · on May 12, 2016

I personally think SQL v. NoSQL is less useful of the two dichotomies, the other one being OLTP v. OLAP. In the end, SQL is just an API. You can put SQL-like interface on NoSQL as well (like SlamData for MongoDB or Crate for Elasticsearch).

On the other hand, OLTP v. OLAP (transactional v. analytic) is directly tied to use cases and has more profound consequences as far as technical decisions go (data structures, algorithms, etc.)

_0w8t · on May 12, 2016

This is very good point. For example, MongoDB and Oracle are not that different compared with Splunk.

nthacker · on May 12, 2016

This talk by Martin Fowler is a fantastic overview into the SQL vs NoSQL discussion : https://www.youtube.com/watch?v=qI_g07C_Q5I

jcyw · on May 12, 2016

There should be mentions about CAP theorem, and tradeoffs about consistency vs availability. Sometimes you dont care as much about CAP-consistency than availability or scalability.

lowbloodsugar · on May 13, 2016

For me, the most important difference is that a SQL "Database" is actually an incredibly powerful application server, with a built in compiler for a data processing language, and many years of development on access strategies, whereas a NoSQL database is just a level above an http server with an index that scales really well. That's a bit of an exaggeration, but I find it helps new teams understand the big picture. In a microservice environment, I encourage my teams to think of their SQL database server as a microservice, not a persistence store.

0x434D53 · on May 12, 2016

That happens, when a content marketer write about a technical topic... Just sad.

BinaryIdiot · on May 12, 2016

Many databases considered NoSQL actually have SQL now. Cassandra for instance. Granted it's not as critical to the execution as SQL is for RDMS but I would much prefer the distinction to be: RDMS / Relational versus KeyValue / Document Stores. But I digress.

Some of the other criticisms in this thread are very valid especially in the area of why you should use one versus the other so I won't bother regurgitate them I just felt like a better distinction could have been used between what they were comparing.

Florin_Andrei · on May 12, 2016

> NoSQL data doesn’t need to be prepped ahead of time.

Unless you're using CouchDB and you need to change some views, and then you have to re-index the whole thing, which can take forever.

aristus · on May 12, 2016

Bah. People keep confusing the syntax with the data model. SQL is simply a notation to express theorems in relational set theory. It's pretty much the only pure-functional, side-effect-free programming language that's actually gone mainstream [0]. If you look carefully at the APIs of the "No SQL" databases, you'll find them introducing relational set theoretic features.

To paraphrase Paul Graham's unbearably smug comment about Lisp: once you add filter, group by, and join, you can no longer claim to have invented a new query language, just another dialect of SQL. Math has this funny habit of remaining true no matter how many mean things people say about it on their blogs.

[0] I mean basic core SQL, not the vendor-specific stuff like PL/SQL.

gshulegaard · on May 12, 2016

This is interesting, but I admit I am a bit confused.

Having read "An Introduction to Database Systems" by C.J. Date (http://www.amazon.com/Introduction-Database-Systems-8th/dp/0...), I have a tremendous appreciation for Relational Algebra and Theory which allows for the closed functional nature/syntax of SQL which holds true across the various normal forms (1st-5th).

Perhaps due to my own lack of understanding, I always assumed that without an underlaying relational model SQL loses its closure. So for me, I always thought the distinction between SQL and NoSQL was whether or not the underlying database model was relational.

Sure for many of the queries against a non-relational model you could adopt a SQL-like syntax (which many languages seek to do), but it would no longer be operating in the closed Relational Algebra space.

Could you comment on this and help me understand the finer points/where I went wrong?

btilly · on May 12, 2016

Syntax and being relational are orthogonal.

You can set up non-relational data structures in SQL. You can set up a relational data structure in virtually any programming language.

However what the NoSQL people are learning is that smart developers really want relational models. So they are offering the features that can be used to achieve that.

gshulegaard · on May 12, 2016

That's where I think where most NoSQL proponents start to miss the value of relational models and the closed Relational Algebra it allows.

As I understood it, SQL was a language for describing the Relational Algebra which was a closed mathematical concept.

You can take the syntax, assign functions to produce output from a non-relational model, but it's no longer SQL...and it's underlying power (the closed Relational Algebra) is no longer present.

So in a sense syntax and being relational are orthogonal, but syntax isn't the defining characteristic of SQL (or why it's so powerful). I think a more insightful statement is: SQL (Relational Algebra) and being relational are codependent and relational and non-relational are orthogonal.

Which, if I am not mistaken (and again I am still trying to understand the finer points which is what motivated my original comment), NoSQL people fundamentally _cannot_ offer a query language similar to SQL based on the closed relational algebra.

They can mimic the syntax to make it more intuitive for new users coming from traditional RDMBS environments, but it's fundamentally _not_ SQL/relational algebra.

But again, I am just a cursory admirer/learner and I am still trying to iron out my own understanding of the underlying theory. So all additional information/clarifications welcome :)

mindcrime · on May 12, 2016

Along those same lines, you can - in principle - have a relational database that doesn't use SQL (although there aren't many in practice), and plenty of "NoSQL" databases use SQL (or a VERY SQL-like language). You can layer SQL on top of a lot of different data models, although the mapping will be cleaner in some cases than others.

aristus · on May 12, 2016

Absolutely. But at that point we are quibbling over syntax instead of semantics, which is not that interesting. In practice you can do hierarchical models in SQL just fine (cf Oracle's ANCESTOR features, or AsterData's regexps), and bag-of-features stuff using JSON. Graph models are more awkward, but graph models are always awkward.

lanna · on May 12, 2016

> side-effect-free

How is SQL side-effect-free if it has INSERT, UPDATE, DELETE?

wvenable · on May 12, 2016

Those aren't side-effects, they are the main effects.

jimbokun · on May 12, 2016

NoSQL DBs became popular because performant distributed joins are hard.

Joins (and aggregate, group by, etc.) were thrown out to make horizontal scale out simpler. Cassandra CQL, for example, shares keywords with SQL, but the key-value(ish) data model couldn't be more different, and is much easier to scale horizontally.

Various systems are trying to solve the distributed join problem while keeping easy horizontal scaling, and if they succeed the NoSQL label won't make sense for those systems any more.

aristus · on May 12, 2016

Yep. I spent 2.5 years at MemSQL, which nailed its colors to the mast of solving distributed joins via good old SQL. It is absolutely possible to have fast scaled distributed SQL, and a pony. The caveats are falling away one by one.

yamadapc · on May 12, 2016

SQL...?

systems · on May 12, 2016

not really, sql is usually used to refer to relational databases (and yes i am aware that sql is not fully relational)

to sql vs nosql

is more like relational vs other models

nosql can also be seen as domain specific databases some nosql options work great for specific types of apps but not others

and to conclude, the key feature of relational databases is integrity, your data is normalized, so you are guaranteed integrity, problem is, integrity is not free, this why we need nosql and other bi solutions like olap cubes

but olap cubes are different, since they dont replace relational databases, they complement each other ... olap can also be seen as domain specific

Mister_Snuggles · on May 12, 2016

I'm always saddened when I don't see MultiValue[0] databases in the list, but the reality is that there's almost nothing that's open source in this space.

My first real job was doing development on a custom application built entirely in UniVerse[1]. It was a really interesting way to work and mirrors a lot of what the Open Source NoSQL databases do, though it does it differently. Instead of storing the schema with the records (e.g., a JSON document), a record was simply a list of data elements (fields), each field could have multiple values (or not), each value could have multiple subvalues (or not), and there was a lower level that you could go to in order to store multi-line text data.

There was a dictionary attached to each file that primarily drove the query tool. You could also build calculated fields (I-descriptors) to do primitive joins, among other things. If you had a file containing invoices, you might have two fields to list the item number and quantity (these would be repeated for as many items as are on the invoice). You could then use an I-descriptor to follow item number over to the items file and pull the description, price, etc.

The entirety of the application ran within UniVerse. When users logged in to the unix server, their shell would run a script that would set up the environment and fire up UniVerse. The UniVerse LOGIN paragraph would then show a menu system and allow them to do their thing. Everything was written as either a paragraph, roughly equivalent to a shell script that ran UniVerse commands, or a BASIC program. The dialect of BASIC natively understood dynamic arrays, which were a 1:1 match to the structure of the records on disk.

UniVerse also had a built-in way to execute SQL queries against the multivalued data. In the invoice example above, the item number and quantity would be associated in the dictionary. These would then be exposed as another table by the SQL engine. You could go "SELECT A.INVOICENO, B.ITEMID, C.DESCRIPTION, B.QTY FROM INVOICES A, INVOICES_ITEMS B, ITEMS C WHERE A.@ID = B.@ID and B.ITEMID = C.ITEMID" and it would know that INVOICES_ITEMS is really part of the data in INVOICES.

It was a really cool system and I wish there was something like it in the Open Source world. Sadly, development on MaVerick[2] seems to have stalled.

[0] https://en.wikipedia.org/wiki/MultiValue

[1] http://www.rocketsoftware.com/products/rocket-universe

[2] http://www.maverick-dbms.org/

neo2001 · on May 12, 2016

Grabs popcorn