The Troublesome Active Record Pattern

buffington · on March 19, 2020

With all due respect, Cal Paterson doesn't seem to know what he's talking about.

The Ruby example he uses as the "The Troublesome Active Record Pattern" simply bizarre. You'd have to deliberately avoid reading the documentation for ActiveRecord to produce that code.

In other words, if you write terrible code, you'll get terrible results.

In actual fact, when you read the docs and actually use ActiveRecord as it was intended you can do exactly what he says it doesn't. Here's what that looks like:

  Book.joins(:author).where(authors: {nationality: "Italian", state: "deceased"})

That produces a SQL query that's nearly verbatim what he claims ActiveRecord can't do.

calpaterson · on March 19, 2020

Hi, Cal Paterson himself here. It's not a Ruby example, it's a Python example. Don't look too deeply into the code samples for specifics though, they are a pastiche.

You are not the only person to get the wrong end of the stick re:"Active Record". I'm talking about the Active Record the design pattern, not about the Ruby library that took the pattern as it's name but now is apparently a querybuilder library. I probably need to go back and make it clearer somehow.

EDIT: I've done this now. Back to stockpiling tinned food and bullets!

sanderjd · on March 19, 2020

The reason there is confusion is that you're debating a straw man. The reason none of the popular "Active Record" libraries (like Ruby's ActiveRecord) actually have the problems you're identifying is that none of them implement a pure version of the design pattern, because doing so leads to the problems you're identifying. So you're both right that the pattern doesn't really work, and wrong that it is a major problem in practice.

dkarl · on March 19, 2020

In his defense, I'll say that at many of the places I've worked, there have been vocal programmers who regarded writing SQL as a dumb graybeard party trick with the same relevance to modern software engineering as hand-crafting a binary. These are the people who, when their ORM generates an inefficient query, trumpet it as proof that the database was designed wrong, probably by people who wish their SQL skills weren't so hopelessly obsolete in the modern age.

I see this much less now than I did 10-20 years ago, so perhaps it has gone out of style, or perhaps it is a disease of large companies (which I haven't worked at in a decade.)

rapind · on March 19, 2020

I think if there were better SQL tooling you'd see more people embrace it. It boggles me that given the age of SQL it hasn't gotten better tooling.

For example I'd love to move more application logic into postgresql functions, but it's such a pita to run and test I end up procrastinating.

I've been playing with postgREST for a while now and have yet to implement it outside of toy projects. Not because of postgREST itself (which is really cool) but because writing business logic in postgresql is so painful.

Annoying because most CRUD apps could live almost completely in the database exposing views and functions via REST to the end user with w/e flavor of the month UI you like. Oh and they'd run orders of magnitude faster.

I'm not even complaining about SQL syntax, which IMO is actually pretty clean and functional. Just the lack of decent tooling to get shit done.

calpaterson · on March 19, 2020

I'm afraid it is a major problem in practice - I'm didn't write the post in the abstract I wrote it because this problem comes up over and over. If you've never seen it then lucky you! :)

Personally the conclusion I'm coming to is any interface that presents a table as an class is just the wrong affordance and likely to lead to problems down the line. If programmers modeled tables as sets instead of objects we wouldn't have all these problem apps that page the database into program memory while serving a web request.

aurisor · on March 19, 2020

It is a major problem in practice. It's called the "N+1" problem and it's exhaustively studied and documented, with mitigation strategies for every ORM that encounters it.

This is a bit like acting as if you're the first person to realize that null pointer deferences are a problem and happily telling the entire world to "just switch to pass by value."

calpaterson · on March 19, 2020

You aren't right about that - the N+1 problem is just one specific issue WRT joins and while the example I give does have it that is not the only thing wrong with it. The rot is more general sadly.

aurisor · on March 19, 2020

> ISBNs of all deceased Italian authors

  Book.joins(:author).where(author: { deceased: true }).map &:isbn`

> first class transactions / decrement race condition

  Book.find(isbn: 1).decrement :copies_in_stock

> Row-locking

  Book.transaction do
    b = Book.lock("FOR UPDATE NOWAIT").find_(1)
    # logic around handling out of stock
    b.copies_in_stock -= 1
    b.save!
  end

Those are all of your examples and the Ruby code that implements the solution you want. (Disclaimer: there are probably small syntax errors here, you get the gist)

Honestly this article sounds like it was written by someone who read a 10 year old Martin Fowler article and decided to critique it without even a cursory understanding of modern ORMs. Even more strangely, when everyone starts pointing out the obvious fact that ORMs do solve these problems in practice, you've decided to move the goalposts to say "oh, well the straw man abstract pattern doesn't solve for these real-world problems."

In the future I'd recommend you familiarize yourself with a problem space before trying to publish on it.

sanderjd · on March 19, 2020

I think we are using "problem" in two different ways that are each reasonable but are not the same: 1. something that happens and needs to be fixed, and 2. something that is fundamental and difficult to overcome. In my comment up-thread, I meant more the #2 definition, but you clearly mean the #1 definition.

I spent a number of years working on a Rails application that used ActiveRecord and which accrued quite a large set of data. In that job, I fixed a whole bunch of N+1 problems, and wrote sql queries with complex joins before the nicer query-builder arel syntax was implemented. I was a pretty vocal critic of AR at my company because of these sorts of headaches (though I was much happier post-arel). But these were all very surmountable problems, the libraries had the capabilities necessary to fix the problems, and as our team matured in our intuition for our data model and size, we ran into these things less and less. In other words, I see this as very much a #1 type of problem rather than a #2 type of problem.

A good analogy seems to be n^2 algorithms. In much the same way, you can write one of those and it works fine for awhile, but then one day you may find that your data has grown enough that it no longer works, and it may be very mysterious what happened. But with some experience and work, you can figure it out and probably fix it. And as you gain experience, you can more wisely decide whether to avoid the n^2 algorithm from the start, or whether it will work fine for your data size.

cryptonector · on March 19, 2020

Allow me to chime in and agree with you that in real life there exist real applications using ORMs that have or cause these problems even where the ORMs in question provide modern query facilities that don't suck. Whether this happens because the ORMs were not that advanced when the apps were developed, or the devs were not that knowledgeable, doesn't matter -- the apps exist, and I know because I've had to deal with some such.

I agree very much with the idea that the problem is fundamentally caused by wanting to present objects to apps, and that dealing with queries instead is better and the real solution. Of course, the queries can't be SQL -- we can't have that! So we end up with N+1 query languages: SQL and N more non-SQLs. But N+1 query languages is a lot better than no query languages as in earlier ORMs.

robocat · on March 19, 2020

SQL has hideous security issues with SQL injection using strings, and ActiveRecord avoids that, so SQL is bad and ActiveRecord is good. My statement is just as true and just as disingenuous as Cal’s article.

> in real life there exist real applications using ORMs that have or cause these problems even where the ORMs in question provide modern query facilities that don't suck

And surely you have also seen plenty of equivalent examples of bad code written using SQL?

Cal is attacking an imaginary ActiveRecord pattern, while ignoring the features of the most famous implementation (RoR), and ignoring the benefits of ActiveRecord. Dumb.

cryptonector · on March 19, 2020

> SQL has hideous security issues with SQL injection using strings, and ActiveRecord avoids that, so SQL is bad and ActiveRecord is good. My statement is just as true and just as disingenuous as Cal’s article.

Most SQL RDBMSes have proper parametrized queries, and there's no excuse not to use them. This is no excuse to not use SQL. It's like saying that no one should use ORMs because you can use them poorly.

> And surely you have also seen plenty of equivalent examples of bad code written using SQL?

I've seen bad Java, C, C++, Python, Perl, bash, etc... The SQL I work with is no worse than the Java, C, ... that I work with.

> Cal is attacking an imaginary ActiveRecord pattern, while ignoring the features of the most famous implementation (RoR), and ignoring the benefits of ActiveRecord. Dumb.

He was talking about a particular pattern, not Ruby on Rails.

shabda · on March 19, 2020

Django ORM is ActiveRecord pattern, right?

This is what I would use

   Book.objects.filter(author__nationality="Italian", author__state="deceased").values("isbn")

This is

- Is one single DB query.

- Completely obvious way to describe what I want to do.

- Only pulls the fields which are needed from the DB.

If you are claiming "the troublesome acriverecord pattern" you need to provide specifics of what can't be done, rather than claim "Don't look too deeply into the code samples for specifics though"

smoe · on March 19, 2020

I think as soon you use "values" it is no longer the ActiveRecord pattern since the result is not an object instance representing a single row in the table.

I never read a proper specification of ActiveRecord, but it could be that a "pure" implementation doesn't allow for joins? Hence the weird looking example with the nested for loop, which variations of I have seen plenty in real world code. Mostly by people that don't sufficiently understand the ORM or SQL, not because they were sticking to a pattern though.

In your example it is obvious what happens in this simple example, but as soon the joins become more complex, it becomes in my opinion quite unwieldy and non obvious what the resulting SQL will be. I had multiple performance problems in the past because of the queries the ORM generated or with memory/cpu usage on the app side.

LargeWu · on March 19, 2020

That's fine. For 95% of the queries you need, ActiveRecord-ish implementations are perfectly performant, and super readable. For the rest, they allow you to drop down to the SQL layer and specify exactly what you want or need.

smoe · on March 19, 2020

Yes, that is what we do and what I like about SQLAlchemy for Python. We use the ORM for like 90% of cases and drop down to its SQL abstraction underneath the ORM.

The last time I used Django a couple of years ago that was one of the bigger pains for me: It was either the Django ORM capabilities or raw SQL, not much in between. Or maybe I was just unaware of how to do it better.

kd5bjo · on March 19, 2020

> Don't look too deeply into the code samples for specifics though, they are a pastiche

Your article here seems to avoid falling into this particular trap, but I consider this a problematic view for authors to hold. In software writing, code listings have equivalent weight as figures and tables, and are often what readers will look at first to get an overview of the article. If they don't adequately demonstrate the argument of the text or don't appear to be plausible, the entire article will fall flat.

ptr · on March 19, 2020

An article dedicated to why Rail's Active Record library isn't actually using the Active Record pattern would be interesting!

zrail · on March 19, 2020

Rails' ActiveRecord is both a query builder and also the canonical Active Record pattern. For example, one can say

  book = Books.joins(:authors).where(author_name: 'James S. A. Corey').first
  book.title = "Tiamat's Wrath"
  book.save!

It's not that they stopped supporting the Active Record pattern, it's that they rebuilt the whole thing a few years ago with a real relational query builder under the hood.

mnutt · on March 19, 2020

Not to detract from your point, but rails 3 was released with arel in 2010, so it has used a relational query builder for a decade at this point.

zrail · on March 19, 2020

Thanks, I wasn't super clear on the timeline there.

inopinatus · on March 19, 2020

It’s more like the repository pattern under the bonnet.

calpaterson · on March 19, 2020

Yes I am curious about that. Someone else said that the Rails authors used to like and defend this pattern and that presumably they changed their mind. Maybe because they ran into problems? I'm not at all close to the Ruby community but would love to know.

https://lobste.rs/s/valayp/troublesome_active_record_pattern...

dhagz · on March 19, 2020

That's my general impression. As Rails grew in popularity and was used for more and more complex things (with more and more complex database tables), they started to add in the raw query building parts of ActiveRecord. Though I think it was more that they wanted raw query functionality in the library and it made most sense to enhance the tool that already dealt with the database rather than add another alongside it.

Glyptodon · on March 19, 2020

I think most ORMs that in some sense follow permutations of the active record pattern in wide use are well capable of doing joins and only selecting needed/specified columns.

I agree there are places the pattern falls short at times, but are there actually any implementations of the pattern in widespread use that would make you use separate queries in loops rather than do a basic join?

tomlock · on March 19, 2020

But in Django you can do the same thing that doesn't return all rows with `select_related` so it seems kind of like you're not using the features of whatever ORM that is.

mtberatwork · on March 19, 2020

I was confused a bit as well by the examples since Django's ORM has functionality to do exactly what the author is trying to do (e.g. select_related, prefetch_related, <model>_set, values, managers, F expressions, transactions, etc).

barrkel · on March 19, 2020

The way I see it, Active Record is an explicit acknowledgement that the database and its relations are important. AR modelling maps almost directly onto the database relations, rather than what might be more natural in OO.

It's actually harder to go straight to the database if you're using a more powerful ORM, because with a more powerful ORM, you're hiding implementation details. You tend to obfuscate the difference between a calculated and a stored column, a denormalized vs normalized attribute, composite vs entity, etc. That makes it harder to write SQL to get what you want, not easier, and when you do write the SQL, you're slicing through a thicker business layer and are at more risk of violating invariants.

If you've got a monolingual, homogeneous tech stack, the whole database might be abstracted away and you might use an OO interface which uses distributed caches and locks etc. You're much more likely then to write your queries as loops and manual filters and joins.

erik_seaberg · on March 19, 2020

I have yet to see a company more than a couple years old that remained on a monolingual, homogeneous tech stack. If your data is at all accessible to eng or PM or QA, there are extra consumers that will surprise you if you break them.

buboard · on March 19, 2020

So if you re going to have to "learn" active record, why not learn SQL instead?

hrktb · on March 19, 2020

From the article, about why mitigations of these issues is not viable:

> Likely they will require increased understanding of the specific ORM being used

So your diagnostic of "deliberately avoid reading the documentation" is spot on.

If the goal of the author is to not read the docs, I completely agree that most ORMs are an absolute hell to use and going back to almost pure SQL would be the holy grail.

But then, I am personnally in the camp of spending the time to understand the tools. There is a tradeoff between the time spent to on learning a specific tool and the ROI of that learning curve, I think this article's position on wanting mostly returns with no time invested is way too unrealistic.

PaulHoule · on March 19, 2020

He also hates forking in Git.

cfv · on March 19, 2020

I want to like this article because I agree with the overarching general statement that the ActiveRecord pattern is troublesome.

But it's... not very good. He constructed his own interface in order to complain about it, an interface which is so subptimal that it lets him complain about how bad it is as if the pattern was somehow to blame.

In most Active Record implementations I've seen, a Record is a live tuple ("Active" "Record") and you typically have some form of Repository ("Books" maybe in this example?) alongside of it which is responsible for carefully and optimally building the record objects.

The pattern itself says nothing about how to query things, it talks about a kind of DAO that has convenience methods to access and manipulate a tuple.

Also, what's with the REST rant? I don't think I understand how that's linked.

topspin · on March 19, 2020

> He constructed his own interface in order to complain about it

Yes, it is contrived, but it faithfully represents many examples I've seen in the real world.

> Also, what's with the REST rant? I don't think I understand how that's linked.

Why not? It's clearly explained. The typical REST API (as opposed to the well designed REST API) is just Active Record over HTTP: /something/:id called in N+1 loops. And this emerges for the same reason as it does in the Active Record pattern; the 'central abstraction' offered by REST is mapping every resource to a URL, as opposed to mapping every row to an object.

cryptonector · on March 19, 2020

A REST API on top of an ORM is going to have the same problems the ORM has. But you can have a RESTful interface to a relational database. For example, PostgREST is awesome.

jmull · on March 19, 2020

I think this is pretty confused.

The active record pattern from the article isn’t the active record pattern from Fowler and not really how it’s generally practiced. Fowler’s active record is about having data access methods on the data objects. The article says whole object access and access by id, so something different. Probably would have been a lot better not to reuse the term.

Whole-object access is very typically part of ORMs. The article raises an issue this way: “Retrieving whole rows is hugely wasteful when only part of the row is required to resolve a user request.”

But note that you can control what’s included in the object. In other words, if you are running into problems due to getting too many columns because your object has too many properties... then change your object. An ORM giving you what you asked for isn’t a problem with the ORM. I think when people run into this kind of thing, I think they are usually stuck in a situation where they can’t express queries in a reasonable way. (Could be due to complexities or limitations of a specific ORM, or could be silos between front end and back end development.)

Getting objects by id is very typically part of ORMs as well. But ORMs very typically include a mechanism for queries as well, so I don’t even know why that came up. Maybe again the problem the author is trying to get at is lack of reasonable flexibility to express queries.

Now, a general issue with ORMs is that they tend to bury queries under an extra layer of abstraction. I think there’s an article to be written about that.

The article does go on to say a data access mechanism should have first class queries and first class transactions. Which are completely true, but dont really have anything to do with active record. (Those are just a start, though. Some think an integrated query language is “first class“, which is backwards.)

The part about restful apis is probably not worth breaking down. it conflates restful, the idea, rest apis as commonly implemented, http, json, and so forth.

bradly · on March 19, 2020

Also, if you need or may need multiple attributes of an object, fetching the entire object by id may save time as the id-based query can be cached and save you from making multiple trips to the db.

davedx · on March 19, 2020

I work a lot with EntityFramework (the "unit of work" pattern) and it does solve the issues the author brings up fairly well: queries are first class (LINQ with static typing), and transactions are on by default between each .Save operation.

However it's not a magic bullet by any means: EF will still sometimes generate weird join queries that require serious head scratching and second guessing to get into a better form, and huge nested .Include / .ThenInclude trees are extremely common when your data is highly normalized. This can lead to the same kind of inefficiencies seen with the ActiveRecord pattern.

FWIW what really slows things down isn't typically selecting every column, it's those really giant joins or joins on things that aren't indexed. I think there's some room for improvement in the DX of EF itself here -- it could at least tell you if you write a join on a non-indexed column. (It has all the information there, as the indices are defined in EntityMap source files).

Overall, when building applications on a RDBMS <-> OOP <-> REST API, there are indeed different fundamental abstractions, and no framework or library will save you from needing to understand how those abstractions map between each other.

And believe me, GraphQL is also no panacea! It has its own similar but different issues.

pmontra · on March 19, 2020

The problem with this code

  book = Book.get(isbn="978-0099466031")
  book.copies_in_stock -= 1
  book.save()  # WRONG! irreparable data loss here

is that in SQL you should write it like this:

  update books set 
    copies_in_stock = copies_in_stock - 1
  where isbn = '978-0099466031';

No needs of transactions.

I don't remember how to do it with Django's ORM but in Rail's ActiveRecord I'd use directly the connection object

  update_query = sanitize_sql(['update books set copies_in_stock = copies_in_stock - 1 where isbn = ?', isbn])
  ActiveRecord::Base.connection.execute(update_query)

This saves a lot of programming pain, possibly skip some unnecessary bugs and get better performances.

sanitize_sql is from https://api.rubyonrails.org/classes/ActiveRecord/Sanitizatio...

calpaterson · on March 19, 2020

(Author here,) I know it seems like a synthetic example but this kind of code is quite common, particularly from devs who don't know databases well. I suppose editing a json field or something would seem more plausible and I should have used that example (though there are also ways to do that in Postgres I think).

falcor84 · on March 19, 2020

>... particularly from devs who don't know databases well

I think that a lot of your argument just boils down to this. The Active Record pattern allows less experienced/knowledgeable developers to write CRUD code that works. These are the same devs which would often create unnecessary quadratic (or even worse) solutions in regular imperative code too. Their code would often be far from pretty or optimized, but using modern high level languages and libraries, they can relatively easily get the job done and provide business value.

If given a dichotomous choice between libraries that are make it easier to write and to reason about out of the box, or libraries that perform much better but only "when you know what you're doing", most would choose the former. And I think they'd be right, there's enough complexity out there in the actual domain already.

But it's actually even better, it's not actually a dichotomous choice, and it's generally quite straightforward to bring in a more experienced dev to help refactor and optimize such code later, when the need arises.

pmontra · on March 19, 2020

I know. Examples can't always be close to the real world and there are really people writing that kind of code. I also think you can update a json field in a single statement in postgresql but I didn't check how. The syntax did working on jsons can be quite unwieldy.

oxalorg · on March 19, 2020

True. It's even easier in Django using F Expressions

    book = Book.objects.get(isbn="978-0099466031")
    book.copies_in_stock = F('copies_in_stock ') -1
    book.save()

F expressions perform operations at the DB level directly using column name without ever pulling that data into Python code.

JoshuaDavid · on March 19, 2020

The transaction-free approach still has potential issues though. Consider:

    select isbn, copies_in_stock
    from books
    where isbn = "12345678"
      and copies_in_stock >= 1;
    isbn:            12345678
    copies_in_stock: 1

    update books
    set copies_in_stock = copies_in_stock - 1
    where isbn = "12345678"

If you have two of the above requests happen at the same time such that the execution order is SELECT SELECT UPDATE UPDATE, you'll end up with -1 copies_in_stock. You can try adding an "and copies_in_stock >= 1" to the UPDATE and then look at the number of affected rows, and perform recovery logic if the number of affected rows is 0, but at that point you've basically implemented a transaction, probably badly if it isn't what you set out to do.

Sacho · on March 19, 2020

I don't see how a transaction saves you here(without locking the row), can you elaborate? In EF projects, this problem is solved through optimistic concurrency, similar to the fashion you described - a column that changes on each update is used as a canary to confirm that you're applying changes to the record you expect.

https://docs.microsoft.com/en-us/aspnet/mvc/overview/getting...

This also has the added benefit of allowing you to issue only 1 query(just the update), instead of select + update.

pmontra · on March 19, 2020

I understand that this is an example so I try to build some context around those queries.

The issue here is that you (apparently) want to guarantee that whoever sees the result of the select (probably a customer) can safely buy the last available book with the update.

This can't be solved in a simple way because minutes can pass between that select and that update. This means that you can't put those queries in a transaction. If you could you should use select for update

https://www.postgresql.org/docs/current/sql-select.html

I used code like update table set counter = counter + 1 / - 1 many times without transactions when all I have to do is to update a counter and nothing else in the system really cares about the value of the counter. If this is not the case, the queries accessing the counter should use some form of locking. A transaction is not enough.

This is the first good guide I found googling for examples https://vladmihalcea.com/a-beginners-guide-to-database-locki...

ivanhoe · on March 19, 2020

Don't mix transactions and locks. Transaction just ensures that all queries within it were run successfully and atomically. Locks are used to prevent race conditions. And in your particular example neither a transaction, nor lock would help, you need to check explicitly in UPDATE if you've got any books left.

balfirevic · on March 20, 2020

Race conditions absolutely are among the set of things transactions are supposed to help with and locks are one of the several ways they do it. It's the I from ACID, which stands for Isolation.

Transactions are not omnipotent in that regard and they fail in several well-researched ways, but it is in their purview.

strken · on March 19, 2020

Does ActiveRecord (the Ruby library) support e.g.

    Book.where({isbn: "978-0099466031"}).update_all("copies_in_stock = copies_in_stock - 1")

without breaking? It's been a long time since I've used it and I can't remember whether you have to use .execute or not.

petepete · on March 19, 2020

Just tried it and it works fine. Feels a bit un-Railslike but it does exactly what you'd expect it to

    > Unit.where(ccp_id: 1).update_all("position = position + 1")
    Unit Update All (9.4ms)  UPDATE "units" SET position = position + 1 WHERE "units"."ccp_id" = $1
    => 5

pmontra · on March 19, 2020

Probably yes. Check the comments at https://apidock.com/rails/v6.0.0/ActiveRecord/Relation/updat...

thom · on March 19, 2020

While I'm not claiming the first snippet is the correct way to structure this sort of thing, all of these libraries have supported optimistic locking in some form for many years.

ZiiS · on March 19, 2020

In SQLalchemy you can

  book = Book.get(isbn="978-0099466031")
  book.copies_in_stock = Book.copies_in_stock + 1
  book.save()

GrumpyNl · on March 19, 2020

Its also way faster and less complicated. One query line, one action on the db.

raziel2p · on March 19, 2020

The only difference between active record and non-active record is that with the former, there's some globally accessible database client session object that the record objects know how to get to.

There's no difference between `Author.get(id)` and `session.query(Author).get(id)` in terms of code design, data integrity or whatever. Maybe you could make the argument that by forcing you to interact with the `session` object you are encouraged to call `session.save()` which makes the transaction explicit, but I find that a thin argument.

People are going to make O(n^2) mistakes like the one used as an example here regardless of whether they use an ORM. In fact, people make those sort of mistakes without even touching the database.

Yes, the ability to expose query objects is super important, but there are active record libraries that do this, for example Laravel's Eloquent or Django's ORM.

Seems like this article is conflating a lot of issues that aren't really related. I would re-write the whole thing as:

1. The importance of native query objects in your ORM 2. The importance of transactions 3. Traps of O(n^2) when using ORMs

Don't even bring up active record.

lmm · on March 19, 2020

> There's no difference between `Author.get(id)` and `session.query(Author).get(id)` in terms of code design, data integrity or whatever. Maybe you could make the argument that by forcing you to interact with the `session` object you are encouraged to call `session.save()` which makes the transaction explicit, but I find that a thin argument.

What's thin about it? The most important aspect of code is being able to read it and understand what's going wrong; a seemingly innocuous function call that actually interacts in a complex way with some hidden global state is pretty much the worst kind of code you can write.

gt565k · on March 19, 2020

You can't misuse an ORM pattern and then call it bad for performance reasons.

Same thing is true for SQL.

I'm currently working on migrating a 10 year old asp dot net c# system of applications. The amount of nasty stored procedures being called from other stored procedures that call on multiple views and total disregard for data modeling best practices is atrocious. I've had to just throw away shit code and get the business requirements in order to completely redo the SQL queries because they were impossible to read and fix.

We had a single query with over 2000 execution plans due to parameter explosions! It confuses the shit out of the database query planner and caused intermittent performance issues based on who ran the query with what parameters and when they did it.

His code example with m*n run-time is a total misunderstanding of how to use an ORM and is common among people who are used to thinking of databases as dummy stores. In reality, a database run-time can provide great performance when used correctly. Data partitioning, query optimization, materialized views, are all great skills to have in your toolbox. Most developers don't know jack about databases, data-modeling, and performance optimizations.

Active Record is fantastic for CRUD operations, it removes boilerplate code, and still provides good ways to join relational objects. Always select only the fields you need in ORM queries to reduce overhead and the data retrieved. Tune your database (rebuild indices, and partition your data properly). There's a general disregard for understanding technology and hacking of business logic in the service layer which results in poor performing applications. Perhaps that's what makes people hate on Rails and Active Record. It makes it easy to write bad, non-performant code. What works today on your small dataset, will not work 6 months or 5 years from now.

A tool is only as good as the person using it.

Don't be a fanboy hating on specific technologies, when you don't know when to use what for a particular use-case.

ben7799 · on March 19, 2020

I have seen nothing but problems with the whole Active Record/ORM pattern in the last 15 years.. mostly java enterprise software. The problems existed before but IME they've gotten a lot worse. I'm pretty much in total agreement with the author, although he doesn't touch on all the points.

The "pattern" (pun intended) always goes like this:

- Gotta get 1.0 out really fast

- Agile Dev really coming on strong last 15 years

- Architect/early dev lead is not a DB guy

- ORM + Active Record looks attractive and fast due to less code

- Junior devs don't even have to learn SQL

- Simple Active Record architectures let you layer on new stuff to get the all important 2-week sprint demo. Since Agile overvalues the sprint demo & showing stuff in the UI is what matters to PMs performance doesn't matter and gets ignored here. Infrastructure work for future scaling doesn't count as points and/or is considered poor use of time.

- Tons of entities & repositories based around Active record start getting built, it saves tons and tons of code over hand written/optimized SQL code

- Project launches and starts getting customers, getting bigger, needs to scale

- Active Record & Naive ORM generated queries start breaking down due to locking too many rows in DB

- Database deadlocks eventually start happening

- DB Connection pools eventually start running out of connections

- Customers are furious

- New Upper management comes in and wants to rein in PM to try and get developers to work on the massive tech debt. Agile starts to not even work since tech debt makes it look like no work is happening since it doesn't demo.

- Developers start having to piecemeal replace some ORM generated queries with native queries/query builder type patterns

- Eventually the team starts to have to rip the ORM out completely to continue scaling.

The alternate pattern that happens is that the team gets hot under the collar to drop the SQL database for a column-store/NoSql database. The problem wasn't the SQL database here but bad usage of patterns that the NoSQL DB probably doesn't support if it is not an ACID DB.

My feeling is most of these ORM based/Active Record type systems are often 3-4 orders of magnitude slower. Fixing some of this is somewhat one of my specialties.. I have rewritten many stacks that were taking minutes to finish and had them working in < 1 second once the ORM was taken out of the picture.

dpc_pw · on March 19, 2020

This comment is spot on. I've seen similar scenarios myself. :)

99% of articles like the above one come down to: "developers wanted to pretend there isn't some underlying data model underneath, and abstracted it away".

The problem isn't that much in ORMs / query builds etc. It's with "Objects" itself, and the idea that one can abstract away the data model and pretend the business code can be written as a bunch of OOP classes and such. It seems like a genius move at the beginning but hits a really steep cliff once the load and complexity of problems incrementally increases. When it starts to happen it's actually funny how not many people can actually realize what happens, and they usually struggle with band-aid solutions and blaming irrelevant pieces.

I decided to post on twitter about it: https://twitter.com/dpc_pw/status/1240719977071575040 in case anyone is interested.

floatboth · on March 19, 2020

Don't pretty much all "active record" ORMs now have query builders and encourage their use? The pure pattern seems pretty dead to me..

tirumaraiselvan · on March 19, 2020

> One new alternative is GraphQL which is a query language and server that is able to combine REST endpoints into a single result - making the query a first class object.

I think it would be more correct to just say that GraphQL encourages queries as first class object as it exposes the entire schema and allows for granular querying.

Implementing it by composing REST APIs is, well, an implementation detail. There are other ways to do it too.

sidewndr46 · on March 19, 2020

I wrote about this years ago on the topic of Golang specifically:

http://www.hydrogen18.com/blog/golang-orms-and-why-im-still-...

The phrase "object-relational impedance mismatch" resonates pretty strongly with me still. That being said, I still use lots of ORMs in my day job.

johncs · on March 19, 2020

> This pattern of access is simple and intuitive but causes huge problems in even small projects. Retrieving whole rows is hugely wasteful when only part of the row is required to resolve a user request.

Retrieving a whole row rather than just a single column from a database that's likely on a local network definitely doesn't seem "hugely wasteful", especially for smaller projects.

calpaterson · on March 19, 2020

Author here - I'm afraid it is hugely wasteful. The network is really nothing to do with it but instead the fact that you have to make a table read instead of an index read. This seems like a small thing but as I describe it changes the complexity class of the algorithm you have effectively written.

balfirevic · on March 20, 2020

How does it change the complexity class of an algorithm?

Database performing a table read because additional columns were requested shouldn't change the complexity class if the query was otherwise successfully satisfied using the index.

I searched the article for "complexity class" and, while I found the phrase, it didn't explain much.

calpaterson · on March 20, 2020

Sorry, yes, you're right. It doesn't change the complexity class just because you have to go to the heap. It is however a lot slower for the usual reasons: the heap is less dense in memory, it's a second thing you have to do, you have to serialise/deserialise more, etc etc.

But yes, you're right: just going to the heap is not a change in complexity class.

TeMPOraL · on March 19, 2020

It is when you're retrieving thousands of rows/objects.

jchw · on March 19, 2020

As a knee-jerk response I was going to point out that all of these problems were already solved well by complementary patterns available in libraries like SQLAlchemy, but after reading it is clear that what the author is referring to as “Active Record” is different than what you are probably thinking. That said, I don’t know any ORMs that don’t have query building APIs right now, so I’m not sure what this is aimed at; its clearly possible to combine object based ORM interfaces very nicely with query builders. The relationship is beneficial imo. I think this will become more apparent in languages like Rust where you could potentially use metaprogramming to provide a bunch of type safe sugar revolving around your model classes in the query builder (I haven’t tried SQL in Rust yet - this may already exist for all I know.) It is already useful in Python where it can aid in migrations, scaffolding for tests and new environments, and of course, the pretty common case where you are really dealing with entire objects, like CRUD.

steveklabnik · on March 19, 2020

It does!

http://diesel.rs/

(Created by one of the former maintainers of ActiveRecord in Rails. It's much more of a query builder than an AR implementation.)

FeepingCreature · on March 19, 2020

If you can separate call sites somehow, can't you just lazily fetch attributes as required and also remember which attributes were needed for next time? Ie. teach the ActiveRecord library to "learn" its usecase automatically, while appearing trivial to the user.

A similar pattern could be done for on-demand joins, though you might need macro magic (or a dynamic language that lets you, say, subtype 'string') to derive joins from attribute values. For instance, if you have a sufficiently advanced typesystem, you may return instead of a string something like a `SqlSourcedType(string, "tablename", "attributename", originalCallSiteId)` that acts interchangeable with a string. Then later on when passing such a type to a load on another table, the framework would know that you were using a join idiom, and could load the joined tables directly in the first query the next time around. Basically JIT SQL.

tlarkworthy · on March 19, 2020

1. ActiveRecord is extremely troublesome but not because of bandwidth. Sure, data movement is a performance headwind, but not the end of the world (in a row store all the data is adjacent anyway). Active record problematic because of its non-immutable nature. You cannot copy and mutate rows without weird side effects (in Spring for sure), which leavers you with an inherently imperative interface to your DB.

2. REST does not insist you fetch whole rows. You can have field masks with REST, it's inherently untyped, there is literally nothing preventing you adding that performance optimization if you need it, in a backwards compatible way (add the field mask query param later when you need it). https://developers.google.com/slides/how-tos/field-masks

kingraoul3 · on March 19, 2020

This is described as The Object-relational Impedance Mismatch:

https://en.wikipedia.org/wiki/Object-relational_impedance_mi...

Thanks you Cal for the insightful and concise article!

mjevans · on March 19, 2020

I think I'd like to just write a native SQL query (past basic tables and joins there's always some platform specific quirk or optimization) and have a better way of mapping that to generic 'interfaces' (maps/dictionaries/whatever) or typed objects with members of specific types, and have the data correctly coerced to those types.

The biggest pain point nearly always also seems to involve time conversions that might or might not use default assumptions and environment variables which language examples almost never cover. Toss in specifying an encoding to handle the result as (or to disable) and I think that's what might make up a good DB interface specification for drivers to implement.

cryptonector · on March 19, 2020

Yes, ORM patterns generally lead to great inefficiencies, such as client-side implementation of JOINs and such.

That doesn't mean that you can't have a RESTful API for a relational database that doesn't have these problems. For example, PostgREST does exactly that.

cryptica · on March 19, 2020

I still don't understand what problem ORMs solve and I've used many different ORM solutions for over a decade. SQL really is the best and simplest interface to get data out of a relational database. Some people will argue that SQL is too complicated for simple queries like CREATE, READ, UPDATE, DELETE but I would disagree because these simple operations require simple queries. Reading an element by ID is the easiest operation of all: `SELECT * from tablename WHERE id=idnumber`. It's not difficult at all. Abstracting this away inside complex objects doesn't solve any real problems, it just creates more problems.

Sometimes developers will argue that they end up reusing variants of the same query in too many places... That is a poor argument; if that's the case, it means that your code is poorly designed. If you organize your code in such a way that components have clear and distinct responsibilities, you're not going to end up with duplicated queries.

Also, the perceived 'impedence mismatch' between OOP languages and relational databases is only apparent when you try to design a data layer in OOP code which exposes a general purpose interface to execute specific queries. Impedence mismatch can be solved simply by dealing with each database operation on a case-by-case basis using raw SQL or Language Integrated Queries (E.g. LINQ). It adds a little bit of boilerplate, but it's absolutely worth it. You don't need general purpose entities just to access data.

ptr · on March 19, 2020

It's a very interesting topic but it gets weird when large words ("completely unworkable", "causes huge problems in even small projects", "retrieving whole rows is hugely wasteful", "unfortunately Active Record access patterns break down almost immediately") are used to describe something that doesn't match one's experience at all. I wish we could focus on the Active Record concept itself rather than implementation details or specifics of projects where the pattern has been applied.

The article talks about projection, joins and aggregations, but many projects in Rails that I've seen just don't need those: it's often solved in the application. Yes, it's SUPER SLOW, it loads data just to throw it away again and can break ACID properties, but it generally works -- these applications aren't handling transaction on a stock exchange or controlling a pacemaker. And in the case of Rails' ORM, it's been very easy to do real joins, projections and aggregations when needed, meaning that the Active Record pattern itself hasn't excluded that possibility.

The paragraph about transactions is also a bit weird as it has nothing to do about the Active Record pattern afaics.

calpaterson · on March 19, 2020

Apologies for the word largesse - I should probably have edited out the adjectives (which always cause me a problem) more aggressively.

I actually find that even though these applications aren't running pacemakers that they still don't work. It's really easy to write what is in fact bad algorithm for accessing (eg) shoe size data when you are working on rows only as objects.

I cannot speak (and am not) for the Rails ORM but the problem here is less about some library people use but about the primary affordances being "get this object by id, edit it, and save it back". This is just not right IMO and is the underlying cause of the famous "object-relational mismatch". I suppose I come down on the relational side of that in that I think object access is the problem.

hinkley · on March 19, 2020

I really wish there were one right answer, but it doesn’t seem like there is one.

The last time I tried to write code with just raw SQL, the query patterns got tedious really quickly, and I found myself using the most bizarre patterns to normalize query fragments and avoid typos in rarely used queries. It all made sense to me as I did it but others thought I was nuts. They were probably not wrong.

But every time I used an ORM or worked with people who did, things got sloppy, quickly.

colonwqbang · on March 19, 2020

It's a pretty radical alternative but I've been playing a lot with the postgresql-typed library. It actually connects to the database during compile time, validates the queries and calculates appropriate type constraints for your application code.

Simple typos are caught of course, but also more subtle problems. If for instance you drop a NOT NULL constraint on a column in your database, your application will not compile until you add in the appropriate NULL handling.

Like I said it is a bit radical, but to me it clearly feels like the future and I miss it whenver I work with a traditional DB mapping.

gmac · on March 19, 2020

I agree that too much abstraction from your SQL database is a world of frustration. I feel that ‘query builders’ themselves can end up feeling like too much abstraction, albeit of a slightly different kind. I’ve written about an alternative approach for TypeScript [1] and am in the process of turning it into an npm module (complete with nested queries that produce LATERAL joins, with all results fully typed).

[1] https://github.com/jawj/mostly-ormless/blob/lateral/README.m... (note: I’ve updated the code and write-up here, but not yet the instructions for setting it up, so YMMV on that)

jasonkester · on March 19, 2020

What a strange article. It starts off nice enough, describing a handy pattern that lots of us use to handle individual records for CRUD.

But then it constructs this giant crazy strawman of ways that you could do a bunch as ridiculous silly things, and how if you did those crazy things it would completely wreck your performance.

Certainly it would. But nobody would ever do any of those crazy things.

My version of this thing, for instance, makes a point of wrapping all your Stored Procedures for you, and provides methods to construct single items or lists of them from a dataset that you hand it. That solves roughly 100% of the panicky disaster scenario that the author predicts. (The other 0% is the giant (also usually 0%) performance hit you get from pulling all six columns of data from a table when you only need two of them.)

So yeah, don't recursively enumerate the entire database three times when you can write a query with a Join in it. But don't make up silly query languages that aren't SQL either.

Just do what sensible people do with this data pattern and you'll find that it works nicely.

tln · on March 19, 2020

I wish Django's ORM had a query builder that was closer to sql. Aggregations are a PITA and HAVING clauses require small mammal sacrifices.

Not sure the blame belongs to the ORM but Django rest framework has been the source of N+1 queries as well

febeling · on March 19, 2020

I observe how this post creates the wish in me to say hurtful things to the author.

Active Record is a powerful pattern - for it's usecase. You use it to build applications to find, create, update and delete objects. It's not useful for complex queries, aggregations, graph applications, NLP, mathematical analysis, or any other use case it's not meant to solve. No surprise there.

The ruby package of the same name includes quite a few things that support complex queries and aggregation, and that way it goes beyond the pattern which it is named for (including all of the examples that are presented as the better alternative solution). For requirements that are best solved by dropping to SQL you can bypass the records machinery entirely, in marked difference to some other ORMs which try to completely hide and monopolise DB access.

ZiiS · on March 19, 2020

The problem with equating SQL and GraphQL is you are not expected to write the RDBMS. One day it may be as easy as creating a DB schema; for now I will focus on developing RPC APIs.

praveenweb · on March 19, 2020

Have you checked out https://hasura.io? That may be exactly what you are looking for. You create DB schema (Postgres in Hasura's case) and point Hasura to Postgres to get an instant realtime GraphQL CRUD.

Disclaimer: I work at Hasura.

mmedal · on March 19, 2020

Perhaps “dead Italians” isn’t the greatest example data to be using at this moment.

pmontra · on March 19, 2020

I'm Italian, living in Italy. I didn't feel anything bad going on with the example. Actually I was happy to see an example about Italy. It's a rarity.

skrebbel · on March 19, 2020

[flagged]

tehlike · on March 19, 2020

It refers to covid-19

skrebbel · on March 19, 2020

I know. The blog post examples are about Italian writers, not about corona victims. It's a homage if anything. Stop being insulted on behalf of other people, there's no harm being done here.

jakelazaroff · on March 19, 2020

> Stop being insulted on behalf of other people, there's no harm being done here.

Take you own advice and let the author respond, then, rather than being insulted on his behalf.

skrebbel · on March 19, 2020

I'm not insulted on anyone's behalf. You're trying to make perfectly harmless writing taboo. My own perfectly harmless writing will be next.

The article example is about Umberto Eco. He's dead. Corona didn't kill him. Get over it.

jakelazaroff · on March 19, 2020

OP is just politely asking the author to be considerate of the current situation in Italy. There’s no need to be overly sensitive.

skrebbel · on March 19, 2020

Are you trolling me?

jakelazaroff · on March 19, 2020

I’m saying that OP’s comment was reasonable and polite, yet you seem to be taking it personally for some reason.

The issue they brought up has to do with an entirely superficial part of the article. This is a vanishingly small hill to die on.

jbjohns · on March 19, 2020

Then why are you dying on it?

jakelazaroff · on March 19, 2020

I'm not; I'm dying on the hill of "someone made a respectful comment to the author so let's not act like they've done something horribly wrong".

fraktl · on March 19, 2020

I stopped reading at the title "Problems arise".

Example contains a piece of code that uses nested loops and each loop queries the db - producing many queries.

There's no problem with ActiveRecord. There's a big problem with programmers not learning relational databases.

Here's an excerpt from the post: "To get all ISBNs of all deceased Italian authors in the Active Record pattern"

Now look at this black magic: you create an SQL view with all the joins you need in order to cross-reference data and extract what you want, then you have your ORM use your view as if it were a table. Job done. Your code is now 1 liner and the database won't receive too many queries.

I know that the text references SQL views but this is what's said about them: "SQL has views too of course, but most ORMs don't have any support for them"

A view is, for all intents and purposes, a table to the ORM. There's no special support needed for it, if you use the view to read from it.

I'm sorry for being one of those guys who shit on someone's hard work, but the mister Cal Peterson missed the whole football field here.