With all due respect, Cal Paterson doesn't seem to know what he's talking about....

calpaterson · on March 19, 2020

Hi, Cal Paterson himself here. It's not a Ruby example, it's a Python example. Don't look too deeply into the code samples for specifics though, they are a pastiche.

You are not the only person to get the wrong end of the stick re:"Active Record". I'm talking about the Active Record the design pattern, not about the Ruby library that took the pattern as it's name but now is apparently a querybuilder library. I probably need to go back and make it clearer somehow.

EDIT: I've done this now. Back to stockpiling tinned food and bullets!

sanderjd · on March 19, 2020

The reason there is confusion is that you're debating a straw man. The reason none of the popular "Active Record" libraries (like Ruby's ActiveRecord) actually have the problems you're identifying is that none of them implement a pure version of the design pattern, because doing so leads to the problems you're identifying. So you're both right that the pattern doesn't really work, and wrong that it is a major problem in practice.

dkarl · on March 19, 2020

In his defense, I'll say that at many of the places I've worked, there have been vocal programmers who regarded writing SQL as a dumb graybeard party trick with the same relevance to modern software engineering as hand-crafting a binary. These are the people who, when their ORM generates an inefficient query, trumpet it as proof that the database was designed wrong, probably by people who wish their SQL skills weren't so hopelessly obsolete in the modern age.

I see this much less now than I did 10-20 years ago, so perhaps it has gone out of style, or perhaps it is a disease of large companies (which I haven't worked at in a decade.)

rapind · on March 19, 2020

I think if there were better SQL tooling you'd see more people embrace it. It boggles me that given the age of SQL it hasn't gotten better tooling.

For example I'd love to move more application logic into postgresql functions, but it's such a pita to run and test I end up procrastinating.

I've been playing with postgREST for a while now and have yet to implement it outside of toy projects. Not because of postgREST itself (which is really cool) but because writing business logic in postgresql is so painful.

Annoying because most CRUD apps could live almost completely in the database exposing views and functions via REST to the end user with w/e flavor of the month UI you like. Oh and they'd run orders of magnitude faster.

I'm not even complaining about SQL syntax, which IMO is actually pretty clean and functional. Just the lack of decent tooling to get shit done.

calpaterson · on March 19, 2020

I'm afraid it is a major problem in practice - I'm didn't write the post in the abstract I wrote it because this problem comes up over and over. If you've never seen it then lucky you! :)

Personally the conclusion I'm coming to is any interface that presents a table as an class is just the wrong affordance and likely to lead to problems down the line. If programmers modeled tables as sets instead of objects we wouldn't have all these problem apps that page the database into program memory while serving a web request.

aurisor · on March 19, 2020

It is a major problem in practice. It's called the "N+1" problem and it's exhaustively studied and documented, with mitigation strategies for every ORM that encounters it.

This is a bit like acting as if you're the first person to realize that null pointer deferences are a problem and happily telling the entire world to "just switch to pass by value."

calpaterson · on March 19, 2020

You aren't right about that - the N+1 problem is just one specific issue WRT joins and while the example I give does have it that is not the only thing wrong with it. The rot is more general sadly.

aurisor · on March 19, 2020

> ISBNs of all deceased Italian authors

  Book.joins(:author).where(author: { deceased: true }).map &:isbn`

> first class transactions / decrement race condition

  Book.find(isbn: 1).decrement :copies_in_stock

> Row-locking

  Book.transaction do
    b = Book.lock("FOR UPDATE NOWAIT").find_(1)
    # logic around handling out of stock
    b.copies_in_stock -= 1
    b.save!
  end

Those are all of your examples and the Ruby code that implements the solution you want. (Disclaimer: there are probably small syntax errors here, you get the gist)

Honestly this article sounds like it was written by someone who read a 10 year old Martin Fowler article and decided to critique it without even a cursory understanding of modern ORMs. Even more strangely, when everyone starts pointing out the obvious fact that ORMs do solve these problems in practice, you've decided to move the goalposts to say "oh, well the straw man abstract pattern doesn't solve for these real-world problems."

In the future I'd recommend you familiarize yourself with a problem space before trying to publish on it.

sanderjd · on March 19, 2020

I think we are using "problem" in two different ways that are each reasonable but are not the same: 1. something that happens and needs to be fixed, and 2. something that is fundamental and difficult to overcome. In my comment up-thread, I meant more the #2 definition, but you clearly mean the #1 definition.

I spent a number of years working on a Rails application that used ActiveRecord and which accrued quite a large set of data. In that job, I fixed a whole bunch of N+1 problems, and wrote sql queries with complex joins before the nicer query-builder arel syntax was implemented. I was a pretty vocal critic of AR at my company because of these sorts of headaches (though I was much happier post-arel). But these were all very surmountable problems, the libraries had the capabilities necessary to fix the problems, and as our team matured in our intuition for our data model and size, we ran into these things less and less. In other words, I see this as very much a #1 type of problem rather than a #2 type of problem.

A good analogy seems to be n^2 algorithms. In much the same way, you can write one of those and it works fine for awhile, but then one day you may find that your data has grown enough that it no longer works, and it may be very mysterious what happened. But with some experience and work, you can figure it out and probably fix it. And as you gain experience, you can more wisely decide whether to avoid the n^2 algorithm from the start, or whether it will work fine for your data size.

cryptonector · on March 19, 2020

Allow me to chime in and agree with you that in real life there exist real applications using ORMs that have or cause these problems even where the ORMs in question provide modern query facilities that don't suck. Whether this happens because the ORMs were not that advanced when the apps were developed, or the devs were not that knowledgeable, doesn't matter -- the apps exist, and I know because I've had to deal with some such.

I agree very much with the idea that the problem is fundamentally caused by wanting to present objects to apps, and that dealing with queries instead is better and the real solution. Of course, the queries can't be SQL -- we can't have that! So we end up with N+1 query languages: SQL and N more non-SQLs. But N+1 query languages is a lot better than no query languages as in earlier ORMs.

robocat · on March 19, 2020

SQL has hideous security issues with SQL injection using strings, and ActiveRecord avoids that, so SQL is bad and ActiveRecord is good. My statement is just as true and just as disingenuous as Cal’s article.

> in real life there exist real applications using ORMs that have or cause these problems even where the ORMs in question provide modern query facilities that don't suck

And surely you have also seen plenty of equivalent examples of bad code written using SQL?

Cal is attacking an imaginary ActiveRecord pattern, while ignoring the features of the most famous implementation (RoR), and ignoring the benefits of ActiveRecord. Dumb.

cryptonector · on March 19, 2020

> SQL has hideous security issues with SQL injection using strings, and ActiveRecord avoids that, so SQL is bad and ActiveRecord is good. My statement is just as true and just as disingenuous as Cal’s article.

Most SQL RDBMSes have proper parametrized queries, and there's no excuse not to use them. This is no excuse to not use SQL. It's like saying that no one should use ORMs because you can use them poorly.

> And surely you have also seen plenty of equivalent examples of bad code written using SQL?

I've seen bad Java, C, C++, Python, Perl, bash, etc... The SQL I work with is no worse than the Java, C, ... that I work with.

> Cal is attacking an imaginary ActiveRecord pattern, while ignoring the features of the most famous implementation (RoR), and ignoring the benefits of ActiveRecord. Dumb.

He was talking about a particular pattern, not Ruby on Rails.

shabda · on March 19, 2020

Django ORM is ActiveRecord pattern, right?

This is what I would use

   Book.objects.filter(author__nationality="Italian", author__state="deceased").values("isbn")

This is

- Is one single DB query.

- Completely obvious way to describe what I want to do.

- Only pulls the fields which are needed from the DB.

If you are claiming "the troublesome acriverecord pattern" you need to provide specifics of what can't be done, rather than claim "Don't look too deeply into the code samples for specifics though"

smoe · on March 19, 2020

I think as soon you use "values" it is no longer the ActiveRecord pattern since the result is not an object instance representing a single row in the table.

I never read a proper specification of ActiveRecord, but it could be that a "pure" implementation doesn't allow for joins? Hence the weird looking example with the nested for loop, which variations of I have seen plenty in real world code. Mostly by people that don't sufficiently understand the ORM or SQL, not because they were sticking to a pattern though.

In your example it is obvious what happens in this simple example, but as soon the joins become more complex, it becomes in my opinion quite unwieldy and non obvious what the resulting SQL will be. I had multiple performance problems in the past because of the queries the ORM generated or with memory/cpu usage on the app side.

LargeWu · on March 19, 2020

That's fine. For 95% of the queries you need, ActiveRecord-ish implementations are perfectly performant, and super readable. For the rest, they allow you to drop down to the SQL layer and specify exactly what you want or need.

smoe · on March 19, 2020

Yes, that is what we do and what I like about SQLAlchemy for Python. We use the ORM for like 90% of cases and drop down to its SQL abstraction underneath the ORM.

The last time I used Django a couple of years ago that was one of the bigger pains for me: It was either the Django ORM capabilities or raw SQL, not much in between. Or maybe I was just unaware of how to do it better.

kd5bjo · on March 19, 2020

> Don't look too deeply into the code samples for specifics though, they are a pastiche

Your article here seems to avoid falling into this particular trap, but I consider this a problematic view for authors to hold. In software writing, code listings have equivalent weight as figures and tables, and are often what readers will look at first to get an overview of the article. If they don't adequately demonstrate the argument of the text or don't appear to be plausible, the entire article will fall flat.

ptr · on March 19, 2020

An article dedicated to why Rail's Active Record library isn't actually using the Active Record pattern would be interesting!

zrail · on March 19, 2020

Rails' ActiveRecord is both a query builder and also the canonical Active Record pattern. For example, one can say

  book = Books.joins(:authors).where(author_name: 'James S. A. Corey').first
  book.title = "Tiamat's Wrath"
  book.save!

It's not that they stopped supporting the Active Record pattern, it's that they rebuilt the whole thing a few years ago with a real relational query builder under the hood.

mnutt · on March 19, 2020

Not to detract from your point, but rails 3 was released with arel in 2010, so it has used a relational query builder for a decade at this point.

zrail · on March 19, 2020

Thanks, I wasn't super clear on the timeline there.

inopinatus · on March 19, 2020

It’s more like the repository pattern under the bonnet.

calpaterson · on March 19, 2020

Yes I am curious about that. Someone else said that the Rails authors used to like and defend this pattern and that presumably they changed their mind. Maybe because they ran into problems? I'm not at all close to the Ruby community but would love to know.

https://lobste.rs/s/valayp/troublesome_active_record_pattern...

dhagz · on March 19, 2020

That's my general impression. As Rails grew in popularity and was used for more and more complex things (with more and more complex database tables), they started to add in the raw query building parts of ActiveRecord. Though I think it was more that they wanted raw query functionality in the library and it made most sense to enhance the tool that already dealt with the database rather than add another alongside it.

Glyptodon · on March 19, 2020

I think most ORMs that in some sense follow permutations of the active record pattern in wide use are well capable of doing joins and only selecting needed/specified columns.

I agree there are places the pattern falls short at times, but are there actually any implementations of the pattern in widespread use that would make you use separate queries in loops rather than do a basic join?

tomlock · on March 19, 2020

But in Django you can do the same thing that doesn't return all rows with `select_related` so it seems kind of like you're not using the features of whatever ORM that is.

mtberatwork · on March 19, 2020

I was confused a bit as well by the examples since Django's ORM has functionality to do exactly what the author is trying to do (e.g. select_related, prefetch_related, <model>_set, values, managers, F expressions, transactions, etc).

barrkel · on March 19, 2020

The way I see it, Active Record is an explicit acknowledgement that the database and its relations are important. AR modelling maps almost directly onto the database relations, rather than what might be more natural in OO.

It's actually harder to go straight to the database if you're using a more powerful ORM, because with a more powerful ORM, you're hiding implementation details. You tend to obfuscate the difference between a calculated and a stored column, a denormalized vs normalized attribute, composite vs entity, etc. That makes it harder to write SQL to get what you want, not easier, and when you do write the SQL, you're slicing through a thicker business layer and are at more risk of violating invariants.

If you've got a monolingual, homogeneous tech stack, the whole database might be abstracted away and you might use an OO interface which uses distributed caches and locks etc. You're much more likely then to write your queries as loops and manual filters and joins.

erik_seaberg · on March 19, 2020

I have yet to see a company more than a couple years old that remained on a monolingual, homogeneous tech stack. If your data is at all accessible to eng or PM or QA, there are extra consumers that will surprise you if you break them.

buboard · on March 19, 2020

So if you re going to have to "learn" active record, why not learn SQL instead?

hrktb · on March 19, 2020

From the article, about why mitigations of these issues is not viable:

> Likely they will require increased understanding of the specific ORM being used

So your diagnostic of "deliberately avoid reading the documentation" is spot on.

If the goal of the author is to not read the docs, I completely agree that most ORMs are an absolute hell to use and going back to almost pure SQL would be the holy grail.

But then, I am personnally in the camp of spending the time to understand the tools. There is a tradeoff between the time spent to on learning a specific tool and the ROI of that learning curve, I think this article's position on wanting mostly returns with no time invested is way too unrealistic.

PaulHoule · on March 19, 2020

He also hates forking in Git.