Indices on a database aren't "premature optimization". They're part of understanding your data model, how it will be requested, and--more in a SQL database than a MonogDB database, but still--constraints on the data stored in your database.
Claiming basic development practices to be "premature optimization" is a fantastic way to paint yourself into a corner because of stupid decisions in your haste to "get it out the door." Your MVP isn't V if it still takes a second to render your homepage to users, because they'll leave. (Site responsiveness is a huge factor in bounce rate, even for sites that people actually want to look at.)
It's not hard to apply an index after the fact. You don't paint yourself into a corner by leaving them off initially.
And very often, you can't understand your data model and how it will be requested until you've actually built an app and gathered some data. Users will surprise you, and do things you never expected, and probably render 80% of your app (oftentimes, the 80% you worked hardest on) obsolete immediately.
I can attest to that. We have two main "sides" of our app, one side that allows for editing of rules and another that just applies those rules on transactions. It took us much longer to build the editing side than than the transaction side, and when we hit users, we found out that they didn't even WANT to edit the rules. That was more than half of our app that users weren't using, even though in initial mockups/user tests they indicated that they would edit the rules.
I consider Indices to be premature optimisation. Moreover, they're very low level, potentially dangerous premature optimisation. Imho, they should be the very last trick you use to optimise.
First, you try and figure out if you can reduce the number of repeated queries. Then you try and figure out if you can get rid of chunks of code that spawn lots of queries altogether, by tweaking the algorithm to do things differently, without needing that data. Then, finally, once you've used every trick to make it all faster, then you apply indices to speed up that handful of remaining queries.
If you apply indices first, then you won't spot the other potential optimisations, and when your indices stop covering up for your poor coding it will be much later, and much harder to fix.
No, caching should be the last trick you use to optimize.
Indices are standard way to achieve basic performance levels of a database. They may have their downsides but "potentially dangerous" is dramatically overstating the case. Furthermore, the dangers of premature optimization are about taking extra time or adding complexity to something that ultimately doesn't matter, not about using very basic features in a sane way.
The correct optimizations to make first are ones that make the biggest impact percentage-wise as well as being the most elegant in the code. Indices typically fit both categories very well. Unless you are doing a lot of stupid things, there's not going to be much lower-hanging fruit, but even if there is, after you apply sane indices that will be when your profiling will start to reveal the real interesting possibilities for optimization. The idea that indices are a good final optimization does not show much interest in real performance.
I would take swombat's advice to look at query patterns, eliminate repeated queries, and tweak your algorithm to require less data well before I thought about adding indices (or caching, which I agree is the last thing you should do).
You want to make the riskiest, most invasive changes first, because those are the ones that the rest of your codebase builds upon. If you've changed your query patterns and the app still isn't fast enough, it's relatively trivial to add indices on top of that. The speed benefits are cumulative, and none of the work you've done examining your data-access patterns has been undone by making that data access faster.
If you add indices first, however, and it still isn't fast enough, you have to examine your query patterns anyway. And this time, the work you did is undone by your further optimizations. The benefits of indices depend a lot on how you access your data: they slow down writes to speed up certain queries. If it turns out that the queries you're speeding up don't occur frequently anymore, your index is counterproductive.
(That said, since adding indices is often a task that takes 5 minutes of your time, you might want to give it a try and see what sort of results you get before investing a lot of effort in other stuff. If it doesn't work, you can back out the change and then start looking at your query patterns.)
If you're building an app that will hopefully be bigger than you expect, build cache into your data layer, it's not complex and will pay off sooner than later.
You should be thinking about indices as you design your data model. You're going to put them in eventually so the ad-hoc performance testing you're doing as you're building the site should at least somewhat reflect the final, real-world scenario. Better to know sooner rather than later that your data model is so broken that even extensive indexing can't make your queries fast.
That's just silly. Indexes are a fundamental component of databases and give the DB clues about how the data in your table is going to be used. You might as well saying having different column data types is premature optimization.
Should you spend days analyzing things and creating a million indexes? No. But you should have some idea how the tables will be used and setup a few of the obvious ones, at least. Read any book on DB admin.
I agree with you, but he was using MongoDB and AFAIK, indexing is a must with this kind of db. Actually, I hope that he does not expect a big write load because he seemed to have added the indexes pretty quickly without thinking too much about it...
Just to reiterate, with a RDBMS with a decent query planner, indexing early is premature optimisation.
Ahem, this is so wrong it's not even funny. I'm just calling it out because there are people here who are just learning how to code. You always create an index, on pretty much every query. That's as basic as commenting your code.
The cost increases significantly by doing it later on. Indexing when you have little data is easy. Indexing on a live site with millions of rows and lots of users can be much harder.
Taking it to that extreme is not a valid argument. There's a lot of room between not doing it first and only doing on a live site with millions of rows and lots of users.
That might be true for some esoteric indexes, but if we're talking about indexing customer_id on the invoices table, it's not premature -- it's freaking inevitable.
Yes--and these are the overwhelmingly most common case for a query in most applications. From reading the blog post, it seems like the blog author was either doing pretty frighteningly complex stuff just to render his homepage or he didn't think ahead enough to add fairly standard queryable indices to his collections.
Neither strikes me as terribly smart, and the latter strikes me as writing it right the first time, not "avoiding premature optimizations."
Claiming basic development practices to be "premature optimization" is a fantastic way to paint yourself into a corner because of stupid decisions in your haste to "get it out the door." Your MVP isn't V if it still takes a second to render your homepage to users, because they'll leave. (Site responsiveness is a huge factor in bounce rate, even for sites that people actually want to look at.)