
A friend of mine once argued that adding a cache to a system is almost always an indication that you have an architectural problem further down the stack, and you should try to address that instead.

The more software development experience I gain the more I agree with him on that!



When all else fails, use caches. If all else hasn’t failed, it will once you use caches.


Caches suck because invalidation needs to be sprinkled all over the place in what is often an abstraction-violating way.

Then there's memoization, often a hack for an algorithm problem.
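A minimal Python sketch of what that comment means: memoization "fixes" a slow recursive algorithm by caching subproblems, but restructuring the algorithm itself removes the need for the cache entirely (function names here are illustrative, not from the thread):

```python
from functools import lru_cache

# Naive recursive Fibonacci: exponential time because subproblems repeat.
def fib_slow(n: int) -> int:
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)

# Memoized version: the cache papers over the repeated work.
@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

# Reorganized algorithm: iterative, no cache needed at all.
def fib_iter(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

Both fixes give the same answers, but only the iterative rewrite eliminates the hidden state the memo table introduces.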

I once "solved" a huge performance problem with a couple of caches. The stain of it lies on my conscience. It was actually admitting defeat in reorganizing the logic to eliminate the need for the cache. I know that the invalidation logic will have caused bugs for years. I'm sure an engineer will curse my name for as long as that code lives.


If you have no cache, and your first thought is "this needs a cache", you're probably right. Chances are you need to optimize a query or storage pattern. But you're thinking like an engineer. It may be true that there is a "more correct" engineering solution, but adding a cache might be the most expedient solution.

But after you've done all the optimizations, there is still a use case for caches. The main one being that a cache holds a hot set of data. Databases are getting better at this, and with AI in everything, latency of queries is getting swamped by waiting for the LLM, but I still see caches being important for decades to come.
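The "hot set" idea above can be sketched as a tiny LRU cache (a simplified illustration, not any particular library's implementation):

```python
from collections import OrderedDict

class LRUCache:
    """Keeps only the 'hot set': least-recently-used entries are evicted."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the coldest entry
```

Database buffer pools do essentially this internally, which is why "the database is getting better at it" can obviate an external cache.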


Most of the time I use caching it's to cut down on network round trips. If I'm fetching data on every end-user request that only updates daily or weekly, caching is a no-brainer. Edge caching for content sites is also a no-brainer. Caching something computationally expensive may be fishy but also may be useful. Even if you are just papering over some inefficient process, that's not necessarily a sin. Sometimes you have to be pragmatic.
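The daily/weekly-update case above is the classic TTL cache. A minimal sketch (the `fetch` callable standing in for the network round trip is hypothetical):

```python
import time

class TTLCache:
    """Tiny time-based cache for data that only changes daily or weekly."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]           # fresh: skip the round trip
        value = fetch()             # stale or missing: do the round trip
        self._store[key] = (now + self.ttl, value)
        return value
```

With a 24-hour TTL on daily data, the worst case is serving values up to a day stale, which is exactly the consistency tolerance being accepted.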


That's true in my experience.

Caches have perfectly valid uses, but they are so often used in fundamentally poor ways, especially with databases.


I'd argue the database falls into that category.

The two questions no one seems to ask are 'do I even need a database?', and 'where do I need my database?'

There are alternate data storage 'patterns' that aren't databases. Though ultimately some sort of (structured) query language gets invented to query them.


Yeah my architecture problem is that Postgres RDS EBS storage is slow as a dog. Sure our data won't go poof if we lose an instance but it's so slow.

(It’s not really my architecture problem. My architecture problem is that we store pages as grains of sand in a db instead of in a bucket, and that we allow user defined schemas)


If you think of it as a cache, yes. If you think of it as another data layer then no.

For example, let’s say that every web page your CMS produces is created using a computationally expensive compilation. But the final product is more or less static and only gets updated every so often. You can basically have your compilation process pull the data from your source of truth such as your RDBMS but then store the final page (or large fragments of it) in something like MongoDB. In other words the cache replacement happens at generation time and not on demand. This means there is always a cached version available (though possibly slightly stale), and it is always served out of a very fast data store without expensive computation. I prefer this style of caching to on demand caching because it means you avoid cache invalidation issues AND the thundering herd problem.

Of course this doesn’t work for every workflow but it can get you quite far. And yes this example can also be sort of solved with a static site generator but look beyond that at things like document fragments, etc. This works very well for dynamic content where the read to write ratio is high.
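The generation-time replacement described above can be sketched in a few lines (a dict stands in for the fast store, and all names here are hypothetical):

```python
# Write path runs the expensive render when the source changes,
# so the read path never renders and never stampedes the backend.

rendered_pages = {}  # stand-in for a fast store such as MongoDB or Redis

def expensive_render(doc: dict) -> str:
    # placeholder for the computationally expensive compilation
    return f"<html><body>{doc['title']}: {doc['body']}</body></html>"

def on_source_update(doc_id: str, doc: dict) -> None:
    """Called whenever the source of truth (e.g. the RDBMS) changes."""
    rendered_pages[doc_id] = expensive_render(doc)

def serve(doc_id: str) -> str:
    """Read path: always a fast lookup, possibly slightly stale."""
    return rendered_pages[doc_id]
```

Because entries are replaced at write time rather than invalidated and lazily refilled, there is no cache-miss path for a herd of readers to pile onto.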


Quite agree, this is how I explain it to people. When you think of a cache as another derived dataset, you start to realize that the issues caches bring to architectures are often the result of not having an agreement between the business and engineering on acceptable data consistency tolerances. For example, outside the world of caching, if you email users a report, and the data is embedded in the email, then you are accepting that the user will see a snapshot of data at a particular time. In many cases this is fine, even preferred. Sometimes not, and instead you link the user to a realtime dashboard.

Pretty much every view the user sees of data should include an understanding as to how consistent that data is with the source of truth. Issues with caching (besides basic bugs) often come up when a performance issue comes up and people slap in a cache without renegotiating how the end user would expect the data to look relative to its upstream state.


The cache is an incomplete dataset by definition. It’s not a data set, it’s a cache of a data set. You can never ensure you get a clean read of the system state from the cache because it’s never in sync and has gaps.


What about materialized views? CPU cache? Only the Sith deal in absolutes :)


A CPU cache means that the same value read twice will return the same value, with some exceptions for NUMA and multiple threads. But two reads of an application cache make no such guarantees.

There is a vast number of undiagnosed race conditions in modern code caused by cache eviction in the middle of 'transactions' under high system load.
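The hazard described above is easy to reproduce in miniature. This toy sketch (hypothetical dict-based cache; real caches evict under memory pressure rather than explicitly) shows two reads inside one logical operation disagreeing after a mid-"transaction" eviction:

```python
cache = {}
db = {"balance": 100}  # stand-in for the source of truth

def cached_read(key):
    if key not in cache:
        cache[key] = db[key]  # miss: fall through to the source
    return cache[key]

first = cached_read("balance")
db["balance"] = 50        # another writer updates the source...
cache.pop("balance")      # ...and the entry is evicted under load
second = cached_read("balance")
# first != second: the two reads within one logical operation disagree,
# a guarantee a coherent CPU cache would not break within one thread.
```

Any code that assumes repeatable reads across two cache lookups is carrying a latent race like this one.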


No.

It’s not a data layer, it’s global shared state. Global shared state always has consequences. Sometimes the consequences are worth the trouble. But it is trouble.

If you think about Source of Truth and System of Record, a cache is neither of those, and sits between them. There are a lot of problems you can fix instead by improving the SoT or SoR situation in that area of the code.


in particular, the database already _has_ a cache. usually it's on the other side of the evaluation, at the block layer. which means that you have to pay a cost to get to it (the network protocol, and the evaluation).

if you use materialized views, that surfaces exactly what you want in a cache, except here the view's consistency with the underlying data is maintained. that's hugely important.

that leaves us with the protocol. prepared statements might help. now we really should be about the same as the bump-on-the-wire cache. that doesn't get us the same performance as an in-process cache, but we didn't have to sacrifice any consistency or add any additional operational overhead to get it.


Hard disagree. Having used the architecture I described in large practical deployments it works way better than what you are making it out to be. But I don’t know the domain you work in and your constraints so it is possible that for you it would not work.


I already typed a longer comment elsewhere that I don’t feel like reiterating but I agree with you. Caching is a natural outcome of not having infinite time and memory for running programs. Sometimes it’s a bandaid over bad design, but often it’s a responsible decision to take load off of other important systems


Lost me at DumpsterFireDB as cache. But if the goal is to create an even worse architecture that's even harder to maintain, go for it.


Sorry you lack the imagination to substitute your preferred data store into what I wrote. Hope it gets easier.


I'll never have enough imagination to believe mongo is a good solution. Postgres has jsonb, vector type; redis is a fine-enough cache. Why use a known junk "database" when there are superior solutions and truly open source?


I didn’t say you have to use it. I said you could. Or any other data store that fits your use case. I used a MongoDB instance back in 2012 in a serious production environment in this exact way and it worked flawlessly while Postgres was what gave us trouble (Postgres has since added a bunch of features that would have made those issues disappear, but back then it didn't have built-in replication, for example).

But again this is not an endorsement of MongoDB. I wouldn’t use it today but I did use it successfully and that company and tech stack sold for quite a bit of money and the software still runs, though I’m not sure on what stack. Again, if you are stuck on this one part of my comment… can’t help you.


I disagree. For large search pages where you're building payloads from multiple records that don't change often, it could be beneficial to use a cache. Your cache ends up helping the most common results to be fetched less often and return data faster.



