If you could cache all the inputs that go into a pure function, you'd never need to invalidate the cache. Your cache becomes just a lazily-created tabulation of your function - which is fixed, as a map is just another representation of a mathematical function.
So cache invalidation happens only where your cache key isn't containing all the parameters of the cached computation - your cache isn't a mathematical function. Now the problem is, in real world, it's almost impossible to fully capture all the parameters of a computation in a way that gives you a usable cache (one that trades time/CPU for memory). So we need to track the possible changes in the "hidden state", and invalidate a cache entry if we detect such state. That's where all the complexity sits.
For instance, take a problem of a function countWords(file), that counts words in a file. How would we cache its results? If we key by file name, the cache will be wrong when the file gets changed. Keying by the hash of file contents is out of the question, because hashing is at least O(n) with respect to file size, just as countWords(), so we gain nothing. Keying by file name + modification date will fail if the clock changes, or file gets updated faster than the time resolution we're using. Not to mention, we'll have false positives if the date changes but the contents stay the same.
But assuming that the filesystem is sane and nobody messes with the clock, what else could happen? The definition of what a "word" is can change, making the entire cache bad. We may have found a bug in countWords(), updated the code, and now the cached results are also bad. The contents of the cache may have been changed externally (e.g. if shared by multiple functions, or multiple processes using countWords(), with different definition of "word" or with partially deployed bugfix). The code of the cache itself may have been updated to fix a storage bug there. Etc.
At some point you have to decide, which of those issues you care about and which you ignore - and for those you care about, how to monitor for changes, propagate them to the cache, and do it fast enough that none of the cache users will see invalid data in between detection and propagation. That's how cache invalidation is hard.
Every pure function is part of a bigger machine that's eventually mutable state, or by definition that pure function has no reason to exist. Ergo, cache invalidation is hard.
But the universe is constantly changing. So you will need to know the specific bits of the universe that your function depends on or you will never get a hit.
Ha, I guess it's possible. But if that's true then it seems that some of the intermediate steps in "public static void universe()" use mutable state that can result in cache invalidation difficulties.
If you could cache all the inputs that go into a pure function, you'd never need to invalidate the cache. Your cache becomes just a lazily-created tabulation of your function - which is fixed, as a map is just another representation of a mathematical function.
So cache invalidation happens only where your cache key isn't containing all the parameters of the cached computation - your cache isn't a mathematical function. Now the problem is, in real world, it's almost impossible to fully capture all the parameters of a computation in a way that gives you a usable cache (one that trades time/CPU for memory). So we need to track the possible changes in the "hidden state", and invalidate a cache entry if we detect such state. That's where all the complexity sits.
For instance, take a problem of a function countWords(file), that counts words in a file. How would we cache its results? If we key by file name, the cache will be wrong when the file gets changed. Keying by the hash of file contents is out of the question, because hashing is at least O(n) with respect to file size, just as countWords(), so we gain nothing. Keying by file name + modification date will fail if the clock changes, or file gets updated faster than the time resolution we're using. Not to mention, we'll have false positives if the date changes but the contents stay the same.
But assuming that the filesystem is sane and nobody messes with the clock, what else could happen? The definition of what a "word" is can change, making the entire cache bad. We may have found a bug in countWords(), updated the code, and now the cached results are also bad. The contents of the cache may have been changed externally (e.g. if shared by multiple functions, or multiple processes using countWords(), with different definition of "word" or with partially deployed bugfix). The code of the cache itself may have been updated to fix a storage bug there. Etc.
At some point you have to decide, which of those issues you care about and which you ignore - and for those you care about, how to monitor for changes, propagate them to the cache, and do it fast enough that none of the cache users will see invalid data in between detection and propagation. That's how cache invalidation is hard.