Lately I've been thinking a lot about data cubes and how their use cases, and the methodologies for making them useful, are very similar to those of most machine learning algorithms. I don't mean how the output is generated or how things are programmed. I mean that both tend to produce far more output than is practically useful, and that it is very easy to look at any small part of that output and draw incorrect conclusions.

To clarify, when I talk about ML I'm primarily referring to classifier algorithms and approaches (including NLP). For the most part, ML is used to generate classifier rules which generalize patterns, and data cubes are often used to look for aggregations and data sequences which generalize patterns. The problem is that random patterns happen all the time, and may even persist for a long time despite a lack of any real correlation. That is why semantic analysis of data cube output is so important for finding meaningful patterns.
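To make the point about random patterns concrete, here is a minimal sketch in Python. It assumes a toy setup that is entirely my own invention: one random walk stands in for a metric you care about, and 200 other independent random walks stand in for the slices a data cube might throw at you. The names `pearson` and `random_walk`, the seed, and the sizes are all hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def random_walk(rng, steps):
    """Cumulative sum of independent Gaussian steps: pure noise with a trend-like look."""
    position, path = 0.0, []
    for _ in range(steps):
        position += rng.gauss(0, 1)
        path.append(position)
    return path


rng = random.Random(42)  # fixed seed so the run is repeatable
target = random_walk(rng, 50)
candidates = [random_walk(rng, 50) for _ in range(200)]

# Every series here is independent of the target by construction.
correlations = [pearson(target, c) for c in candidates]
best = max(abs(r) for r in correlations)
strong = sum(1 for r in correlations if abs(r) > 0.5)

print(f"strongest |correlation| among unrelated series: {best:.2f}")
print(f"series with |correlation| > 0.5: {strong} of {len(candidates)}")
```

None of the candidate series has anything to do with the target, yet searching across enough of them turns up some that track it closely. Looking at just that one slice of output, it would be easy to conclude there is a real relationship.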

What I'm getting at is that I often wonder why most ML projects treat it like magic. Human-assisted learning has repeatedly been shown to be the approach which actually works in practical applications. The classifier output needs to be pruned to remove rules that only held true in the sample data, were merely coincidental, or simply have no practical value.
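One rough way to automate the first part of that pruning is to re-check every rule against data it was not derived from. The sketch below assumes rules are simple predicates that predict a positive label, and the field names (`paid`, `tuesday`, `label`), the thresholds, and the helper names are all made up for illustration. It is not how any particular ML library does it.

```python
def precision_and_support(predicate, rows):
    """Return (fraction of matched rows with label 1, number of matched rows)."""
    matched = [row for row in rows if predicate(row)]
    if not matched:
        return 0.0, 0
    hits = sum(1 for row in matched if row["label"] == 1)
    return hits / len(matched), len(matched)


def prune_rules(rules, holdout, min_precision=0.8, min_support=3):
    """Keep only the rules that still hold on data they were not derived from."""
    kept = []
    for name, predicate in rules:
        precision, support = precision_and_support(predicate, holdout)
        if support >= min_support and precision >= min_precision:
            kept.append(name)
    return kept


# Hypothetical sample data: both rules look perfect here.
sample = (
    [{"paid": 1, "tuesday": 1, "label": 1}] * 4
    + [{"paid": 0, "tuesday": 0, "label": 0}] * 4
)

# Hypothetical held-out data: the "tuesday" pattern was a coincidence of the sample.
holdout = (
    [{"paid": 1, "tuesday": 0, "label": 1}] * 4
    + [{"paid": 0, "tuesday": 1, "label": 0}] * 4
)

rules = [
    ("paid => label", lambda row: row["paid"] == 1),
    ("tuesday => label", lambda row: row["tuesday"] == 1),
]

kept = prune_rules(rules, holdout)
print(kept)
```

A holdout check like this only catches rules that fail to generalize. Deciding whether a surviving rule has any practical value is still a job for a person who understands the domain.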

Approaches like this are not cheap to set up, and may in the end only produce the same results as the existing, entirely non-ML system. The first question I ask myself before working on anything is: what is the likely scale of the work compared to the benefit? If I don't have objective data to answer that, I have to do some research to find out. Never try to build a massive or complicated system that you have no objective reason to expect will be worth the effort. That's precisely what people have been doing with ML constantly. It's little wonder most developers have such low opinions of ML projects.