
> Have you thought about stepping back from all of this for a few days and notice that you are wasting your time with these arguments?

Have you? [Latin in original: "Num fecisti?"]

> It doesn't matter how fast you can calculate a dot product or evaluate an activation function if the weights in question do not change.

That's a deliberate choice, not a fundamental requirement.

Models get frozen in order to become a product someone can put a version number on and ship, not because they must be, as demonstrated both by fine-tuning and by the initial training process — both of which update the weights.
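To make that concrete, here is a minimal sketch in plain Python (no ML framework; all values are illustrative, not taken from any real model): the same dot-product arithmetic used for inference also supports a gradient step that changes the weights, so "frozen" is a deployment decision, not a property of the math.

```python
def forward(w, x):
    # The dot product: the inference step the comment mentions.
    return sum(wi * xi for wi, xi in zip(w, x))

def sgd_step(w, x, target, lr=0.1):
    # Squared-error loss L = (y - target)^2, so dL/dw_i = 2*(y - target)*x_i.
    y = forward(w, x)
    grad = [2.0 * (y - target) * xi for xi in x]
    return [wi - lr * gi for wi, gi in zip(w, grad)]

w = [0.5, -0.3]                              # "frozen" weights...
w2 = sgd_step(w, [1.0, 2.0], target=1.0)     # ...until one update changes them
```

Nothing in the forward pass prevents the update; shipping `w` unchanged under a version number is a product choice layered on top.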

> NNs as of right now are the equivalent of a brain scan.

First: see above.

Second: even if it were, so what? Look at the context I'm replying to: this is about energy efficiency, and the comparison holds just fine even when the cost of training the whole thing from scratch is included.

To put it another way: how long would it take a mouse to read 13 trillion tokens?

The energy cost of silicon vs. biology is lower than people realise, because they read the power consumption figures without considering that silicon is also vastly faster. At the lowest level, silicon computation literally (not metaphorically, really literally) outpaces biological computation by roughly the same magnitude by which jogging outpaces continental drift.
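Putting order-of-magnitude numbers on that comparison (every figure below is an assumption chosen for illustration: ~1 Hz average cortical firing rate, ~3 GHz clocks, ~3 m/s jogging, ~3 cm/year drift), the two ratios land in the same ballpark:

```python
import math

# All figures are assumed order-of-magnitude values, not measurements.
neuron_hz = 1.0                 # assumed average cortical firing rate
clock_hz = 3e9                  # assumed GHz-class silicon clock
jog_speed = 3.0                 # m/s, assumed easy jog
drift_speed = 0.03 / (365.25 * 24 * 3600)   # ~3 cm/year converted to m/s

silicon_ratio = clock_hz / neuron_hz     # how much faster silicon switches
motion_ratio = jog_speed / drift_speed   # how much faster jogging moves

print(f"silicon vs. biology: ~10^{round(math.log10(silicon_ratio))}")
print(f"jogging vs. drift:   ~10^{round(math.log10(motion_ratio))}")
```

Under these assumed inputs both ratios come out around 10^9. Different but still-defensible inputs (e.g. peak firing rates near 100 Hz) shrink the silicon ratio by a couple of orders of magnitude, so treat this as a sanity check on the analogy, not a measurement.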


