Because everyone in these replies is in complete denial about the physical limits of memory and scaling in general. Y'all are literally living in an alternate reality where model capability increases as size decreases. It's simply not the case. There will be small, focused models that perform well on very narrow tasks, yes, but you will not have "agents" capable of "building most things" running on consumer hardware until more capable (and affordable) consumer hardware exists.
Correct, the progress is not perfectly linear. But do you believe technological progress has stalled forever? If so, I'd get out of tech and start selling bomb shelters.
Do you really think the trend of consumer hardware is heading towards more memory and better specs? Apple's most popular product this year is a laptop with 8GB of RAM.
The trend is heading in the opposite direction: fewer options for strong consumer hardware and a shift towards cloud-based products. This is a memory issue more than anything. Nvidia is done selling their GDDR7 to gamers and people with AI girlfriends.
There are physical limits to how much you can compress data. I'm just saying, don't sit on your hands waiting for this to happen, because it's probably not going to for another decade or more. There's no use in waiting, just write the code your fkin self and stop being lazy.
Just so that I have your position straight: you actually believe that over the long term, like 10 or 20 years, the amount of RAM in a laptop is going to go down?
It's not out of the realm of possibility, but I just want to make you aware that this would be a very surprising development in computing history.
I guess we'll find out! I bet all the vendors who supply RAM are looking at the current shortages and thinking "well, it's a shame we could never manufacture more RAM than we currently do."
A future with less RAM is possible if more applications use computational storage on SSD/NVMe.
But that's not my main argument. My main argument is that it's delusional for OP to think it's reasonable to expect that we'll soon be able to run models on consumer hardware that can build basically most things.
But I do think there will be many compromises made for consumer electronics. I don't think the powers that be are eager to give consumers all the best memory (that should be clear by now). There are three DDR5 DRAM manufacturers in the world, and they have to provide memory to all the world's militaries, governments, and datacenters/corporations. Consumers are the last priority.
> If you looked at a graph of GPU power in consumer hardware and model capability per billion parameters over time, it seems inevitable that in the next few years a "good enough" model will run on entry-level hardware.
Of course there will always be larger flagship models, but if you can count on decent on-device inference, it materially changes what you can build.
I'm making some assumptions about what they're saying, but it seems clear they have no idea what they're talking about and that they're betting their competency on this technology.
If you're not paying attention to what's happening with small models, I suggest you take a closer look. Keeping parameter count constant, the quality of small models is rising fast. When you look at what you could do with Llama just 3 years ago vs Gemma 4 on the same 16GB hardware, the trend is clear.
Meanwhile, this year Apple bumped the base of their Mac lineup from 8GB to 16GB RAM, and the iPhone 17 Pro ships with 12GB. The Neo is at 8GB but is a brand new product tier which is not comparable to any past model.
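For anyone who wants to sanity-check what fits in 8GB, 12GB, or 16GB, here's a rough back-of-the-envelope sketch. It only counts the weights (parameter count times bits per weight, divided by 8), and the `reserved_gb` headroom for the OS, KV cache, and everything else is my own assumed number, not a measurement. The parameter counts in the loop are illustrative sizes, not specific models.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes.

    params_billions * 1e9 weights * (bits / 8) bytes each, divided by 1e9.
    """
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         ram_gb: float, reserved_gb: float = 4.0) -> bool:
    """True if the weights fit after setting aside reserved_gb.

    reserved_gb is an assumed allowance for the OS, KV cache, and other
    apps. Real headroom varies a lot by machine and workload.
    """
    return weight_footprint_gb(params_billions, bits_per_weight) <= ram_gb - reserved_gb


if __name__ == "__main__":
    # Illustrative sizes only; these are not tied to any particular model.
    for ram_gb in (8, 12, 16):
        for params_b in (4, 8, 14, 30):
            for bits in (16, 8, 4):
                size = weight_footprint_gb(params_b, bits)
                verdict = "fits" if fits(params_b, bits, ram_gb) else "does not fit"
                print(f"{ram_gb}GB RAM, {params_b}B params @ {bits}-bit: "
                      f"{size:.1f}GB of weights, {verdict}")
```

This ignores activation memory, context length, and runtime overhead, so treat it as a lower bound on what you actually need.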
Small models are gaining useful reasoning ability and that's a genuinely helpful development, but they'll be heavily limited in world knowledge for the foreseeable future. BTW, the base of the Mac lineup is now once again an 8GB device with a small and low-performance SSD. Many people will tell you that it's broadly comparable (though of course not identical!) to the original base model M1.
For many tasks, including lots of agentic applications, world knowledge is not a "must-have."
To me the Neo is an exception, and doesn't represent the core Mac lineup, which is all at 16GB+ of RAM. If you're developing pro software that would rely on an on-device LLM, you probably wouldn't be targeting the Neo anyway.
Anything can technically "run" on almost any hardware; the meaningful question is what the real-world performance looks like. I for one have made a case in this thread that DeepSeek V4 is de facto optimal for wide batching, not single-request or single-agent inference, even on consumer hardware (which is unique among practical AI models). I might still be wrong of course, but if so I'd like to understand what's wrong with my assumptions.