Hacker News

So below 128GB is the sweet spot for local LLMs...


TBH, they are all rather useless at those sizes.

I used to run a lot of local models on my MBP (mainly STT, TTS, embedding, and diffusion models, plus small LLMs for utility purposes) but stopped. It saves time in the long run to run those models on the target architecture from the get-go, which in most cases is Nvidia/CUDA, rather than test and tweak on Metal, switch to CUDA for prod, and then hit weird and subtle differences and regressions. I don't think it makes much sense to develop anything (other than hobby projects for home use) on MLX at the moment.




