Hacker News

Yes, 16GB of RAM is needed for 13B, 32GB for 34B (both at 4-bit). The first time it loads a new model there's some warm-up time, I wanna say 30s? After that, context reading and token generation are usually upward of 8 tk/s. Also, the newer and bigger the die, the faster the token generation. Like, a Mac Studio would probably generate 30% or so faster than a MBP.
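For reference, a back-of-the-envelope sketch of why those RAM figures line up with 4-bit quantization. The bits-per-weight (~4.5, roughly what mixed 4-bit schemes like Q4_K_M land at) and the overhead factor for KV cache and runtime buffers are ballpark assumptions, not measurements:

```python
# Rough memory estimate for a 4-bit-quantized model.
# bits_per_weight ~4.5 approximates common 4-bit quant schemes;
# overhead covers KV cache, activations, and runtime buffers.
# Both factors are assumptions, not measured values.
def est_gib(n_params, bits_per_weight=4.5, overhead=1.2):
    return n_params * bits_per_weight / 8 * overhead / 2**30

print(f"13B: ~{est_gib(13e9):.1f} GiB")  # under 16GB
print(f"34B: ~{est_gib(34e9):.1f} GiB")  # over 16GB, under 32GB
```

That's consistent with 13B fitting on a 16GB machine while 34B needs the 32GB tier.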


8 tk/s on 34B?

I've managed to run CodeLlama-Instruct 13B on my laptop's RTX 3070 (8GB VRAM) at 6 tk/s by offloading 27 layers onto the GPU with llama.cpp
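The layer-count math behind that kind of partial offload looks roughly like this. The 40-layer count for a 13B LLaMA-family model, the GGUF file size, and the VRAM reserve are all assumptions for illustration; this is a sketch, not llama.cpp's actual allocator:

```python
import math

# Sketch: how many layers fit in VRAM when partially offloading with
# llama.cpp's -ngl / --n-gpu-layers. All sizes are rough assumptions.
model_gb = 7.9      # approx. size of a 13B 4-bit GGUF (assumption)
n_layers = 40       # transformer blocks in a 13B LLaMA-family model
vram_gb = 8.0       # RTX 3070 laptop GPU
reserve_gb = 2.0    # headroom for KV cache, scratch buffers, display

per_layer_gb = model_gb / n_layers
layers_that_fit = math.floor((vram_gb - reserve_gb) / per_layer_gb)
print(layers_that_fit)
```

With those numbers you land in the high-20s/low-30s of layers, which matches offloading 27 and leaving some KV-cache headroom.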

I've been considering getting a MacBook for running 34B+ LLM inference, but with the speed at which small LLMs are progressing, I think it's better to get a laptop with an RTX 4090 and 16GB of VRAM. Maybe it can run 34B models by offloading layers onto the GPU.


I only have a 16GB Mac so I can't confirm the 34B performance there. I do have a 3090 with 24GB of VRAM, and 34B just fits and runs above 15 tk/s. If you want a laptop and only plan on inference, I think a MBP would be better than a 4090 laptop.


No warm-up if you switch to Metal with no ANE on Sonoma



