Also keep in mind: 32GB of RAM is more than enough for normal usage, but it's useless for this kind of state-of-the-art ML unless you also have a graphics card of the kind that won't fit in a laptop.

Unless of course you were talking about VRAM, in which case 16GB is still not great for ML (to be fair, the 24GB of an RTX 4090 isn't either, but there's not much more you can get in consumer hardware). I don't think the other commenter was talking about VRAM, because 16GB of VRAM is overkill for everyday computing... and pretty decent for most gaming.

With 32 GB of RAM you can do inference with quantized 34B models. I wouldn't call that useless?
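
Back-of-envelope: a 34B model at a ~4-bit quantization is roughly 34e9 params × ~0.6 bytes ≈ 20 GB of weights, plus a few GB for context/KV cache, so it fits in 32 GB with not much to spare.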

You don't need a GPU for LLM inference. It might not be as fast as it could be, but it's usable.
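
For example, here's a minimal CPU-only sketch with llama-cpp-python (the model filename is a placeholder; any quantized GGUF should work):

    # pip install llama-cpp-python
    from llama_cpp import Llama

    # n_gpu_layers=0 keeps all layers on the CPU
    llm = Llama(
        model_path="codellama-34b.Q4_K_M.gguf",  # placeholder path
        n_ctx=2048,
        n_gpu_layers=0,
    )
    out = llm("Explain what a B-tree is in one paragraph.", max_tokens=128)
    print(out["choices"][0]["text"])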


It's almost a myth these days that you need top-end GPUs to run models. Some smaller models (say, <10B parameters with quantization) run fine on CPUs. Of course you won't get hundreds of tokens per second, but you'll probably get around ~10 or so, which can be sufficient depending on your use case.
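
A rough sanity check on that number, since CPU inference is mostly memory-bandwidth bound (the bandwidth figure below is an assumed value for a typical laptop):

    # tokens/sec is roughly memory bandwidth / bytes read per token,
    # and each token reads approximately the whole quantized model.
    bandwidth_gb_per_s = 50   # assumed: dual-channel DDR5 laptop
    model_size_gb = 4.1       # e.g. a 7B model at 4-bit quantization
    print(bandwidth_gb_per_s / model_size_gb)  # ~12 tokens/sec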


I'm not planning on developing state-of-the-art ML, I just need to run the models locally and maybe do some light tuning.

I don't want a laptop over 3 pounds and I'm not spending over $1,100, so a dedicated GPU isn't really an option.



