
I think that's exactly the point: so everyone can run it on their PCs with no GPU.


Or without a beefy GPU. I've got 8GB VRAM, which is great for Stable Diffusion but not useful for any of the language models released so far.

I think the 4-bit quantized 7B LLaMA would work on that, but the 7B is pretty fast even without a GPU.
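As a back-of-envelope check (assuming roughly 4 bits per weight plus ~20% overhead for quantization scales and activations; actual llama.cpp figures will differ a bit):

    # Rough memory estimate for quantized model weights.
    # Assumes ~4 bits per parameter plus a small overhead factor;
    # real numbers depend on the quantization format.
    def quantized_weight_gb(n_params_billion, bits_per_param=4, overhead=1.2):
        return n_params_billion * 1e9 * (bits_per_param / 8) * overhead / 1e9

    for name, n in [("7B", 7), ("13B", 13)]:
        print(f"{name}: ~{quantized_weight_gb(n):.1f} GB")
    # 7B:  ~4.2 GB -> fits comfortably in 8GB of VRAM
    # 13B: ~7.8 GB -> a tight squeeze on an 8GB card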


I'm installing it here. How's the 7B model going so far?


Haha, I just finished ordering 32GB of additional memory for my PC so I can run the 65B model, if that tells you anything. I'm upgrading from 32GB -> 64GB.

7B is fine, 13B is better. Both are fun toys and almost make sense most of the time, but even with a lot of parameter tuning they're often incoherent. You can tell that they've encoded fewer relationships between concepts than the higher-parameter models we've gotten used to; the experience is much closer to GPT-2 than GPT-3.

They're good enough to whet my appetite and give me a lot of ideas for what I want to do; they're just not quite good enough to make those applications reliably useful. Based on the reports I'm hearing here of just how much better the 65B model is than the 7B, I decided it was worth $80 for a few new sticks of RAM to be able to use the full model. Still way cheaper than buying a graphics card capable of handling it.
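For anyone wondering where the RAM numbers come from, here's a rough estimate of the 65B weight footprint at different precisions (illustrative only; the exact size depends on the quantization format):

    # Rough weight-memory estimate for the 65B checkpoint at a few
    # quantization levels; real figures vary by format.
    N_PARAMS = 65e9

    for bits in (16, 8, 4):
        gb = N_PARAMS * bits / 8 / 1e9
        print(f"{bits}-bit: ~{gb:.0f} GB")
    # 16-bit: ~130 GB (out of reach for desktop RAM)
    #  8-bit:  ~65 GB (marginal even with 64GB)
    #  4-bit:  ~32 GB (fits on a 64GB machine with room left for context)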


Heh, you just made me upgrade as well. After originally paying 130 € for 32 GB, it’s nice that I only had to pay 70 € to double it ;) Not sure if I want to run LLMs (or if my Ryzen 5 3600 is even powerful enough), but I’ve wanted some more RAM for a while.


If I were running in a server context, would the 50 GB of RAM be needed for each request, or can it be shared to serve multiple requests simultaneously?


I'm very late to this question, but I believe that amount is only required once; the context tensor, however, needs to be created per request. I haven't confirmed that, though.
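For a sense of scale, here's a rough sketch of the per-request part, assuming a standard transformer KV cache and the commonly cited LLaMA 65B dimensions (80 layers, hidden size 8192); treat those numbers as my assumption, not something I've verified against the released weights:

    # Rough per-request KV-cache size for a transformer decoder:
    # keys and values are stored for every layer and every token in
    # the context window, separately for each request.
    n_layers = 80        # assumed LLaMA 65B depth
    d_model = 8192       # assumed hidden size
    n_ctx = 2048         # tokens of context per request
    bytes_per_elem = 2   # fp16

    kv_cache_bytes = 2 * n_layers * n_ctx * d_model * bytes_per_elem  # K and V
    print(f"~{kv_cache_bytes / 1e9:.1f} GB per concurrent request")
    # ~5.4 GB per request

So the weights would be loaded once and shared, while each concurrent request adds a few extra GB of RAM for its own context.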


I'd assume that all the calculations used for 1 request would already eat up that amount of memory, but I could be wrong!


I'm still holding on to a small bit of hope that the GPU market will normalise this year. I don't think I'm the only one looking to get something highly capable for a fair price.


> I'm still holding on to a small bit of hope that the GPU market will normalise this year.

I suspect all the people hoping it will (b/c of Stable Diffusion, etc.) are exactly the reason it won’t.


Me too. But for third-world countries the prices are insane.


It's expensive for first-world countries too. Just look at the 4090: it's insane that it costs 2k EUR, literally double the fair price (which itself is already high).



