You can still buy used RTX 3090 cards on eBay. Five of them will give you 120GB of VRAM (24GB each) and will blow away any Mac in terms of performance on LLM workloads. They have gone up in price lately and are now about $1100 each, but at one point they were $700-800 each.
FWIW I have never used NVLink, and I’m not sure why people are bringing up “daisy chaining” because as far as I’m aware that is not a thing with modern GPUs at all.
> The mac will just work for models as large as 100B, can go higher with quantized models. And power draw will be 1/5th as much as the 3090 setup.
This setup will work for 100B models as well. And yes, the Mac will draw less power, but the Nvidia machine will be many times faster. So depending on your specific Mac and your specific Nvidia setup, the performance per watt will be in the same ballpark. And higher absolute performance is certainly a nice perk.
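To make the "works for 100B models" claim concrete, here's the back-of-envelope math I'm going by. The numbers are weight storage only and ignore KV cache and activation overhead, so treat the "fits" verdicts as optimistic:

```python
# Rough memory math for a 100B-parameter model vs. 5x 24GB of VRAM.
# Weights only; KV cache and activations add real overhead on top.

TOTAL_VRAM_GB = 5 * 24  # five RTX 3090s

def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, bits in [("fp16", 16), ("q8", 8), ("q4", 4)]:
    size = model_size_gb(100, bits)
    fits = "fits" if size < TOTAL_VRAM_GB else "does not fit"
    print(f"100B @ {name}: ~{size:.0f} GB -> {fits} in {TOTAL_VRAM_GB} GB")
```

So a 100B model doesn't fit at fp16 (~200GB), but it does at 8-bit (~100GB) and smaller quantizations, which matches the point about quantized models above.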
> You can certainly daisy chain several 3090's together but it doesn't work seamlessly.
Citation needed; there's no "daisy chaining" in the setup I describe, and low-level libraries like PyTorch, as well as higher-level tools like Ollama, all support multiple GPUs seamlessly.
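For the skeptical: here's a conceptual sketch (plain Python, not actual PyTorch or Ollama internals) of the usual scheme these tools use. Each GPU holds a contiguous slice of the model's layers, and activations are handed off between GPUs over PCIe at the slice boundaries; no NVLink or special interconnect required:

```python
# Conceptual illustration of layer-wise model sharding across GPUs.
# Each GPU gets a contiguous, near-equal run of layers; during inference,
# activations flow GPU 0 -> GPU 1 -> ... over ordinary PCIe transfers.

def assign_layers(n_layers: int, n_gpus: int) -> dict[int, list[int]]:
    """Split layers into contiguous, near-equal chunks, one chunk per GPU."""
    assignment = {gpu: [] for gpu in range(n_gpus)}
    for layer in range(n_layers):
        # integer division maps layers to GPUs in contiguous runs
        assignment[layer * n_gpus // n_layers].append(layer)
    return assignment

# e.g. an 80-layer model over 5 GPUs: 16 layers each
plan = assign_layers(80, 5)
for gpu, layers in plan.items():
    print(f"GPU {gpu}: layers {layers[0]}-{layers[-1]} ({len(layers)} layers)")
```

The point is that the split is just bookkeeping the framework does for you; the cards never talk to each other except to pass a (relatively small) activation tensor down the line.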
1800W is the max on a standard US 15A/120V circuit, but yes, a build like this usually stays under 1600W. For LLM inference, limiting the TDP to 225W or so per card saves a lot of power for only about a 5% drop in performance.
> I think it's bad form to say "citation needed" when your original claim didn't include citations.
I apologize, but multi-GPU inference (without any sort of “daisy chaining”) has been supported in most LLM tooling for a long time.
> Regardless - there's a difference between training and inference.
To my knowledge, no one besides you brought up training vs. inference. I was assuming the machine was for inference, because my experience building a machine like the one I described was for inference. If you want to train models, I know less about that, but I'm fairly sure the tooling supports multiple GPUs there too.
> And pytorch doesn't magically make 5 gpus behave like 1 gpu.
I never said it was magic; I just said it was supported, which it is.