More

DogRunner · 2025-11-12T17:14:44 1762967684

Apple didn't design the PowerPC or make custom variances. Motorola and IBM did it. Especially Altivec was added by Motorola, and IBM didn't like to add it to their PowerPC CPUs when Apple asked for help, when Motorola had the 500 MHz glitch bug back in the day.

There is a nice coverage on this topic at https://www.youtube.com/watch?v=Tld91M_bcEI (Why the Original Apple Silicon Failed)

DogRunner · 2025-06-18T19:32:02 1750275122

FTP, Email, Webspace and newsgroups access was the norm. good old times at uunet ...

DogRunner · 2025-06-06T10:27:02 1749205622

I used a similar budget and build something like this:

7x RTX 3060 - 12 GB which results in 84GB Vram AMD Ryzen 5 - 5500GT with 32GB Ram

All in a 19-inch rack with a nice cooling solution and a beefy power supply.

My costs? 1300 Euro, but yeah, I sourced my parts on ebay / second hand.

(Added some 3d printed parts into the mix: https://www.printables.com/model/1142963-inter-tech-and-gene... https://www.printables.com/model/1142973-120mm-5mm-rised-noc... https://www.printables.com/model/1142962-cable-management-fu... if you think about building something similar)

My power consumption is below 500 Watt at the wall, when using LLLMs,since I did some optimizations:

* Worked on power optimizations and after many weeks of benchmarking, the sweet spot on the RTX3060 12GB cards is a 105 Watt limit

* Created Patches for Ollama ( https://github.com/ollama/ollama/pull/10678) to group models to exactly memory allocation instead of spreading over all available GPUs (This also reduces the VRAM overhead)

* ensured that ASPM is used on all relevant PCI components (Powertop is your friend)

It's not all shiny:

* I still use PCIe3 X1 for most of the cards, which limits their capability, but all I found so far (PCIe Gen4 x4 extender and bifurcation/special PCIE routers) are just too expensive to be used on such low powered cards

* Due to the slow PCIe bandwidth, the performance drops significantly

* Max VRAM per GPU is king. If you split up a model over several cards, the RAM allocation overhead is huge! (See Examples in my ollama patch about). I would rather use 3x 48GB instead of 7x 12G.

* Some RTX 3060 12GB Cards do idle at 11-15 Watt, which is unacceptable. Good BIOSes like the one from Gigabyte (Windforce xxx) do idle at 3 Watt, which is a huge difference when you use 7 or more cards. These BIOSes can be patched, but this can be risky

All in all, this server idles at 90-100Watt currently, which is perfect as a central service for my tinkerings and my family usage.

reginald78 · 2025-06-09T16:42:04 1749487324

Great info in this post with some uncommon questions answered. I have a 3060 with unimpressive idle power consumption, interesting that it varies so much.

I know it would increase the idle power consumption, but have you considered a server platform instead of Ryzen to get more lanes?

Even so, you could probably get at least 4x for 4 cards without getting to crazy. 2 m.2 -> pcie adapters, the main GPU slot and the fairly common 4x wired secondary slot.

Splitting the main 16x GPU slot is possible but whenever I looked into this I kind of found the same thing you did. In addition to being a cabling/mounting nightmare the necessary hardware started to eat up enough total system cost that just ponying up for a 3090 started to make more sense.

jononor · 2025-06-09T06:31:20 1749450680

Impressive! What kind of motherboard do you use to host 7 GPUs?

iamnotagenius · 2025-06-09T09:36:00 1749461760

My 3060 idles sometimes at 19 watt, only sleep and wakeup of the machine helps.

DogRunner · 2025-04-30T15:47:28 1746028048

If you want to see the included images, jump back to 2022: https://web.archive.org/web/20221218011802/https://akapugs.b...

DogRunner · on March 27, 2025

I am a German native speaker as well. That is horrible idea!

DogRunner · on March 20, 2025

A simple explanation can be seen here: https://www.youtube.com/watch?v=7j_NE6Pjv-E

DogRunner · on Feb 16, 2025

>cheap easy eats

and mostly unhealthy in comparison to self hunted / self-planted food.

gwbas1c · on Feb 16, 2025

Do you mostly eat self hunted and self planted food?

There's a big reason why the vast majority of people choose to buy their food.

gus_massa · on Feb 16, 2025

Fish is probably the only wild animal/plant with masive consuption. (That is the closest to we all decide that nobody herd/cultivate, only hunter/gather.)

DogRunner · on Jan 28, 2025

>For optimal performance, we recommend the sum of VRAM + RAM to be at least 80GB+.

Oh nice! So I can try it in my local "low power/low cost" server at home.

My homesystem does run in a ryzen 5500 + 64gb RAM + 7x RTX 3060 12gb

So 64gb RAM plus 84gb VRAM

I dont want to brag around, but point to solutions for us tinkerers with a small budget and high energy costs.

such system can be build for around 1600 euro. The power consumption is around 520 watt.

I started with a AM4 Board (b450 Chipset) and one used RTX 3060 12gb which cost around 200 Euro used if you are patient.

There every additional GPU is connected with the pcie riser/extender to give the cards enough space.

After a while I had replaces the pcie cards with a single pcie x4 to 6x PCIe x1 extender.

It runs pretty nice. Awesome to learn and gain experience

tucnak · on Jan 28, 2025

How are you arriving at those numbers?

ryzen 5500 + 7x3060 + cooling ~= 1.6 kW off the wall, at 360 GB/s memory bandwidth, and considering your lane budget, most of it will be wasted in single PCIe lanes. After-market unit price of 3060's is 200 eur, so 1600 is not good-faith cost estimate.

From the looks of it, your setup is neither low-power, nor low-cost. You'd be better served with a refurbished mac studio (2022) at 400GB/s bandwidth fully utilised over 96 GB memory. Yes, it will cost you 50% more (considering real cost of such system closer to 2000 eur) however it would run at a fraction of power use (10x less, more or less)

I get it that hobbyists like to build PC's, but claiming that sticking seven five year out of date low-bandwidth GPU's in a box is "low power/low cost" is a silly proposition.

You're advocating for e-waste

benjiro · on Jan 28, 2025

The issue is that you are taking max GPU power draw, as a given. Running a LLM does not tax a GPU the same way a game does. There is a rather know Youtuber, that ran LLMs on a 4090, and the actual power draw was only 130W on the GPU.

Now add that this guy has 7x3060 = 100% miner. So you know that he is running a optimized profile (underclocked).

Fyi, my gaming 6800 draws 230W, but with a bit of undervolting and sacrificing 7% performance, it runs at 110W for the exact same load. And that is 100% taxed. This is just a simple example to show that a lot of PC hardware runs very much overclocked/unoptimized out of the box.

Somebody getting down to 520W sounds perfectly normal, for a undervolted card that gives up maybe 10% performance, for big gains in power draw.

And no, old hardware can be extreme useful in the right hands. Add to this, its the main factor that influences the speed tends to be more memory usage (the more you can fit and the interconnects), then actual processing performance for running a LLM.

Being able to run a large model for 1600 sounds like a bargain to me. Also, remember, when your not querying the models, the power will be mostly the memory wakes + power regulators. Coming back to that youtuber, he was not constantly drawing that 130W, it was only with spikes when he ran prompts or did activity.

Yes, running from home will be more expensive then a 10$ copilot plan but ... nobody is also looking at your data ;)

DogRunner · on Jan 29, 2025

Thanks for the clarification. Surely, If I run hashcat benchmark the power consumption goes nearly to 1400 Watt, but I also limited the max power consumption for each card to 100 Watt, which worked out better than limiting the max gpu frequency. To be fair, the most speed comes from the RAM frequency - as long as this is not limited, it works out great.

I took a fair amount of time to get everything to a reduced power level and measured several llm models (and hashcat for the extreme) to find the best speed per watt, which is usally around 1700-1900 mhz or limiting 3060 to 100 to 115 watt.

If I planned it in the first run, I may got away with a used mac studio, thats right. However, I incrementally added more cards as I moved further into exploration.

I didn't wanted to confront someone, but it looks like you either show of 4x 4090 or you keep silent

benjiro · on Jan 29, 2025

np ;)

I am amazed these days people lacking knowledge about hardware, and the mass benefits of undervolting/power limiting hardware. Its like people do not realize that what is sold, is often overclocked/too high vcore. The amount of people i see buying insane overspec PSUs, and go O_o ...

How is your performance with the different models on your setup?

tucnak · on Jan 29, 2025

"Undervolting" is a thing for 3090s where they get them down from 350 to 300W at 5% perf drop but for your case it's irrelevant because your lane budget is far too little!

> know Youtuber, that ran LLMs on a 4090, and the actual power draw was only 130W on the GPU.

Well, let's see his video. He must be using some really inefficient backend implementation if the GPU wasn't utilised like that.

I'm not running e-waste. My cards are L40S and even in basic inference, no batching with ggml cuda kernels they get to 70% util immediately.

DogRunner · on Jan 26, 2025

That was a good one! Keep up your humor. It's a tough environment out there.

DogRunner · on June 3, 2024

I really like build123d[1], which is based on CadQuery. There are a few nice IDEs for it and to make it simple: It's python based code-based CAD.

[1]https://github.com/gumyr/build123d