More

chatmasta · 2026-03-31T06:52:14 1774939934

What’s their official policy on purchasing two subscriptions?

pmx · 2026-03-31T09:22:13 1774948933

I've done this with Cursor because I have similar issues with inconsistent allowance consumption there. I mostly use Claude models but I've had to disable Opus 4.6 because it just EATS tokens in it's thinking steps.

chatmasta · 2026-03-30T01:19:16 1774833556

You underestimate just how annoying iOS autocomplete can be.

christoph · 2026-03-30T04:53:37 1774846417

I really don’t! I switched it all of months ago - autocomplete, autocaps, all of it. I reached a point where the constant frustration had to be worse than any productivity gain it was hoping to offer.

A few months on… I like it! Frustration is all gone, any errors are just on me now, and it forces me to slow down a bit and use the brain a bit more!

SilverElfin · 2026-03-30T01:26:39 1774833999

Not just iOS but macOS too. And it seems to only get worse. And with no notice to users. And no response in their forums.

alwillis · 2026-03-30T06:42:25 1774852945

I’ve been using Cotypist on macOS [1].

Sometimes it feels like it’s reading my mind when I’m typing.

[1]: https://cotypist.app/

dylan604 · 2026-03-30T03:16:59 1774840619

oh, no, I notice, but typically not until after hitting send

chatmasta · 2026-03-30T03:57:13 1774843033

I swear sometimes it doesn’t apply the corrections until I submit the form. It’s infuriating.

chatmasta · 2026-03-27T22:41:40 1774651300

All I want is for my agent to save me time, and to become a _compounding_ multiplier for my output. As a PM, I mostly want to use it for demos and prototypes and ideation. And I need it to work with my fractured attention span and saturated meeting schedule, so compounding is critical.

I’m still new to this, but the first obvious inefficiency I see is that I’m repeating context between sessions, copying .md files around, and generally not gaining any efficiency between each interaction. My only priority right now is to eliminate this repetition so I can free up buffer space for the next repetition to be eliminated. And I don’t want to put any effort into this.

How are you guys organizing this sort of compounding context bank? I’m talking about basic information like “this is my job, these are the products I own, here’s the most recent docs about them, here’s how you use them, etc.” I would love to point it to a few public docs sites and be done, but that’s not the reality of PM work on relatively new/instable products. I’ve got all sorts of docs, some duplicated, some outdated, some seemingly important but actually totally wrong… I can’t just point the agent at my whole Drive and ask it to understand me.

Should I tell my agent to create or update a Skill file every time I find myself repeating the same context more than twice? Should I put the effort into gathering all the best quality docs into a single Drive folder and point it there? Should I make some hooks to update these files when new context appears?

tern · 2026-03-27T22:58:49 1774652329

It's too early. People are trying all of the above. I use all of the above, specifically:

- A well-structured folder of markdown files that I constantly garden. Every sub-folder has a README. Every files has metadata in front-matter. I point new sessions at the entry point to this documentation. Constantly run agents that clean up dead references, update out of date information, etc. Build scripts that deterministically find broken links. It's an ongoing battle.

- A "continuation prompt" skill, that prompts the agent to collect all relevant context for another agent to continue

- Judicious usage of "memory"

- Structured systems made out of skills like GSD (Get Shit Done)

- Systems of "quality gate" hooks and test harnesses

For all of these, I have the agent set them up and manage them, but I've yet to find a context-management system that just works. I don't think we understand the "physics" of context management yet.

chatmasta · 2026-03-28T01:07:41 1774660061

On your first point, one unexpected side effect I’ve noticed is that in an effort to offload my thinking to an agent, I often end up just doing the thinking myself. It’s a surprisingly effective antidote to writer’s block… a similar effect to journaling, and a good reason why people feel weird about sharing their prompts.

nzoschke · 2026-03-28T01:19:01 1774660741

The best thing you can do is help build and maintain high quality docs.

Great docs help you, your agents, your team and your customers.

If you’re confused and the agent can’t figure it out reliably how can anyone?

Easier said than done of course. And harder now than ever if the products are rapidly changing from agentic coding too.

One of my only universal AGENTS.md rules is:

> Write the pull request title and description as customer facing release notes.

chatmasta · 2026-03-28T01:59:02 1774663142

I’ve been thinking about this a lot. It’s obviously the ideal state of things. The challenge is that we’ve got existing docs frameworks and teams and inertia and unreleased features… and I don’t have time to wait for that when I’m trying to get something done today. Not to mention the trade off of writing in public vs. private.

One quick win I’ve thought could bridge this is updating our docs site to respond to `Accept: text/markdown` requests with the markdown version of the docs.

nsvd2 · 2026-03-29T23:41:26 1774827686

I've come to the point where, if the agent makes a wrong assumption about the code base with fresh context, I consider that the code is not obvious enough about it's intent.

tern · 2026-03-30T04:30:19 1774845019

Ah, thanks so much for this question. I ended up building a tool that agent can use to track 'compounding' in my corpus of .markdown files. Keep iterating and thinking about it, and you may find you can do the same for your process.

makerdiety · 2026-03-28T00:22:23 1774657343

Sounds like you need OpenClaw's assistance.

chatmasta · 2026-03-27T11:40:51 1774611651

On the other hand it forced developers to invest more in Metal which looks like an investment starting to bear fruit.

chatmasta · 2026-03-27T05:36:24 1774589784

I love everything about this direction except for the insane inference costs. I don’t mind the training costs, since models are commoditized as soon as they’re released. Although I do worry that if inference costs drop, the companies training the models will have no incentive to publish their weights because inference revenue is where they recuperate the training cost.

Either way… we badly need more innovation in inference price per performance, on both the software and hardware side. It would be great if software innovation unlocked inference on commodity hardware. That’s unlikely to happen, but today’s bleeding edge hardware is tomorrow’s commodity hardware so maybe it will happen in some sense.

If Taalas can pull off burning models into hardware with a two month lead time, that will be huge progress, but still wasteful because then we’ve just shifted the problem to a hardware bottleneck. I expect we’ll see something akin to gameboy cartridges that are cheap to produce and can plug into base models to augment specialization.

But I also wonder if anyone is pursuing some more insanely radical ideas, like reverting back to analog computing and leveraging voltage differentials in clever ways. It’s too big brain for me, but intuitively it feels like wasting entropy to reduce a voltage spike to 0 or 1.

throwaw12 · 2026-03-27T08:39:54 1774600794

> I love everything about this direction except for the insane inference costs.

If this direction holds true, ROI cost is cheaper.

Instead of employing 4 people (Customer Support, PM, Eng, Marketing), you will have 3-5 agents and the whole ticket flow might cost you ~20$

But I hope we won't go this far, because when things fail every customer will be impacted, because there will be no one who understands the system to fix it

michaelmior · 2026-03-27T11:08:14 1774609694

I worry about the costs from an energy and environmental impact perspective. I love that AI tools make me more productive, but I don't like the side effects.

azan_ · 2026-03-27T14:11:11 1774620671

Environmental impact of ai is greatly overstated. Average person will make bigger positive impact on environment by reducing his meat intake by 25% compared with combined giving up flying and AI use.

setsewerd · 2026-03-27T23:36:13 1774654573

Is this before or after you account for the initial training impact? Because that would need to be factored in for a good faith calculation here, much as the companies would rather we didn't.

efromvt · 2026-03-27T14:19:12 1774621152

Inference costs at least seem like the thing that is easiest to bring down, and there's plenty of demand to drive innovation. There's a lot less uncertainty here than with architectural/capability scaling. To your point, tomorrow's commodity hardware will solve this for the demands of today at some point in the future (though we'll probably have even more inference demand then).

eksu · 2026-03-27T06:31:05 1774593065

This is the wrong way to see it. If a technology gets cheaper, people will use more and more and more of it. If inference costs drop, you can throw way more reasoning tokens and a combination of many many agents to increase accuracy or creativity and such.

gf000 · 2026-03-27T10:00:27 1774605627

> throw way more reasoning tokens and a combination of many many agents to increase accuracy or creativity and such.

But this is just not true, otherwise companies that can already afford such high prices would have already outpaced their competitors.

spacebanana7 · 2026-03-27T11:10:50 1774609850

No company at the moment has enough money operate with 10x the reasoning tokens of their competitors because they're bottlenecked by GPU capacity (or other physical constraints). Maybe in lab experiments but not for generally available products.

And I sense you would have to throw orders of magnitude more tokens to get meaningfully better results (If anyone has access to experiments with GPT 5 class models geared up to use marginally more tokens with good results please call me out though).

gf000 · 2026-03-28T18:29:09 1774722549

Well, how many more dogs would you need to help you write your university thesis? It's a logical fallacy to assume that more tokens would somehow help - especially that even with cursory use you would see that LLMs, once they go off the road, they are pretty much lost, and the best thing you can do with them is to give them a clear context.

mastermage · 2026-03-27T07:07:57 1774595277

I mean theoretically if there are many competitiors the costs of the product should generally drop because competition.

Sadly enough I have not seen this happening in a long time.

chatmasta · 2026-03-27T04:28:46 1774585726

> The LLM is the key element here

No, the key (novel) element here is the two-tiered approach to sandboxing and inter-agent communication. That’s why he spends most of the post talking about it and only a few sentences on which models he selected.

chatmasta · 2026-03-27T04:26:09 1774585569

This sounds a lot cleaner than the approach I was thinking of with a separate bot for each role. I like it.

chatmasta · 2026-03-27T04:25:16 1774585516

Does IRC still have message length limits or was that only in the early versions of the protocol?

stackghost · 2026-03-27T04:40:50 1774586450

RFC 1459 originally stipulated that messages not exceed 512 bytes in length, inclusive of control characters, which meant the actual usable length for message text was less. When the protocol's evolution was re-formalized in 2000 via RFCs 2810-13 the 512-byte limit was kept.

However, most modern IRC implementations support a subset of the IRCv3 protocol extensions which allow up to 8192 bytes for "message tags", i.e. metadata and keep the 512-byte message length limit purely for historical and backwards-compatibility reasons for old clients that don't support the v3 extensions to the protocol.

So the answer, strictly speaking, is yes. IRC does still have message length limits, but practically speaking it's because there's a not-insignificant installed base of legacy clients that will shit their pants if the message lengths exceed that 512-byte limit, rather than anything inherent to the protocol itself.

entropie · 2026-03-27T04:37:35 1774586255

I guess you just send newlines as in multiple messages and disable flood protection on the server or whitelist your bot.

chatmasta · 2026-03-27T04:21:10 1774585270

I bet there’s gonna be a banger of a Mac Studio announced in June.

Apple really stumbled into making the perfect hardware for home inference machines. Does any hardware company come close to Apple in terms of unified memory and single machines for high throughput inference workloads? Or even any DIY build?

When it comes to the previous “pro workloads,” like video rendering or software compilation, you’ve always been able to build a PC that outperforms any Apple machine at the same price point. But inference is unique because its performance scales with high memory throughput, and you can’t assemble that by wiring together off the shelf parts in a consumer form factor.

It’s simply not possible to DIY a homelab inference server better than the M3+ for inference workloads, at anywhere close to its price point.

They are perfectly positioned to capitalize on the next few years of model architecture developments. No wonder they haven’t bothered working on their own foundation models… they can let the rest of the industry do their work for them, and by the time their Gemini licensing deal expires, they’ll have their pick of the best models to embed with their hardware.

whywhywhywhy · 2026-03-27T10:04:26 1774605866

> But inference is unique because its performance scales with high memory throughput, and you can’t assemble that by wiring together off the shelf parts in a consumer form factor.

Nvidia outperforms Mac significantly on diffusion inference and many other forms. It’s not as simple as the current Mac chips are entirely better for this.

rafram · 2026-03-27T10:34:18 1774607658

But where are you going to find an Nvidia GPU with 128+ GB of memory at an enthusiast-compatible price?

dabockster · 2026-03-27T15:10:45 1774624245

You don’t need it if you use llamacpp on Windows, or if you compile it on Linux with CUDA 13 and the correct kernel HMM support, and you’re only using MoE models (which, tbh, you should be doing anyways).

0x457 · 2026-03-27T17:04:34 1774631074

What MoE has to do with it? Aside from Flash-MoE that supports exactly one model and only on macOs - you still need to load entire model into memory. You also don't know what experts going to be activated, so it's not like you can predict which needs to be loaded.

zozbot234 · 2026-03-28T08:45:47 1774687547

With proper mmap support you don't really need the entire model in memory. It can be streamed from a fast SSD, and this is more useful for MoE models where not all expert-layers are uniformly used. Of course the more data you stream from SSD, the slower this is; caching stuff in RAM is still relevant to good performance.

0x457 · 2026-03-30T17:16:05 1774890965

Okay, yes, you don’t need the entire MoE model in memory for it to function.

But you still need the working set of frequently used experts to actually fit in RAM, or at least stay cached. Expert routing happens per token, per layer. If those weights aren’t resident, you’re effectively pulling them from disk on the critical path of generation — over and over again.

That’s not “just slower,” that’s order of magnitude slower. You’ll end up with constant page faults and page cache churn. And if swap is on the same device as the model, you’re now competing for bandwidth on top of that.

IMO the main benefit of mmap is ability to reclaim cold pages during high memory-pressure events when model isn't active.

gloxkiqcza · 2026-03-29T15:23:54 1774797834

You can do this on a Mac as well tho, right? So that 128 GB unified memory becomes cache for very fast 1+ TB Apple SSD.

zozbot234 · 2026-03-30T00:07:06 1774829226

I think the advantage of Flash-MoE compared to plain mmap is mostly the coalesced representation where a single expert-layer is represented by a single extent of sequential data. That could be introduced to existing binary formats like GGUF or HF - there is already a provision for differently structured representations, and that would easily fit.

sippeangelo · 2026-03-27T13:33:54 1774618434

Some Chinese sources sell modded Nvidia GPUs with extra VRAM. They're quite affordable in comparison to even a Mac Pro.

nextaccountic · 2026-03-27T14:31:58 1774621918

Any links to them? Never heard of this..

giobox · 2026-03-27T14:55:37 1774623337

It’s been going on for a while. Search YouTube or the web for 48gb 4090 (this is one of the most popular modded Nvidia cards), Nvidia of course never officially made a 4090 with this much memory.

There are some on sale via eBay right now. The memory controllers on some Nvidia gpus support well beyond the 16-24gb they shipped with as standard, and enterprising folks in China desolder the original memory chips and fit higher capacity ones.

noboostforyou · 2026-03-27T17:24:58 1774632298

I've seen a guy who sells modded 2080 Ti with 22gb for $500

https://www.tomshardware.com/pc-components/gpus/chinese-work...

There's also unreleased Nvidia engineering samples of cards with doubled VRAM like this - https://www.reddit.com/r/nvidia/comments/1rczghu/update_unre...

elorant · 2026-03-27T15:36:38 1774625798

Go at ebay and search for RTX 4090 48GBs. There's plenty of them with prices around $3.5k

giwook · 2026-03-27T13:49:06 1774619346

And how much do you trust Chinese hardware?

embedding-shape · 2026-03-27T13:59:12 1774619952

Give that most of mine, and probably yours, and probably most of the world's computers are in fact made in China one way or another, some higher percentage than others, I'm guessing most of us trust our hardware enough to continue using it.

giwook · 2026-03-28T16:13:44 1774714424

True. I was specifically referring to "modded Chinese hardware" from some unknown, unvetted third party versus say through a well-known brand that hopefully has its own rigorous QA and security processes in place.

x______________ · 2026-03-27T13:56:26 1774619786

When there's no one left to trust, maybe you need to re-evaluate your criteria.

sgc · 2026-03-27T15:48:09 1774626489

I wouldn't say that's true or even likely. It's completely possible to be in a pit of vipers where every single snake is venomous, and that is pretty much what we are seeing: With technological advances, there is a certain subset of people that will use them primarily to solidify their power and control over others. There is no utopian society right now whose government doesn't look to spy through technology, which of course is best set up at time of manufacture.

x______________ · 2026-03-27T17:02:48 1774630968

Agreed. Unless you have full control over the production chain to fully produce a device, you are subject to the whims and desires of those who preside over such technological feats that we take for granted in our daily lives.

To the original point, it's safe to say that highlighting a nationality with regards to trust is baseless and without merit, as would be for any other topic (men/women from x are y, z food is better here, etc..). Real life is much more complicated and nuanced past nationalities. Some might call it FUD (fear, uncertainty and doubt) but there's always a deeper rationale at the individual level as well.

sgc · 2026-03-27T17:24:40 1774632280

Rather than people being wary of Chinese in general, it's more that there is a high degree of government control exercised in China and they are known to be very strategic with long-term planning in regards to technology control both for spying and actual remote control of devices. We are all just looking for the least bad option. It's not like devices from other countries are immune, but they are often less organized so there is a better chance of avoiding the Chinese level of planned access.

It does seem like pretty low risk in this specific case so I agree OP's comment was bit over the top, but I would have no way to make anything resembling even an educated guess as to how far their programs go.

giwook · 2026-03-28T16:16:58 1774714618

Yes, this is really what I was referring to. And the fact that the original comment I was replying to mentioned "modded Chinese hardware" from some unspecified, unvetted 3rd party which doesn't exactly fill me with confidence.

whywhywhywhy · 2026-03-27T18:03:41 1774634621

The Mac is also chinese hardware

platevoltage · 2026-03-27T21:17:52 1774646272

It would be hilarious if you are using a Lenovo device right now.

giwook · 2026-03-28T16:14:15 1774714455

I mean it's pretty funny that probably 90% of the things in our homes are made in China.

estimator7292 · 2026-03-29T20:18:15 1774815495

Which of your devices weren't made in China?

krsw · 2026-03-28T19:08:55 1774724935

At this point I trust them more than US or Israeli tech

ricardobayes · 2026-03-27T11:44:47 1774611887

That might even be true, but how large is the TAM for such machines?

colechristensen · 2026-03-27T16:08:50 1774627730

The Nvidia DGX Spark is exactly this and in the same price and performance bracket.

andreybaskov · 2026-03-27T18:12:19 1774635139

Sadly, memory bandwidth is abysmal compared to Apple chips - 273 GB/s vs 614 GB/s on M5 Max for similar price. Even though fp4 compute is faster, it doesn't help for all the decode heavy agentic workflows.

angoragoats · 2026-03-27T14:45:38 1774622738

You can still buy used 3090 cards on ebay. 5 of them will give you 120GB of memory and will blow away any mac in terms of performance on LLM workloads. They have gone up in price lately and are now about $1100 each, but at one point they were $700-800 each.

rybosworld · 2026-03-27T14:57:18 1774623438

I don't see how 5x 3090's is a better option than an M3 Ultra Mac studio.

The mac will just work for models as large as 100B, can go higher with quantized models. And power draw will be 1/5th as much as the 3090 setup.

You can certainly daisy chain several 3090's together but it doesn't work seamlessly.

whywhywhywhy · 2026-03-27T18:04:37 1774634677

> You can certainly daisy chain several 3090's together

It's not "daisy chaining" 3090 has NVLink.

angoragoats · 2026-03-27T22:49:05 1774651745

FWIW I have never used NVLink, and I’m not sure why people are bringing up “daisy chaining” because as far as I’m aware that is not a thing with modern GPUs at all.

rybosworld · 2026-03-27T18:17:05 1774635425

Really? How would you NVLink more than 2 3090's?

angoragoats · 2026-03-27T15:19:51 1774624791

> The mac will just work for models as large as 100B, can go higher with quantized models. And power draw will be 1/5th as much as the 3090 setup.

This setup will work for 100B models as well. And yes, the Mac will draw less power, but the Nvidia machine will be many times faster. So depending on your specific Mac and your specific Nvidia setup, the performance per watt will be in the same ballpark. And higher absolute performance is certainly a nice perk.

> You can certainly daisy chain several 3090's together but it doesn't work seamlessly.

Citation needed; there's no "daisy chaining" in the setup I describe, and low level libraries like pytorch as well as higher level tools like Ollama all seamlessly support multiple GPUs.

lowbloodsugar · 2026-03-27T15:46:56 1774626416

How much does it cost to have an electrician wire up 240v circuit just to power the thing?

angoragoats · 2026-03-27T22:40:48 1774651248

The machine I’m describing works just fine on a dedicated 15A 120V circuit.

lowbloodsugar · 2026-03-28T01:12:08 1774660328

5x3090 in 1600W?

angoragoats · 2026-03-28T17:48:13 1774720093

1800W is the max on a 15A circuit, but yes, it’s usually under 1600W. For LLM inference, limiting the TDP to 225W or so per card saves a lot of power, for a 5% drop in performance.

rybosworld · 2026-03-27T15:49:59 1774626599

I think it's bad form to say "citation needed" when your original claim didn't include citations.

Regardless - there's a difference between training and inference. And pytorch doesn't magically make 5 gpus behave like 1 gpu.

angoragoats · 2026-03-27T22:47:39 1774651659

> I think it's bad form to say "citation needed" when your original claim didn't include citations.

I apologize, but using multiple GPUs for inference (without any sort of “daisy chaining”) is something that’s been supported in most LLM tooling for a long time.

> Regardless - there's a difference between training and inference.

No one brought up training vs. inference to my knowledge, besides you — I was assuming the machine was for inference, because my experience building a machine like the one I described was in order to do inference. If you want to train models, I know less about that, but I’m pretty sure the tooling does easily support multiple GPUs.

> And pytorch doesn't magically make 5 gpus behave like 1 gpu.

I never said it was magic, I just said it was supported, which it is.

edelans · 2026-03-27T13:07:16 1774616836

and let alone competing on the energy consumption!

embedding-shape · 2026-03-27T13:38:10 1774618690

Where are you gonna find Apple hardware with 128GB of memory at enthusiast-compatible price?

The cheapest Apple desktop with 128GB of memory shows up as costing $3499 for me, which isn't very "enthusiast-compatible", it's about 3x the minimum salary in my country!

kaashif · 2026-03-27T13:55:32 1774619732

Apple is not catering to minimum salaries in poor countries. Does this really need to be explained?

$3499 is definitely enthusiast compatible. That's beefy gaming PC tier, which is possibly the canonical example of an enthusiast market.

This isn't tens of thousands of dollars for top tier Nvidia chips we're talking about.

embedding-shape · 2026-03-27T13:57:52 1774619872

Seems I misunderstood what a "enthusiast" is, I thought it was about someone "excited about something" but seems the typical definition includes them having a lot of money too, my bad.

NikolaNovak · 2026-03-27T16:14:04 1774628044

I'm an immigrant to Canada, and yes, English has both literal meanings and colloquial meanings.

In the most literal meaning, absolutely, "Enthusiast" just means a person who likes something, is excited about something.

When it comes to market and products though, typically you'll see the word "Enthusiast" as mid-tier - something like: Consumer --> Enthusiast --> Professional (may have words like "Prosumer" in there as well etc:)

In that context, which is typically the one people will use when discussing product pricing and placement, "Enthusiast" is somebody who yes enjoys something, but does it sufficiently to be discerning and capable of purchasing mid-tier or above hardware.

So while a consumer photographer, may use their phone or compact or all-in-one camera, enthusiast photographer will probably spend $3000 - $5000 in camera gear. Equivalently, there are myriad gamers out there (on phones, consoles, Geforce Now, whatever:), an enthusiast gamer is assumed to have a dedicated gaming computer, probably a tower, with a dedicated video card, likely say a 5070ti or above, probably 32GB+ RAM, couple of SSDs which are not entry level, etc.

Again, this is not to say a person with limited budget is "not a real enthusiast", no gatekeeping is intended here; simply, if it may help, what the word means when it comes to market segmentation and product pricing :)

brailsafe · 2026-03-27T17:31:33 1774632693

Additionally, "enthusiasts"/"hobbyists" tend to be willing to spend beyond practical utility, while professionals are more interested in pragmatism, especially in photography from what I can tell.

If you're an actual pro, you need your stuff to work properly, efficiently, reliably, when it's called for. When you're a hobbyist, it's sometimes almost the goal to waste money and time on stuff that really doesn't matter beyond your interest in it; working on the thing is the point, not the value it generates. Pros should spend money on good tools and research and knowledge, but it usually needs to be an investment, sometimes crossing over with hobbyist opinions.

A friend of mine who's a computer hobbyist and retail IT tech, making far far less than I do, spends comically more than me on hardware to play basically one game. He keeps up to date with the latest processors and all that stuff, he knows hardware in terms of gaming. I meanwhile—despite having more money available—have a fairly budget gaming PC that I did build myself, but contains entirely old/used components, some of which he just needed to get rid of and gave me for free, and I upgrade my main mac every 5 years or something. I only upgrade when hardware is really getting in my way.

sib · 2026-03-27T19:19:38 1774639178

>> So while a consumer photographer, may use their phone or compact or all-in-one camera, enthusiast photographer will probably spend $3000 - $5000 in camera gear.

It's interesting that you chose photographers as the example here. In many cases that I've seen, enthusiast photographers spend much more than professional photographers on their gear because the photographers make their money with their gear and therefore need to justify it, while the enthusiasts are often tech people, successful doctors, etc., who spend lots and lots on money on their hobbies...

In any case, your point stands, that "enthusiast" computer users would easily spend $3-4K or more on gear to play games, train models, etc.

pchristensen · 2026-03-27T15:40:25 1774626025

$3.5k is a lot of money, but not a ton by American hobby standards. It's easy to spend multiples, even orders of magnitude more than that on hobbies like fishing, wine, sports tickets, concerts, scuba, travel, being a foodie, golf, marathons, collectibles, etc.

It's out of reach for lots of people, even in developed countries. But it's easily within reach for loads of people that care more about computing than other stuff.

chirau · 2026-03-27T16:09:12 1774627752

I live in America, I am very well compensated. Have been for 15 years now. $3500 is a lot of money. A lot. There is a tiny bubble of us tech folks who think it is accessible to most people. It is not. It is also the same reason Macs are still a niche. Don't take your circles to be the standard, it is very very far from it, especially if you think $3500 is not a lot of money.

It is easy to confirm this, just look at the sales number of these $3500 devices. It is definitely not an enthusiast price point, even in the US.

tracker1 · 2026-03-27T17:56:29 1774634189

It's not nothing for most people... it's more than a month of rent/mortgage for a significant number of Americans even. But if it's your primary hobby, it's not completely out of reach, and it's not something you necessarily spend every year. A lot of people will upgrade to a new computer every 3-5 years and maybe upgrade something in between those complete system upgrades.

I know plenty of people who don't make a lot of money (say top 25% or so) that will have a Boat or RV that costs more than a $3500 computer, and balk at the thought of spending that much on a computer. It just depends on where your interests are.

pchristensen · 2026-03-27T18:57:47 1774637867

The first words I said: "$3.5k is a lot of money..."

There are tens of millions of top 10% income adults in America. So something can be both unaffordable to most people, and also easily accessible to very many people.

1123581321 · 2026-03-27T17:21:40 1774632100

It’s a midrange to upper expense in the US if it’s your hobby. Most people don’t have a serious computer hobby but they golf, trade ATVs, travel, drink, etc.

foldr · 2026-03-29T11:12:50 1774782770

Mac has about 15% of the market share in the US. It's not really a niche.

$3500 is more than I would spend on a hobby too, but there are, in absolute terms, a large number of Americans who can spend this much on their hobbies.

sib · 2026-03-27T19:23:58 1774639438

There are something like 24 million millionaires in the United States... Estimates are that Americans spent $157 billion on pets in 2025.

There are a lot of people who could easily choose to spend $3,500 on a computer.

chirau · 2026-03-28T16:58:57 1774717137

There is no Apple device priced above $3k that has done 1 million in annual sales. The US population is >300M. <0.3% of the population. Don't take your bubble to be representative of society. $3500 is a lot of money, even in the US.

jltsiren · 2026-03-27T18:24:19 1774635859

$3500 would have been 3–4 months' discretionary spending as a PhD student in Finland 15 years ago. A sum you might choose to spend once a year on something you find genuinely interesting.

Some people succumb to lifestyle creep or choose it deliberately. Others choose to live below their means when their income grows. The latter have a lot more money to spend on extras, or to save if that's what they prefer.

oxfeed65261 · 2026-03-27T16:01:58 1774627318

In June 1977, the base Apple II model with 4 KB of RAM was $1,298 (equivalent to about $6,900 in 2025), and with the maximum 48 KB of RAM it was $2,638 (equivalent to about $14,000 in 2025).

(Source: Wikipedia via Claude Opus)

prewett · 2026-03-27T17:04:30 1774631070

Wow, 48k for $14000. Now you can get a MBP with a million times more memory for $3500 or so. Whereas that CPU was clocked at 1 MHz, so CPUs are only several thousand times faster, maybe something like 30,000 times faster if you can make use of multi-core.

brailsafe · 2026-03-27T17:19:33 1774631973

I'd argue that some of those are more consumption and activity than hobby depending on how they're engaged with, and that people use the word "hobby" too loosely, but would agree that Americans in-particular consume at obscene rates.

Golf equipment, mountaineering equipment, skiing and snowboarding lift tickets and gear, a single excessive graphics card that's only used for increasing frame rates marginally, or basically a single extra feature on a car, are all things that accumulate quite quickly. Some are clearly more superfluous than others and cater to whales, while some are just expensive by nature and aren't attempting to be anything else

ua709 · 2026-03-27T19:58:28 1774641508

Those are the prices for just buying equipment, which at least retain some kind of value. 3 million+ American kids are enrolled in competitive soccer with annual clubs dues between $1K and $5K, and that money is just gone at the end of the year. Basically none of those kids are going to have a career in soccer, so it's clearly a hobby, and everyone knows it. And soccer isn't even the most popular sport!

brailsafe · 2026-03-27T21:39:50 1774647590

Ya, I guess that's another category entirely. The cost of enrolling a kid in anything, potential travel involved etc..

darkwater · 2026-03-27T14:35:18 1774622118

An enthusiast in the hobby space is by definition someone willing to pour much more money that someone else not that enthusiast in whichever hobby we are talking about.

embedding-shape · 2026-03-27T14:49:16 1774622956

Well, and also has a bunch of money, not just willing. I guess locally we don't really have that difference, as two other commentators here went by, that's why I had to update my local understanding of "enthusiast". Usually we use it for how engaged/interested a person is, regardless of how much money they can or are willing to use.

Learned something new today at least, so that's cool :)

sgc · 2026-03-27T15:57:27 1774627047

Yes, when tech gear is sold as 'enthusiast' gear, it is almost invariably the most expensive non-professional tier of equipment. That is roughly the common understanding: Expensive and focused on features more than security required for public use; while remaining within reach of at least some individuals, not only corporations.

darkwater · 2026-03-27T15:43:04 1774626184

In a hobby where there are (strong) HW requirements, it mostly takes for granted you have money to shell out for your hobby, indeed.

Dylan16807 · 2026-03-27T18:32:27 1774636347

For an individual making median income in the US, it would cost 2% of your income to get a machine like this every 4-5 years. That's a matter of enthusiasm, not a matter of having a lot of money. Sorry that income is less where you are, but the people talking about the product tier are using American standards.

darkwater · 2026-03-27T14:36:10 1774622170

1200$ as the minimum salary covers probably 70% of Europe by population?

NetMageSCW · 2026-03-27T14:45:30 1774622730

The Neo has enough power to do small LLM testing and pretty much anything else a bit slowly, and costs $600?

0x457 · 2026-03-27T17:08:39 1774631319

Neo tops at 8GB RAM. What LLM are you going to run there? Functiongemma?

It can absolutely do some ML inference on it, but not much in terms of LLMs.

darkwater · 2026-03-28T08:11:15 1774685475

Maybe, but that does not mean that the Mac Studio is not very expensive hardware even for rich first world countries.

monsieurbanana · 2026-03-27T14:21:19 1774621279

Did you need to add poor? Unless apple isn't catering to the US

joe_mamba · 2026-03-27T13:40:35 1774618835

> it's about 3x the minimum salary in my country!

Enthusiast compute hardware doesn't cater to the people on the minimum salary in any country, let alone developing nations. When Ferrari makes a car they don't ask themselves if people on minimum salary will be able to afford them.

In in the bottom two poorest EU member states and Apple and Microsoft Xbox don't even bother to have a direct to customer store presence here, you buy them from third party retailers.

Why? Probably because their metrics show people here are too poor to afford their products en-masse to be worth operating a dedicated sales entity. Even though plenty of people do own top of the line Macbooks here, it's just the wealthy enthusiast niche, but it's still a niche for the volumes they (wish to)operate at. Why do you think Apple launched the Mac Neo?

embedding-shape · 2026-03-27T13:46:53 1774619213

Right, I think maybe we're then talking about "upper class enthusiasts" or something in reality then? I understood that to juts be about the person, not what economic class they were in, maybe I misunderstood.

joe_mamba · 2026-03-27T14:39:06 1774622346

>Right, I think maybe we're then talking about "upper class enthusiasts" or something in reality then?

Why? Enthusiasts are by definition people for whom value for money is not the main driver but top performance and cutting edge novelty at any cost. Affording enthusiast computer hardware is not a human right same how affording a Lamborghini or McMansion isn't.

But you don't need to buy a Lamborghini to do your grocery shopping or drive your kids to school, same how you don't need an Nvidia 5090 or MacBook Pro Max to do your taxes or do your school work.

So the definition is fine as it is. It's hardware for people with very deep pockets, often called whales.

Heliosmaster · 2026-03-27T14:25:02 1774621502

Yes, it's a different definition.

Enthusiast in this contest more or less means you are excited enough about something to get a level above what normal people should get and just below professional pricing. An enthusiast camera body can be 2000 euros.

I would say an enthusiast computer is 2-4k.

It really depends what you meant with minimum salary (yearly?) because paying 3 months of salary for a computer like that isn't far fetched. You're not using this to generate recipes for cookies. An enthusiast level car is expensive as well.

0x457 · 2026-03-27T17:11:45 1774631505

enthusiasts in computer hardware assumes enthusiasm about hardware, not about "hardware on an budget". It doesn't matter if it's afforable or not.

tracker1 · 2026-03-27T17:49:40 1774633780

I spent aaround that on my current personal desktop... 9950X, 2x48gb ddr5/@6000, RX 9070XT, 4tb gen 5 nvme + 4tb gen 4 nvme. I could have cut the cpu to a 9800x3d and ram to 32gb with a different GPU if my needs/usage were different. I'm running in Linux and don't game too much.

That said, a higher end gaming setup is going to cost that much and is absolutely in the enthusiast realm. "enthusiast" doesn't mean compatible with "minimum wage"

mprovost · 2026-03-27T14:47:13 1774622833

The original Mac with 128KB of memory cost $2,495 when Apple released it in 1984. It would be about 3x that in today's money.

intrasight · 2026-03-27T18:06:56 1774634816

I came here to say the same. Even with my student discount price of $1000, that's over 3K in today's dollars.

We are so freaking spoiled by the cheap cost of compute now.

jiwidi · 2026-03-27T14:20:24 1774621224

tell me what pc with an nvidia gpu can you buy with same memory and performance.

I never liked apple hardware, but they are now untouchable since their shift to own sillicon for home hardware.

angoragoats · 2026-03-27T14:43:21 1774622601

This has changed since Sam Altman started buying up all the chip supply, raising prices on memory, storage, and GPUs for everyone, but it used to be the case that you could build a PC that was both cheaper and faster than a Mac for LLM inference, with roughly equal performance per watt.

You would use multiple *90-series GPUs, throttled down in terms of power. Depending on the GPU, the sweet spot is between 225-350W, where for LLM workloads you only lose 5-10% of performance for a ~50% drop in power consumption.

Combined with a workstation (Xeon/Epyc) CPU with lots of PCIe, you can support 6-7 such GPUs (or more, depending on available power). This will blow away the fastest Mac studio, at a comparable performance per watt.

Again, a lot of this has changed, since GPUs and memory are so much more expensive now.

Macs are great for a simpler all in one box with high memory bandwidth and middling-to-decent GPU performance, but they are (or were) absolutely not "untouchable."

Detrytus · 2026-03-27T14:49:40 1774622980

With 6-7 GPUs and EPYC cpu it will also cost 2-3x more than a Mac Studio.

deaddodo · 2026-03-27T15:15:37 1774624537

I think OP’s point was that it would do more than 2-3x the workload, thus them stating “blow it out of the water” and specifying “performance-per-watt”.

elorant · 2026-03-27T15:33:35 1774625615

Untouchable my ass. You get a PC that has an ssd glued to the motherboard so if you run write intensive workloads and that thing wears out replacing it will have significant cost. Then there’s no PCie slot to get any decent network card if you want to work more than one of them in unison, you’re stuck with that stupid thunderbolt 5 while Infiniband gives x10 network speeds. As for memory bandwidth, it’s fast compared to CPUs but any enterprise GPU dwarfs it significantly. The unified RAM is the only interesting angle.

Apple could have taken a chunk of the enterprise market now with that AI craze if they had made an upgradable and expandable server edition based on their silicon. But no, everything has to be bolt down and restricted.

traceroute66 · 2026-03-27T14:37:06 1774622226

> tell me what pc with an nvidia gpu can you buy with same memory and performance.

And power consumption !

The performance per watt of Apple is unmatched.

dabockster · 2026-03-27T15:08:43 1774624123

This needs to be sold as the big ticket item for low level devs. Their chips are some of the most power efficient chips on the market right now.

Hoping they release a blade server version somehow.

bigyabai · 2026-03-27T15:30:06 1774625406

Nvidia's recent GPUs are more power-efficient than Apple Silicon in raster, training and inference workloads.

A blade server would get cancelled just like the Mac Pro for exactly the same reasons: https://9to5mac.com/2026/03/02/some-apple-ai-servers-are-rep...

traceroute66 · 2026-03-27T15:45:55 1774626355

> Nvidia's recent GPUs are more power-efficient than Apple Silicon in raster, training and inference workloads.

I think you can do better than the proverbial Apples and Oranges comparison.

In terms of total system, "box on desk", Apple is likely to remain the performance per watt leader compared to random PC workstations with whatever GPUs you put inside.

bigyabai · 2026-03-27T18:39:50 1774636790

Then ignore me, and go ask your local datacenter why Apple Silicon isn't on any of their racks.

Melatonic · 2026-03-27T16:28:38 1774628918

Apple releasing anything enterprise or "server" related would be a pretty big pivot - let alone blades.

saltyoldman · 2026-03-27T15:43:11 1774626191

I've owned some beefy computers in the past and this tiny little m4 mini on my desk blows them all out of the water easily. It's crazy.

chpatrick · 2026-03-27T10:41:59 1774608119

But they're pretty fast and can have loads of RAM, which would be prohibitively expensive with Nvidia.

chocochunks · 2026-03-27T10:49:08 1774608548

A 128GB 2TB Dell Pro Max with Nvidia GB10 is about $4200, a Mac Studio with 128GB RAM and 2TB storage is $4100. So pretty comparable. I think Dell's pricing has been rocked more by the RAM shortage too.

adgjlsfhk1 · 2026-03-27T14:57:42 1774623462

Unfortunately the GB10 is incredibly bandwith starved. You get 128gb ram, but only 270GB/s bandwidth. The M3 Ultra mac studio gets you 820GB/s. (The M4 max is at 410GB/s. I'm not aware of any workload that gets the GB10 to it's theoretical peakflops.

chocochunks · 2026-03-27T15:53:16 1774626796

You can't get a 128GB M3 Ultra, it's also more expensive. For some workloads the Studio is better, for others the GB10.

midnight_eclair · 2026-03-27T12:15:15 1774613715

~not unified memory tho~

mciancia · 2026-03-27T12:23:47 1774614227

It is unified memory on this one

cestith · 2026-03-27T13:49:37 1774619377

From the spec sheets I’m looking at, it is not. I’m seeing models of the Dell Pro Max with 128 GB of DDR5-6400 as CAMM2, then a separate memory of up to 24 GB on the GPU. CAMM2 does not make the memory unified.

There are also SO-DIMM options.

chocochunks · 2026-03-27T14:02:37 1774620157

You're not looking at the right thing. Dell's naming is horrible. Dell Pro Max with GB10 (https://www.dell.com/en-us/shop/cty/pdp/spd/dell-pro-max-fcm...). It's a very different computer than what you're looking at and has 128GB LPDDR5X unified memory.

cestith · 2026-03-27T16:05:05 1774627505

Thanks for pointing that out. I found a more informative article about that model at https://www.mcpgov.com/dell-pro-max-gb10

midnight_eclair · 2026-03-27T12:31:22 1774614682

my bad

ctxc · 2026-03-27T12:59:34 1774616374

I took ~ to be a "singing tone" for some reason till I saw sibling and realized it might be an attempted strikethrough xD

benoau · 2026-03-27T14:00:08 1774620008

That won't hold much benefit as SOCAMM2 and LPCAMM2 get more popular.

traceroute66 · 2026-03-27T14:52:58 1774623178

> So pretty comparable.

The Mac Studio almost certainly uses at least half the power

(educated guess, I'm too lazy to go look at all the spec sheets and run the numbers)

bigyabai · 2026-03-27T15:33:32 1774625612

It's actually reversed. The GB10 chipset has a TDP of 140w, whereas M2/M3 Ultra pulls over 250w from the wall: https://support.apple.com/en-us/102027

traceroute66 · 2026-03-27T15:41:04 1774626064

> It's actually reversed. The GB10 chipset has a TDP of 140w, whereas M2/M3 Ultra pulls over 250w from the wall

Come on mate ... I think you and I both know I was talking about complete system here, not discrete components.

I'm pretty sure your total package (Dell Pro Max + GB10) will pull more from the wall.

bigyabai · 2026-03-27T15:43:33 1774626213

I'm pretty sure you need to look up what you're talking about instead of making a guess.

The Dell Pro Max PSU + enclosure is only rated for 240w, it literally can't pull more than 250w from the wall without shorting itself.

traceroute66 · 2026-03-27T15:51:47 1774626707

> 240w

280w according to the spec sheet I just looked at.

Also just look at the graphs on Geerling's website. The Mac Studio eats the Dell for breakfast in a number of the tests: https://www.jeffgeerling.com/blog/2025/dells-version-dgx-spa...

plagiarist · 2026-03-27T13:33:30 1774618410

Not quite, what is the vRAM bandwidth of each? The bandwidth is a huge contributor to LLM performance.

embedding-shape · 2026-03-27T13:45:50 1774619150

AFAIK, for the unified bandwidth, it depends mostly on the CPU, for M4 Max (I think it's the default today?) it does ~550 GB/s, while GB10 does ~270 GB/s, so about a 2x difference between the two. For comparison, RTX Pro 6000 does 1.8 TB/s, pretty much the same as what a 5090 does, which is probably the fastest/best GPUs a prosumer reasonable could get.

plagiarist · 2026-03-27T23:00:23 1774652423

Granted, it won't be competitive against the flagship dGPUs. Nevertheless, that ~2x is a pretty huge difference in similarly priced offerings.

wappieslurkz · 2026-03-27T13:57:24 1774619844

Do NVIDIA solutions also outperform the Apple M-series in performance per Watt?

whywhywhywhy · 2026-03-27T18:06:14 1774634774

No, that's why Apple uses Performance Per Watt not actual performance celling as the metric. In actual workloads where you'd need this power then actual performance is what matters not PPW.

Lalabadie · 2026-03-27T14:34:48 1774622088

Probably comparable, but that's only with business-grade products, it's why Apple's current silicon is so remarkable on the market at the consumer level.

wappieslurkz · 2026-03-27T14:42:49 1774622569

Thanks.

AdamN · 2026-03-27T10:44:11 1774608251

Nvidia isn't selling one-off home computers afaik. But yes in terms of datacenter cloud usage Nvidia performs.

newsclues · 2026-03-27T11:02:10 1774609330

https://marketplace.nvidia.com/en-us/enterprise/personal-ai-...

jamespo · 2026-03-27T11:36:38 1774611398

Amusingly there's a macbook next to it in the pic, is this headless?

Tsiklon · 2026-03-27T11:45:26 1774611926

It has a HDMI port and its USB-C ports also support display out. But I believe most who buy it intend to use it headless. The machine runs Ubuntu 24.04 and has a slightly customised Gnome (green accents and an nvidia logo in GDM) as its desktop.

_zoltan_ · 2026-03-27T12:31:53 1774614713

GB300 DGX Station was announced last Monday.

eitally · 2026-03-27T15:38:29 1774625909

It's going to cost far more than a diy machine with multiple lower end GPUs. Which is fine -- it's aimed at enterprise, not home labs.

HerbManic · 2026-03-27T04:53:10 1774587190

Jeff Geerling doing that 1.5TB cluster using 4 Mac Studios was pretty much all the proof needed to demo how the Mac Pro is struggling to find any place any more.

https://www.jeffgeerling.com/blog/2025/15-tb-vram-on-mac-stu...

pjmlp · 2026-03-27T08:21:05 1774599665

That is the proof what is left is a workaround, just like pilling minis on racks because Apple left the server space.

Also why Swift nowadays has to have good Linux support, if app developers want to share code with the server.

coldtea · 2026-03-27T11:33:17 1774611197

A workaround that works is better than an official solution that's barely adequate. Which is often the case.

pjmlp · 2026-03-27T12:39:24 1774615164

Or just maybe, to use a Steve Jobs quote, one is holding it wrong and should look elsewhere.

coldtea · 2026-03-27T21:28:22 1774646902

People sneer at this Steve Jobs quote, but almost anybody working in tech had at some point quoted another, stronger, quote like "We tried to make the program idiot proof, but they keep making better idiots".

There's also: "Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning."

https://en.wikiquote.org/wiki/Rick_Cook

zozbot234 · 2026-03-27T05:02:49 1774587769

But those Thunderbolt links are slower than modern PCIe. If there's actually a M5-based Mac Studio with the same Thunderbolt support, you'll be better off e.g. for LLM inference, streaming read-only model weights from storage as we've seen with recent experiments than pushing the same amount of data via Thunderbolt. It's only if you want to go beyond local memory constraints (e.g. larger contexts) that the Thunderbolt link becomes useful.

wpm · 2026-03-27T05:09:49 1774588189

Why everyone wants to live in dongle/external cabling/dock hell is beyond me. PCIe cards are powered internally with no extra cables. They are secure. They do not move or fall off of shit. They do not require cable management or external power supplies. They do not have to talk to the CPU through a stupid USB hub or a Thunderbolt dock. Crappy USB HDMI capture on my Mac led me to running a fucking PC with slots to capture video off of a 50 foot HDMI cable, that then streamed the feed to my Mac from NDI, because it was more reliable than the elgarbo capture dongle I was using. This shit is bad. It sucks. It's twice the price and half the quality of a Blackmagic Design capture card. But, no slots, so I guess I can go get fucked.

wtallis · 2026-03-27T06:06:23 1774591583

For anything that's even somewhat in the consumer space rather than pure workstation/professional, the main reason is that dongles can be used with a laptop but add-in cards can't. When ordinary consumer PCs (or even office PCs) are in the picture, laptops are a huge chunk of the target audience.

The market segments that can afford to ignore laptops and only target permanently-installed desktops are mostly those niches where the desktop is installed alongside some other piece of equipment that is much more expensive.

GeekyBear · 2026-03-27T06:04:49 1774591489

Wasn't streaming models from storage into limited memory a case where it was impressive that you could make the elephant dance at all?

If you want to get usable speeds from very large models that haven't been quantitized to death on local machines, RDMA over Thunderbolt enables that use case.

Consumer PC GPUs don't have enough RAM, enterprise GPUs that can handle the load very well are obscenely expensive, Strix Halo tops out at 128 Gigs of RAM and is limited on Thunderbolt ports.

zozbot234 · 2026-03-27T08:18:28 1774599508

The bad performance you saw was with very limited memory and very large models, so streaming weights from storage was a huge bottleneck. If you gradually increase RAM, more and more of the weights are cached and the speed improves quite a bit, at least until you're running huge contexts and most of the RAM ends up being devoted to that. Is the overall speed "usable"? That's highly subjective, but with local inference it's convenient to run 24x7 and rely on non-interactive use. Of course scaling out via RDMA on Thunderbolt is still there as an option, it's just not the first approach you'd try.

Dylan16807 · 2026-03-27T18:56:52 1774637812

> If you gradually increase RAM, more and more of the weights are cached and the speed improves quite a bit

It'll increase a lot based on the zero-ram baseline. But it's still complete garbage compared to fitting the model in RAM. Even if you fit most of it in RAM you're still probably an order of magnitude slower than fitting all of it in RAM, most of your time spent waiting for your SSD.

GeekyBear · 2026-03-27T16:03:20 1774627400

If you don't care about performance, you have a lot of options.

mixdup · 2026-03-27T12:25:47 1774614347

The proposition of a Mac Pro in the Apple Silicon world wasn't necessarily about performance, it was about the existence of the PCIe slots. I don't think AI becoming a workload for pro Macs means the Mac Pro doesn't have a place, people who were using Mac Pros for audio or video capture didn't stop doing that media work and switched to AI as a profession. That market just wasn't big enough to sustain the Mac Pro in the first place and Apple has finally acknowledged that fact

alsetmusic · 2026-03-27T13:23:41 1774617821

I had a U-Audio PCI card in a Mac Pro during the Intel era of Macs. It was a chip to run their software plugins and the plugins are top of the line. I have a U-Audio box that runs over Thunderbolt now. I know there are people who need device slots, but it's vanishingly few. I'm disappointed that this category of machine is going away, but it stopped being for me in the Apple Silicon era.

grahamlee · 2026-03-27T12:29:35 1774614575

so many peripherals now come in external boxes that communicate _incredibly quickly_ over Thunderbolt 4/5 that the need for PCIe is marginal, while the cost to support it is significant.

ActorNightly · 2026-03-27T17:45:26 1774633526

Wow spend 40k to get the same tokens/second in QWEN as you would on a 3090

I have a feeling that Mac fans obsess more about being able to run large models at unusably slow speeds instead of actually using said models for anything.

dragonwriter · 2026-03-27T10:22:41 1774606961

> Apple really stumbled into making the perfect hardware for home inference machines

For LLMs. For inference with other kinds of models where the amount of compute needed relative to the amount of data transfer needed is higher, Apple is less ideal and systems worh lower memory bandwidth but more FLOPS shine. And if things like Google’s TurboQuant work out for efficient kv-cache quantization, Apple could lose a lot of that edge for LLM inference, too, since that would reduce the amount of data shuffling relative to compute for LLM inference.

NetMageSCW · 2026-03-27T14:49:23 1774622963

Or just mean that you could run a 5x bigger model on Apple than before.

dragonwriter · 2026-03-27T15:10:23 1774624223

Well, since its kv-cache that TurboQuant optimizes, it means five times bigger context fits into RAM, all other things being equal, not a five times bigger model. But, sure, with any given context size and the same RAM available, you can instead fit a bigger model—which also takes more compute to get the same performance.

Anything that increases the necessary compute to fully utilize RAM bandwidth in optimal LLM serving weakens Apples advantage for that.

robotswantdata · 2026-03-27T07:24:42 1774596282

DGX workstations, expensive but allow PCI cards as well.

https://marketplace.nvidia.com/en-us/enterprise/personal-ai-...

fooker · 2026-03-27T09:42:12 1774604532

It's hilarious that not a single one of these has pricing listed anywhere public.

I don't think they expect anyone to actually buy these.

Most companies looking to buy these for developers would ideally have multiple people share one machine and that sort of an arrangement works much more naturally with a managed cloud machine instead of the tower format presented here.

Confirming my hypothesis, this category of devices more or less absent in the used market. The only DGX workstation on ebay has a GPU from 2017, several generations ago.

chatmasta · 2026-03-27T11:23:10 1774610590

Nvidia doesn’t list prices because they don’t sell the machines themselves. If you click through each of those links, the prices are listed on the distributor’s website. For example the Dell Pro Max with GB10 is $4,194.34 and you can even click “Add to Cart.”

fooker · 2026-03-27T13:32:12 1774618332

I don't mean the small GB10s.

If you try to find the pricing of the GB300 towers even on the manufacturer sites, you'll see that it's not listed for any of the six or so models.

tecleandor · 2026-03-27T14:09:14 1774620554

Because that's a different price point, that's getting near 100K, and the availability is very limited. I don't think they're even selling it openly, just to a bunch of partners...

The MSI workstation is the one that is showing some pricing around. Seems like some distributors are quoting USD96K, and have a wait time of 4 to 6 weeks [0]. Other say 90K and also out of stock [1]

--

  0: https://www.cdw.com/product/msi-nvidia-gb300-wkstn-72c-grace-cpu/9087313?pfm=srh
  1: https://www.centralcomputer.com/msi-ct60-s8060-nvidia-dgx-station-cpu-memory-up-to-496gb-lpddr5x-nvidia-blackwell-ultra-gpu-1x-10-gbe-2x-400-gbe.html

fooker · 2026-03-28T02:53:04 1774666384

> I don't think they're even selling it openly, just to a bunch of partners...

Yes, that's my point.

Melatonic · 2026-03-27T16:35:01 1774629301

Isnt that because nobody has released one yet? They are brand new

numpad0 · 2026-03-27T10:46:57 1774608417

I don't think it's so odd, very few products above ~$50k have final prices listed for anyone to buy 1-click.

fooker · 2026-03-27T13:33:34 1774618414

Workstations above 50k are not that uncommon.

Older xeon based workstations easily reach that number.

tecleandor · 2026-03-27T14:12:50 1774620770

If you put a 50 or 80K workstation in the HP store, it will say:

"Purchasing limit reached. To complete your order and provide you with the best customer experience, please call 1-877-888-8235"

bluedino · 2026-03-27T13:42:03 1774618923

'Important' people in organizations get them. They either ask for them, or the team that manages the shared GPU resources gets tired of their shit and they just give them one.

fooker · 2026-03-27T14:19:36 1774621176

Yes, I agree this is the use case.

Since the user here is not paying for it directly, the manufacturer does not have any incentive to list prices anywhere.

deelowe · 2026-03-27T12:31:37 1774614697

There were plenty of them around when I worked at Nvidia. They definitely exist.

fooker · 2026-03-27T13:39:31 1774618771

You have seen plenty of third party GB300 DGX workstations?

QuantumNomad_ · 2026-03-27T08:07:19 1774598839

How much do those workstations cost? All of the different manufacturers links on that page lack pricing info and you have to contact them for pricing.

fotcorn · 2026-03-27T09:05:06 1774602306

Cheapest i know if is around $96k

cudima · 2026-03-27T08:46:17 1774601177

$4000

eitally · 2026-03-27T15:41:13 1774626073

$4k is for GB10 (DGX Spark reference design). $90-100k is for GB300 (DGX Station reference design).

alerighi · 2026-03-27T15:26:26 1774625186

To me there is a fundamental difference. Even if PC hardware costs slightly more (now because of the RAM situation, Apple producing his chips in house can get better deals of course), it's something that is worth more investing in in.

Maybe you spend 1000$ more for a PC of comparable performance, well tomorrow you need more power, change or add another GPU, add more RAM, add another SSD. A workstation you can keep upgrade it for years, adding a small cost for an upgrade in performance.

An Apple machine is basically throw away: no component inside can be upgraded, you need more RAM? Throw it away and buy a new one. You want a new GPU technology? You have to change the whole thing. And if something inside breaks? You of course throw away the whole computer since everything is soldered on the mainboard.

There is then the software issue, with Apple devices you are forced to use macOS that kind of sucks, especially for a server usage. True nowadays you can install Linux on it, but the GPU it's not that well supported, thus you loose all the benefits. You have to stuck with an OS that sucks, while in the PC market you have plenty of OS choices, Windows, a million of Linux distributions, etc. If I need a workstation to train LLM why do I care about a OS with a GUI? It's only a waste of resources, I just need a thing that runs Linux and I can SSH into it. Also I don't get the benefit of using containers, Docker, etc.

Mac suck even hardware side form a server point of view, for example it's not possible to rack mount them, it's not possible to have redundant PSU, key don't offer remote KVM capability, etc.

orangecat · 2026-03-27T15:57:14 1774627034

you need more RAM? Throw it away and buy a new one.

Or sell it, which is much easier to do with Macs because they're known quantities and not "Acer Onyx X321 Q-series Ultra".

There is then the software issue, with Apple devices you are forced to use macOS that kind of sucks, especially for a server usage

That's a fair point. Apple would get a ton of goodwill if they released enough documentation to let Asahi keep up with new hardware. I can't imagine it would harm their ecosystem; the people who would actually run Linux are either not using Macs at all, or users like me who treat them as Unix workstations and ignore their lock-in attempts.

mjburgess · 2026-03-27T15:34:54 1774625694

"Upgrades" havent been a thing for nearly a decade. By the time you want to upgrade a machine part (c. 5yr+ for modern machines), you'd want to upgrade every thing, and its cheap to do so.

It isnt 2005 any more where RAM/CPU/etc. progress benefits from upgrading every 6mo. It's closer to 6yr to really notice

cesarb · 2026-03-27T15:56:16 1774626976

> By the time you want to upgrade a machine part (c. 5yr+ for modern machines), you'd want to upgrade every thing,

That's only the case for CPU/MB/RAM, because the interfaces are tightly coupled (you want to upgrade your CPU, but the new one uses an AM5 socket so you need to upgrade the motherboard, which only works with DDR5 so you need to upgrade your RAM). For other parts, a "Ship of Theseus" approach is often worth it: you don't need to replace your 2TB NVMe M.2 storage just because you wanted a faster CPU, you can keep the same GPU since it's all PCIe, and the SATA DVD drive you've carried over since the early 2000s still works the same.

secabeen · 2026-03-27T16:20:50 1774628450

Even this is understating it; if you buy at the right point in the cycle, you can Ship-of-Theseus quite a while. An AM4 motherboard released in Feb 2017 with a Ryzen 1600X CPU, DDR4 memory and a GTX780 Ti would be a obsolete system by today's standards. Yet, that AM4 motherboard can be upgraded to run a Ryzen 5800X3D CPU, the same (or faster) DDR4 memory, and a RTX 5070Ti GPU and be very competitive with mid-tier 2026 systems containing all new components. Throughout all this, the case, PSU, cooling solution, storage could all be maintained, and only replaced when individual components fail.

I expect many users would be happy with the above final state through 2030, when the AM6 socket releases. That would be 13 years of service for that original motherboard, memory, case and ancillary components. This is an extreme case, you have to time the initial purchase perfectly, but it is possible.

bigyabai · 2026-03-27T15:36:22 1774625782

That's news to me. I see Mac Minis with external drives plugged-in constantly; I bet those people would appreciate user-servicable storage. I doubt they bought an external drive because they wanted to throw away the whole computer.

dghlsakjg · 2026-03-27T15:45:26 1774626326

Mac minis have user serviceable storage: https://store.m4-ssd.com/products/third-party-ssd-for-mac-mi...