Nvidia might do for desktop AI what it did for desktop gaming (theangle.com)
60 points by walterbell on Jan 16, 2025 | hide | past | favorite | 56 comments


400B parameters in a "plug-and-play" form factor for $6k is wild.

Meanwhile all the usual "desktop" players are still trying to find a way to make good on their promises to develop their own competitive chips for AI inference and training workloads in the cloud.

I'm betting on Nvidia to continue to outperform them. The talent, culture and capabilities gap just feels insurmountable for the next decade at least barring major fumbles from Nvidia.


Even 200B for 3k USD is really good, 128 GB of memory for that price is surprising!

Right now I'm using a two-RTX-2080 setup for a pilot project, running ollama and qwen2.5-coder (the 14B quantized version). A more serious step up from there on a budget would realistically be an RX 7900 XTX (24 GB, though I've had issues setting up ROCm) for 1k EUR or an RTX A5000 (24 GB) for 2.5k EUR in the current market.

Honestly, I don't even need the best performance for what I'm trying to do, even an Arc A770 (16 GB) for 350 EUR would be enough to iterate, except that it's not actually supported by ollama and lots of stuff out there: https://github.com/ollama/ollama/blob/main/docs/gpu.md (I know ollama isn't the only solution, but it sucks when the tools you like aren't available)
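For context on why 16-24 GB cards suffice here, a rough back-of-envelope helps (this is a sketch of mine, assuming ~4-bit weights and a ~20% allowance for KV cache and runtime overhead; the function and constants are illustrative, not from any tool):

```python
# Hypothetical back-of-envelope: approximate VRAM needed to hold a
# quantized model. Assumes 4-bit weights (0.5 bytes per parameter)
# plus ~20% overhead for KV cache, activations, and buffers.
def vram_gb(params_billions, bits_per_weight=4, overhead=0.2):
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9

print(f"14B @ 4-bit: ~{vram_gb(14):.1f} GB")  # ~8.4 GB: fits across two 8 GB 2080s
print(f"32B @ 4-bit: ~{vram_gb(32):.1f} GB")  # ~19.2 GB: wants a 24 GB card
```

Real memory use depends on context length and the exact quantization scheme, so treat these as rough lower bounds.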


What are you doing that requires local hardware over calling an API provider? If you're developing an AI app that makes lots of calls and is designed around a local GPU, it makes some sense, perhaps?


On prem alternatives to ChatGPT and the likes of GitHub Copilot, where you cannot legally send data to cloud services.

Underneath everything, ollama can do a lot of the heavy lifting, but it still needs to run somewhere. For decent chat models you probably want a server with either one beefy GPU or a few regular consumer ones (it actually seems to split the load just fine, at least when it comes to Nvidia hardware), whereas for smaller models like autocomplete (where 3B or 1.5B models are enough) you can choose whether to use the same server or run locally.


If performance isn't an issue, why not get a Mac mini?


A really good suggestion, I actually used a MacBook Air with an M1 in the first stages and it was fine for prototyping too!

Though the local market here is a bit bad. There's a Mac Mini with an M2 Pro and 16 GB for 1.9k EUR, so more expensive than just two GPUs. I'm guessing that the local sellers are trying to profit quite a bit, because on Apple's site, the M4 16 GB version starts at 600 USD (and for 24 GB it's 800 USD and for 32 GB it's 1000 USD). That actually makes it a good option!


Nvidia could just release variations of their "gaming" cards with more RAM. There is absolutely nothing stopping them from releasing 64 GB or 128 GB 5080s and 5090s.

But they don't because that would cannibalize the extremely overpriced enterprise offerings. The #1 reason people are forced into the tens of thousands of dollar cards is memory needs.

So considering that, ask what niche this device really fills. Is it a new "supercomputer" for the home? Not really: it is so silicon- and memory-bandwidth-restricted that their $600 GPU can beat it soundly on every metric (not surprising when you look at the power and airflow/cooling needs of "real" GPUs) except in scenarios requiring large memory. And while it can host those larger models, it is going to be far removed from state of the art.

It's neat, but the market for this is being grossly overstated in a lot of these hype advertorials. The large models you'll run on it will be quantized to the point of absurdity, not to mention that for 99.9%+ of users, anything short of state of the art is basically useless.

It's a neat eGPU of sorts for a Mac or something (they really hype the fact that you can use CUDA with this). I'm still trying to figure out what value it possibly brings beyond luring a bunch of enthusiasts into blowing money on it so they can fiddle with Llama for a week and then realize it's a waste of time.


Would it really cannibalize the enterprise offerings? The whole point of the NVIDIA Enterprise license is that their TOS forbids using anything else in datacenters.


Is there any info about how many tokens a second you would get with a 400b model? Without that it's like claiming a graphics card can output at 8k, but neglecting to state the frames per second. Suspect, in other words.

Outputs at 8k resolution!

...at 5 fps


I saw somewhere that it has 0.5 TB/s of memory bandwidth, so it can read the whole memory (128 GB) about 4 times per second. If you fill it up with a model, you get roughly 4 tokens/s, give or take.

For comparison, an H100 can read its memory about 40 times per second, so if you use it all you can get around 40 tokens/s.

Of course, in either case you don't have to fill it up; you can instead use a smaller model or more GPUs.
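The arithmetic above can be written out as a quick sketch (the Digits bandwidth figure is the unofficial ~0.5 TB/s quoted in this thread, not a confirmed spec; the H100 figure uses its ~3.35 TB/s HBM3 bandwidth):

```python
# For memory-bandwidth-bound generation, each new token reads every
# weight once, so: tokens/s ~= bandwidth / model footprint in memory.
def tokens_per_sec(bandwidth_gb_s, model_size_gb):
    return bandwidth_gb_s / model_size_gb

print(tokens_per_sec(500, 128))   # Digits-class box, memory filled: ~3.9 t/s
print(tokens_per_sec(3350, 80))   # H100, ~3.35 TB/s over 80 GB: ~42 t/s
```

This ignores batching, prompt processing, and compute limits, so it's an upper bound for single-stream generation, not a benchmark.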


I've seen similar claims, with nothing in particular to support it. Hopefully Nvidia clarifies this soon.

AMD's Strix Halo has a 128 GB config with 256 GB/s.

Apple has a MacBook Pro with an M4 Max and 128 GB of RAM for $4,700, with a 546 GB/s memory interface.

Hopefully Nvidia can do better.


What special sauce do they have to run 400B models fast exactly?


LPDDR5x.

Which is way slower than GDDR6X or GDDR7, let alone HBM. I don't expect these machines to be anywhere near as fast as the hype suggests.

256-bit LPDDR5X is impressive, don't get me wrong. But it's impressive for a CPU platform. It's actually pretty bad for a GPU.


Exactly. So where does the "supercomputer" concept come from, exactly?


From the article:

> Huang also revealed ‘Project Digits,’ a new product based on its Grace Blackwell AI-specific architecture that aims to offer at-home AI processing capable of running 200 billion-parameter models locally for a projected retail cost of around $3,000.

> There are many exciting things about Project Digits, including the fact that two can be paired to offer 405 billion-parameter model support for ‘just’ $6,000

My experience with running local LLMs is quite limited, but most tools can split the workload between GPUs (or more commonly GPU+CPU) with minimal fuss. It parallelizes fairly well. There may not be any actual secret sauce beyond just having the necessary gobs and gobs of fast memory to load the model into.
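As a minimal sketch of what that splitting amounts to (a hypothetical helper of mine; real runtimes like llama.cpp assign contiguous blocks of transformer layers to devices automatically):

```python
# Hypothetical layer-wise model parallelism: assign contiguous blocks
# of transformer layers to GPUs in proportion to their VRAM.
def split_layers(num_layers, gpu_vram_gb):
    total = sum(gpu_vram_gb)
    assignment, start = [], 0
    for i, vram in enumerate(gpu_vram_gb):
        count = round(num_layers * vram / total)
        if i == len(gpu_vram_gb) - 1:
            count = num_layers - start  # last GPU takes the remainder
        assignment.append((f"gpu{i}", list(range(start, start + count))))
        start += count
    return assignment

# Two equal 8 GB cards each take half of a 48-layer model:
for gpu, layers in split_layers(48, [8, 8]):
    print(gpu, len(layers))  # gpu0 24 / gpu1 24
```

Each token's activations then flow through the GPUs in sequence, which is why this kind of split scales capacity well but speed only modestly.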


I don't see any special sauce mentioned to make it run these models fast. If you get one token per second, this thing is useless.


Quantization and hype.


I don’t get the need. With gaming there’s a real benefit to having the card close to a display. There’s enough benefit that you don’t mind it being unused 20 hours a day. There’s relatively little benefit to having training happen a few feet away rather than a data center. Solid chance it sits unused most of the time, and when you really need it you run into capacity issues, so you’d need to predict your future needs carefully or be happy waiting for a job to finish.

AI training feels like transport. You rent the capacity/vehicle you need on demand, benefit from yearly upgrades. Very few people are doing so much training that they need a local powerhouse, upgraded every year or so.

Even sharing the hardware in a pool seems more rational. Pay 200/month for access to a semi private cluster rather than having it sit on your desk.


A local NAS makes sense for some use cases: privacy, local control, and network bandwidth all add up to a case for having one.

I see similar coming for AI: tagging local photos, reading/summarizing private documents, helping you code (without uploading it to 3rd parties), using uncensored LLMs, maybe even playing NPCs once games support APIs for LLMs.

It's going to take a while, but I wouldn't be surprised if NASes start supporting AI accelerators to provide local endpoints for AI.


I'm not too familiar with the AI space, but I wonder if this is an effort from NVIDIA to combine their AI and gaming markets. Did this come from a question like, 'How do we sell discrete cards through our existing manufacturing partnerships to both gaming enthusiasts and AI enthusiasts?' I do wonder how comfortable they are pivoting back to being a consumer hardware company if AI becomes a more competitive space, or if the 'hype' subsides. Pure speculation, and I'm probably off the mark.


Yeah quite possible. They have distribution, brand, customers used to paying a lot of money. Can go far just selling to the portion of existing customers who would like "the best local setup for AI".


A video editor wants the tool sitting on their desk. Not pay-per-gen SaaS where the results are garbage nine out of ten times.

The comfy ecosystem is rife with people that want local tools.


6000 / 24 = 250 USD

If you replace your GPU every 2 years, it's 250 USD per month.

If the price halves, it's still 125 USD.

Even if the price halves and you use it for 5 years, it's 50 USD per month.
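Spelled out, that's just straight-line amortization of the sticker price (ignoring power, resale value, and financing):

```python
# Straight-line amortization: a one-time price spread over the lifetime.
def monthly_cost(price_usd, lifetime_years):
    return price_usd / (lifetime_years * 12)

print(monthly_cost(6000, 2))  # 250.0 USD/month
print(monthly_cost(3000, 2))  # 125.0
print(monthly_cost(3000, 5))  # 50.0
```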


People are already spending far more than that on Runway and Kling:

> I spent $745 in Kling credits to bring the Princess Mononoke trailer to life.

https://www.reddit.com/r/aivideo/comments/1fvchbf/i_spent_74...

Most of that is in failed generations.


I see the Digits box as mainly for inference, not training. It allows me to load a fairly large model (e.g., a 70B Llama or a 12B Flux) and run it locally at decent speeds.


Then surely far simpler custom chips are the eventual model, like happened with crypto? Groq, Etched etc. In that universe, Nvidia has absolutely no moat and a thousand chips are coming.


Maybe. For me to consider buying some unknown hardware, it needs to be a lot better than Nvidia: like double the speed, half the power, and half the price. And the software had better be rock solid and tested on all popular models.

Many startups have been trying for several years now, and eventually someone will succeed, but it's not easy. Even AMD hasn't been able to pull it off.


Privacy. For machine learning to be locally usable in a meaningful way beyond a chat interface, I would have to give it access to pretty much all my digital data.

Sending that over the network feels very, idk, icky, because it's not just photos or emails.


The article is focused on Nvidia, but note that Apple [1][2] and Google [3] have also been working in this area and will undoubtedly continue to do so.

[1]: https://developer.apple.com/machine-learning/core-ml/

[2]: https://machinelearning.apple.com/research/neural-engine-tra...

[3]: https://research.google/blog/improved-on-device-ml-on-pixel-...


Related Nvidia's Project Digits is a 'personal AI supercomputer' (622 points, 8 days ago, 501 comments & jeans) https://news.ycombinator.com/item?id=42619139


What is Nvidia's track record with releasing/supporting its own Linux-based OS? Can I easily switch to a different OS?


"I'll never buy a SBC from Nvidia unless all the SW support is up-streamed to Linux kernel," top comment from prev discussion https://news.ycombinator.com/item?id=42623030


Nvidia basically abandoned various Tegra-related SBCs. But this one runs DGX OS, an Ubuntu derivative that already backs a few billion dollars' worth of their hardware floating around in various clouds.

Seems like this is MUCH more likely to have at least decent support; I think the current DGX OS is based on Ubuntu 22.04 LTS.


https://www.tomshardware.com/pc-components/cpus/nvidia-arm-s...

> Nvidia will be introducing two new chips, the N1X at the end of this year and the N1 in 2026. Nvidia is expected to ship 3 million N1X chips in Q4 this year and 13 million vanilla N1 units next year. Nvidia will be partnering with MediaTek to build these chips, with MediaTek receiving $2 billion in revenue. Nvidia will show off its upcoming ARM-based SoCs at Computex in May.


From what I could gather in related communities, Project Digits will run 200B models very slowly, so there is no breakthrough there.


Pretty unconvinced. When desktop gaming started, you didn't have low-latency, high-bandwidth, reliable internet. If you had, people probably wouldn't have bought cards at all, and GeForce Now would have been the whole market.

We're already at that stage now with AI / LLMs. This type of physical product will remain niche.


Not everyone has low-latency, high-bandwidth, reliable internet, and almost no one has it all the time. In addition, the privacy and security benefits of running things locally are a requirement for some.


The people I know that use local AI instead of remote AI like the privacy, response time, and not being charged per-query.


Absolutely. My pathetically small local LLM is in some ways vastly superior to cloud-based models that are orders of magnitude larger. Not in ability, but in how I can use it.


We do now, but Stadia failed to have a meaningful impact for a reason.

Obviously companies will try to migrate everything to the SaaS model because it is better for revenue and bug fixes, but it's not necessarily the best user experience.

I don't want games to stop if I don't have wifi, and I don't want ML stuff to require internet, feel comfortable sending my entire digital data to a third party, or have models be updated because of some compliance policy.

But it's still in its infancy. Companies will do everything to make it over the network; one or two nice tech people will try to make it local/hybrid. Hopefully they succeed.


The track record of standalone Nvidia appliances is pretty poor. The Shield console and its portable version disappeared fairly quickly, and the Jetson dev board is a laughingstock since software support is awful, so I am not holding my breath for this one.

It will take more than jeans and leather jackets to sell these.


The Shield TV came out in 2015, was last refreshed in 2019 (which you can still buy today), and the entire line is still getting updates. That's longer than e.g. the PS Vita.


https://www.nvidia.com/en-us/shield/software-update/

They are still selling in 2025 an Android 11 device whose last update is from 2022.

Sounds about right: Nvidia doesn't give a shit about maintaining the devices it sells.


> They are still selling in 2025 an Android 11 device whose last update is from 2022.

Further, there was a beta test update just last month.


Still no word on where and how to buy these?


It's just been announced; shipping in July, last I heard. It will be a few months before you can pre-order.


They announced May, has that changed? Source?


These things look pretty small. I wonder if someone will make a 2U rack tray to hold a few of them.


Is this just an ad for nvidia's new box or is the author actually making a point?


An ad masquerading as “insight” is nothing new in this space. Author is eating up the NVDA marketing and joining the cult.


And it's kind of a poor choice of words, since what they "did for gaming" recently is abandon what made them a powerhouse company to begin with for some new adventure of borderline scam-chasing.


Even so, I expected something a little more insightful than a copy paste of nvidia's press release. And for some reason it is upvoted to the top of HN.


I’ll wait for the benchmarks. NVDA marketing is known to oversell.


This is just an ad.


Expensive and rare?


They did stuff to desktop AI?



