Put another way: if AMD (and especially Intel) don't do something about this they're going to get completely eaten alive by ARM.
The amount of processing power available in a modern smartphone is truly mind-boggling. I'd love to see a chart showing the chip cost and energy cost of M1-level processing power in each previous year. I would guess that 30+ years ago you'd be in the millions of dollars and watts of power, but that's just a guess.
As we see from the modern M1/M2 Macbooks, these lower TDP SoCs are more than capable of running a computer for most people for most things. The need for an Intel or AMD CPU is shrinking. It's still there and very real but the waters are rising.
> Put another way: if AMD (and especially Intel) don't do something about this they're going to get completely eaten alive by ARM.
AMD’s latest parts are actually quite close to M1/M2 in computing efficiency when clocked down to more conservative power targets.
They crank the power consumption of their desktop CPUs deep into the diminishing returns region because benchmarks sell desktop chips. You can go into the BIOS and set a considerably lower TDP limit and barely lose much performance.
Where they struggle is in idle power. The chiplet design has been great for yields but it consumes a lot of baseline power at idle. M1/M2 have extremely efficient integration and can idle at negligible power levels, which is great for laptop battery life.
People keep repeating that Zen4 and M1 are close in efficiency but what is the source with actual benchmarks and power measurements?
At any rate, using single points to compare energy efficiency isn't a good comparison unless either the performance or the power consumption of the data points is comparable. For example, the M1's little cores are another 3-5x more efficient, but they operate in an entirely different power class, and Apple's own marketing graphs show the M1's max efficiency is also well below its max performance. [1]
Those perf/power curves are the basis of actually useful comparisons; has anyone plotted some outside of marketing materials? It might even be possible under Asahi.
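For anyone who wants to plot such curves themselves, here is a minimal sketch of the kind of comparison being asked for. The data points are illustrative placeholders, not real measurements; a real version would be fed by a wall-power meter, RAPL readings, or powermetrics samples taken while sweeping each machine's power limit.

```python
import matplotlib.pyplot as plt

# Hypothetical (watts, benchmark score) samples taken while sweeping the
# power limit on each machine -- replace with real measurements.
curves = {
    "Chip A": [(15, 900), (25, 1300), (45, 1650), (65, 1800)],
    "Chip B": [(10, 950), (20, 1400), (30, 1600), (40, 1700)],
}

for name, samples in curves.items():
    watts = [w for w, _ in samples]
    score = [s for _, s in samples]
    plt.plot(watts, score, marker="o", label=name)

plt.xlabel("Package power (W)")
plt.ylabel("Benchmark score")
plt.title("Perf/power curves (illustrative data only)")
plt.legend()
plt.show()
```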
Their results are invalid because they used Cinebench. Cinebench uses the Intel Embree engine, which is hand-optimized for x86 instructions, not ARM. In addition, Cinebench is a terrible general-purpose CPU benchmark.[0]
Imagine you're testing how energy efficient an EV and a gas car are, but you only run the test at the North Pole, where the cold makes the EV at least 40% less efficient, and then you draw conclusions from that data alone for every region in the world. That's what using Cinebench to compare Apple Silicon and x86 chips is like.
Cinebench/4D does have "hand-optimized" ARM instructions. It would be a disaster for the actual product if it didn't. That's what makes it interesting as a benchmark: that there's a real commercial product behind it and a company interested in making it as efficient as possible for all customer CPUs, not just benchmarking purposes.
Although for later releases this is less true, since most customers have switched to GPUs...
Cinebench/4D does have "hand-optimized" ARM instructions.
It doesn't. As far as I know, everything is translated from x86 to ARM instructions - not direct ARM optimization.
Cinema4D is a niche software within a niche. Even Cinema4D users don't typically use CPU renderer. They use the GPU renderer.
The reason Cinebench became so popular is because AMD and Intel promote it heavily in their marketing to get nerds to buy high core count CPUs that they don't need.
Generally you see this in the lower class chips that aren’t overclocked to within an inch of instability. It’s not uncommon to see a chip that uses 200w to perform 10% worse at 100w, or 20% worse at 70w.
I can’t be bothered to chase down an actual comparison, but usually you’ll see something along those lines if you compare the benchmarks for the top tier chip with a slightly lower tier 65w equivalent.
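Taking the numbers above at face value, a quick back-of-the-envelope calculation shows how dramatic the perf/watt swing is (the absolute score of 100 is an arbitrary baseline, not a measured result):

```python
# (power limit in watts, relative performance), per the example above:
# 10% worse at 100 W and 20% worse at 70 W than the 200 W configuration.
points = [(200, 100), (100, 90), (70, 80)]

for watts, perf in points:
    print(f"{watts:>4} W: perf {perf:>3}, perf/W = {perf / watts:.2f}")

# Output:
#  200 W: perf 100, perf/W = 0.50
#  100 W: perf  90, perf/W = 0.90
#   70 W: perf  80, perf/W = 1.14
```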
It's actually this idling power that defines battery drain for most people. All these benchmarks about how much it can do for a certain compute-intensive task aren't that important, considering that most of the time a laptop is doing almost nothing.
We just stare at an article in a web browser. We look at a text document. We type a bit in the document. An app is doing an HTTP request. The CPU is doing nothing basically.
Once in a while it has to redraw something, do some intense processing of an image or text, but it takes seconds.
It's the 99% spent idling that counts, and there most laptop CPUs suck.
Even when watching a video the CPU is not (should not be) doing much as there are HW co-processors for MPEG-4 decoding built in.
It's quite embarrassing how AMD and Intel have screwed up honestly.
And that's why so far AMD's mobile processors have been monolithic and not chiplet-based. That is supposed to change with Zen 4's Dragon Range; however, most of the mobile lineup will still be monolithic, and these high-power/high-performance processors should go exclusively to "gaming" notebooks.
I care a lot about idle power, even on my desktop PC. It seems crazy to me that in 2023 I still need to consider whether maybe I should shut down my computer when I'm not using it.
What should I be buying to not have to ask myself that question?
If you take a Zen 3 running at optimal clocks for efficiency (such as the 5800U), its performance per watt is competitive with the M1 if you account for the difference in node size (which TSMC claims gives 30% less power consumption at the same performance). As the article points out, the real efficiency gains will come from domain-specific changes such as shifting to 8-bit for more calculations.
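As a toy illustration of the 8-bit point (nothing to do with any particular chip's implementation), here is how a float32 dot product can be approximated with int8 operands, cutting the memory footprint of the operands to a quarter at the cost of some precision:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(4096).astype(np.float32)
b = rng.standard_normal(4096).astype(np.float32)

# Symmetric quantization to int8: x ~= scale * q, with q in [-127, 127].
def quantize(x):
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

qa, sa = quantize(a)
qb, sb = quantize(b)

# Multiply int8 operands, accumulate in int32, then rescale back to float.
approx = np.dot(qa.astype(np.int32), qb.astype(np.int32)) * sa * sb
exact = float(np.dot(a, b))

print(f"fp32 result: {exact:.3f}")
print(f"int8 result: {approx:.3f}")
print(f"memory per operand: {a.nbytes} B (fp32) vs {qa.nbytes} B (int8)")
```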
How would x86-64 be as efficient with the same transistor & power budget when they have to run an extra decoder and ring within that budget? Seems physically impossible.
All else being equal, they can't. But the difference isn't as big as some people like to think. For a current high end core, probably low single digit %. And x86-64 has had a lot more effort going into software optimization.
As I understand it, the actual processing part of most chips nowadays is fairly bespoke, with a decoder sitting on top. I doubt decode makes up that large a portion of a chip's power consumption (probably negligible next to the rest of the chip?), so other improvements can make up for the difference.
The latest ARM Cortex CPUs (models X2, A715 and A510) drop 32-bit support. Qualcomm actually includes two older Cortex-A710 cores in the Snapdragon 8 gen 2 for 32-bit support. Don't know much about Apple Silicon but didn't they drop 32-bit a couple of years back?
Google has purged 32-bit apps from the official Android app store, but as I understand it the Chinese OEMs that ship un-Googled AOSP ROMs with their own app stores haven't been as aggressive about moving to 64-bit.
Because the more complex decoder is traded, in this case, for a denser instruction set, which means they can get away with less instruction cache (which is the more power-hungry structure).
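A rough way to see why density matters for the instruction cache: with a fixed icache size, a shorter average instruction length means more instructions resident at once. The byte counts below are illustrative assumptions, not measured averages for any real ISA or workload.

```python
ICACHE_BYTES = 32 * 1024  # a common L1 instruction cache size

# Illustrative average instruction lengths; real averages depend heavily
# on the workload and the compiler.
avg_len = {
    "variable-length ISA (~3.5 B/inst)": 3.5,
    "fixed 32-bit ISA (4 B/inst)": 4.0,
}

for name, length in avg_len.items():
    print(f"{name}: ~{ICACHE_BYTES / length:,.0f} instructions fit in 32 KB")
```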
Honestly I don't understand why there's not something like a 256 core ARM laptop with 4TB RAM.
The benefit of ARM is scale of multitasking due to not requiring the same kind of lock states that Intel's architecture requires, and can additionally scale much better than only one physical+virtual core pair.
I guess the only thing that's holding back ARM is Microsoft, as laptops are expected to run a desktop OS that people are comfortable with. Windows RT wasn't really a serious desktop OS, more a joke made only for some IoT enterprises rather than end-users.
I wish there was more serious hardware than the standard Broadcom or MediaTek chips; I'd definitely want some of that... be it in a mini ATX desktop/server format (e.g. as a competitor to the Intel NUC or Mac Mini) or as a laptop.
With the ongoing energy crisis something like solar powered servers would be so much more feasible than with x86 hardware.
> Honestly I don't understand why there's not something like a 256 core ARM laptop
The high power ARM cores aren’t that small. If you took the M2 and scaled it up to 256 cores, it would be almost 7 square inches. You can’t just scale a chip like that, though, so the interconnects would consume a huge amount of space as well. It would also consume over 1000W.
The latest ARM chips are great, but sometimes I think the perception has shifted too far past the reality.
7 square inches would also include an enormous GPU and tons of accessories.
The actual cores are about 0.6/2.3 mm², and local interconnects and L2 roughly double that.
So with just those parts, 256 P-cores would be about 1.5 square inches, and 256 E-cores would be about half a square inch. And in practical terms you can fabricate a die that's a bit more than a square inch.
Of course it wouldn't use 1000 watts. When you light up that many cores at once you use them at lower power. And I doubt a 256 core design would have all that many P cores either.
As a rough estimate, you could take the 120mm² M1 chip, add 28 more P-cores with 110mm², 220 more E-cores with 300mm², 128 more MB of L3 cache with 60mm², 100mm² of miscellaneous interconnects, and still be on par with a high end GPU.
That sounds doable but is pushing it. A 128 core die, though, has nothing stopping it except market fit.
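Summing up the estimate above (all of the per-block areas are the rough figures quoted in this thread, not die-shot measurements):

```python
MM2_PER_SQ_INCH = 25.4 ** 2  # 645.16 mm^2 per square inch

blocks_mm2 = {
    "M1 base die":        120,
    "28 extra P-cores":   110,
    "220 extra E-cores":  300,
    "128 extra MB of L3":  60,
    "misc interconnect":  100,
}

total = sum(blocks_mm2.values())
print(f"total: {total} mm^2 = {total / MM2_PER_SQ_INCH:.2f} sq in")
# total: 690 mm^2 = 1.07 sq in -- large, but in the ballpark of a big GPU die.
```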
Even a 128-core part made like that will perform pretty atrociously. Scaling up the core count without scaling the cache means you have a lot of cores waiting for memory. Also, when you have 128 cores, you almost certainly need more memory channels to have enough bandwidth.
Could we make the chips go slower, like around 1 GHz? Maybe that is not feasible with the current software architecture if you want a great user experience.
> The benefit of ARM is scale of multitasking due to not requiring the same kind of lock states that Intel's architecture requires
I have no idea what you mean by this. The only x86 feature I can think of that might qualify as a 'lock state' is a bus lock that happens when an atomic read-modify-write operation is split over two cache lines. That has a very simple solution ('don't do that' -- you have no reason to), and anyway, one can imagine more efficient implementation strategies.
> can additionally scale much better than only one physical+virtual core pair
I have no idea what you mean by this either. Wider hyperthreading? It can be worthwhile for some workloads (and e.g. some IBM CPUs have 4-way hyperthreading), but it is not a panacea; there are tradeoffs involved.
The largest number of high-performance ARM cores you can get in a single socket is the Ampere Altra Max with 128 ARM Neoverse-N1 cores. At 2.6 GHz the processor consumes 190 W, and at 3.0 GHz up to 250 W. This is a server chip, not something you can put in a laptop.
I think because general compute is hard to parallelize, so 256 cores doesn't help much in practice. (Compute that does parallelize well already runs on GPU).
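The standard way to put numbers on this is Amdahl's law: even a small serial fraction caps the benefit of piling on cores. A quick sketch, where the parallel fractions chosen are purely illustrative:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Maximum speedup when only `parallel_fraction` of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

for p in (0.50, 0.90, 0.99):
    print(f"p = {p:.2f}: 256 cores -> {amdahl_speedup(p, 256):.1f}x speedup")

# p = 0.50: 256 cores -> 2.0x speedup
# p = 0.90: 256 cores -> 9.7x speedup
# p = 0.99: 256 cores -> 72.1x speedup
```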
>I guess the only thing that's holding back ARM is Microsoft
It's not Microsoft holding it back. It's Qualcomm.
Apart from their very latest SoC (designed by a bunch of ex-Apple employees, no less), their CPUs are significantly worse than x86 in terms of general performance and have persistently lagged 4 years behind Apple in terms of performance (3 years behind x86). They sell for the same price per unit as x86 CPUs do, so there aren't very many OEMs that take them up on the offer, given the added expense of having to design a completely different mainboard for a particular chassis.
As such, x86 is the only game in town if you're buying a non-Apple machine; Qualcomm's products aren't cheaper and perform much worse outside of having more battery life. Sure, Qualcomm owns Nuvia now, but that acquisition will still take some time to bear fruit.
>> I would guess that 30+ years ago you'd be in the millions of dollars and watts of power but that's just a guess.
30 years ago I don't think the compute power of a modern phone chip was available at any price, even in supercomputers.
On a tangential note, there are economists who think this increase in compute is somehow an increase in one of their measures - I don't recall which one. I disagree, because with that logic we all have trillion dollar tech in our pocket. Making a better product over time is expected, it's not some kind of increase in output.
The Top500 supercomputer list started in June 1993, just about 30 years ago. At the top is the CM-5/1024 by Thinking Machines Corporation at Los Alamos National Laboratory with 1,024 cores and peaking at 131.00 GFlop/s (billion floating point operations per second).
It's an Apples to ThinkingMachine Oranges comparison, but CPU-Benchmark[1] ranks the GPU of the Apple A16 Bionic used in the latest iPhones, in its "iGPU - FP32 Performance (Single-precision GFLOPS)" section, at 2000 GFlop/s.
GadgetVersus[3] reports a GeekBench score for the A16 Bionic of 279.8 GFlop/s (the SGEMM matrix multiplication test, it seems).
AnandTech[4] reported the ARMv7 A15 architecture at 6.1 GFlops in its "GeekBench 3 - Floating Point Performance" table (SGEMM MT test result) back in 2015.
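Putting the figures quoted above side by side (all numbers are from the cited sources; as noted, the tests are not strictly comparable):

```python
cm5_1993 = 131.0   # CM-5/1024 peak GFlop/s, Top500 June 1993
a16_gpu  = 2000.0  # A16 Bionic iGPU FP32 GFLOPS (CPU-Benchmark)
a16_cpu  = 279.8   # A16 Bionic GeekBench SGEMM GFlop/s (GadgetVersus)
a15_2015 = 6.1     # ARMv7 A15-era result, GeekBench 3 SGEMM MT (AnandTech)

for label, gflops in [("A16 GPU", a16_gpu), ("A16 CPU SGEMM", a16_cpu),
                      ("2015 phone CPU", a15_2015)]:
    ratio = gflops / cm5_1993
    print(f"{label:>15}: {gflops:>7.1f} GFlop/s = {ratio:.2f}x the 1993 CM-5/1024")
```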
Interesting. I would have thought a few GFLOPs today would have been faster than the old super computer, but nope. The GPU is faster though. Still, the phone has both and can run on battery power while fitting in your pocket ;-)
In my experience talking to semiconductors folks, ARM is just not a concern anymore. The future is RISC-V, and ARM is already being seen as legacy tech. ARM's progress in the server space has stalled, the ARM Windows ecosystem is dead, Android has laid the groundwork for a move to RISC-V, and ARM has never and will never touch the desktop market.
> It proves my point beautifully when the only response to my comment
Your comment was beyond ignorant, and wrong. Most folks here are too smart, or busy, to reply to such nonsense. I am neither.
ARM is by far the most shipped and used arch every year. AMZ is even going in heavier on it. It's not legacy tech at all. So a person decided to show you how wrong you were by listing what's probably the most impressive chip in all of our lifetimes, and, guess what, it's ARM.
> Obviously I was referring to Linux/Windows workstations
The creator of Linux is using an ARM machine as a workstation today, AFAIK.
> if everyone was smart enough to pick up on that I wouldn't be paid as much as I am
If you're making more than a burger flipper at Wendy's, the world just isn't fair.
The one viable server ARM CPU core is now tied up in a Qualcomm-ARM legal spat and probably won't see the light of day, which made it pretty clear to anyone not grandfathered in like Apple that it's not worth designing your own ARM core. ARM itself has been hemorrhaging employees, both because of better offers from Apple and the RISC-V stealths, and because, since the SoftBank push to get their money back, it has simply become a worse and worse place to work. Their ability to execute is extremely compromised.
Because of the long tail of the hardware industry, the writing can be on the wall long before it's clear based on what you can go out and buy off a shelf today.
I get it that you want RISC-V to succeed - so do I - and to advocate for it but I really don’t understand why it needs this sort of comment about Arm. I see exaggerated criticism of the Arm ISA elsewhere from people who ought to know better too - it’s really CISC, it’s 5000 pages vs 2 for RISC-V etc. It’s just not necessary.
I mean, nothing I said is exaggerated here. ARM doesn't even have a viable server core that can compete with x86, even as vaporware. SoftBank ruins everything they touch, and is super focused at the moment on stealing from Peter to pay Paul to get something out of the upcoming ARM IPO, since their attempt to sell it off to Nvidia fell through. The rumor is they've been cutting R&D funding hard to temporarily boost profitability. If anything this is more a dig at how vulture capitalism ruins productive companies.
As an aside, ARM has always been a hybrid CISC/RISC core. It has nothing to do with the number of instructions, but with the fact that not having an I$ on the ARM1 forced it to include microcoded instructions, mainly to support LDM/STM. That's not a dig at ARM. It's a valid design, particularly at that gate count.
You jumped in in support of a comment that said Arm is ‘legacy’ tech. You said they don’t ‘even have a viable server core’. They are ‘haemorrhaging’ staff. Softbank have ‘ruined’ them.
Sounds more apocalyptic than exaggerated tbh.
I still don’t know why you think this is necessary.
The M1/M2 Macs run Linux pretty well. It's not perfect yet, but perfectly usable (especially as a desktop machine!) and support is improving every day.
I believe you're trying to move goalposts to avoid admitting you're wrong.
Graviton, M1/M2, Ampere etc but I’m sure you’ll be able to explain why Arm is seen as ‘legacy’ tech when billions of smartphones are being shipped every year with Arm CPUs.
Oh look, you named 4 areas where ARM development has already peaked. Hyperscalers are already looking to evolve beyond ARM in the near future; just look at how much attention Ventana got at the RISC-V Summit. M1/M2 are an Apple-ecosystem-specific phenomenon that hasn't inspired any copycat products. Ampere has been a massive disappointment to everyone in the industry; see the fact that Nuvia had their entire business dead to rights pre-acquisition. ARM simply isn't at the cutting edge of the semiconductor industry anymore. Just because Apple and Qualcomm use it to great effect doesn't mean ARM is making any major innovative strides relative to the competition.
> ARM simply isnt at the cutting edge of the semiconductor industry anymore.
What you really mean is Arm isn’t the hot new thing anymore. Well it hasn’t been that for 20 years. Meanwhile billions of arm devices in leading edge nodes are being shipped. Oh well.
If RISC-V support by Microsoft is as bad as it has been for ARM, then I'm afraid RISC-V will never touch the desktop market, at all. Contrary to ARM, which is being pushed there with great success by Apple. Server-wise of course it's a different story...
If great success to you is that they put the M1 and M2 in a tower, I don't know what to tell you. Intel, AMD, and the x86 industrial complex don't care in the slightest what instruction set your Mac runs
Might I suggest taking a step back, re-reading your first comment and all the replies under it, and asking yourself "is it possible I might not be 100% correct, and maybe other opinions have enough merit to be worth considering why people aren't agreeing with me, rather than just changing my argument to make sure I'm still the winner of this thread"?
I’m not sure I expressed my point clearly. It wasn’t quite about Apple. So I will reformulate it here: the fate of any instruction set on the desktop is primarily decided by Microsoft.
Do you have any information that Microsoft is planning to support RISC-V at least as well as x86/x64? (That is to say, not with something like Windows RT, or Windows CE)
That would be tremendously good news, I shall add.
>In my experience talking to semiconductors folks,
Most, if not all, semiconductor "folks" I know are very pragmatic, as a Real Engineer should be, unlike software engineers. And in my experience, only HN and the Internet are suggesting that ARM is dead and everything will be RISC-V.
RISC-V is free and open as in libre, by contrast to x86 and ARM which must be licensed from Intel/AMD and ARM and are thus subject to potential western economic sanctions.
Now, yes, China will just espionage and kangaroo court their way through and around such legalities anyway, but nonetheless RISC-V is less effort for more reward for China if it becomes at least on par with x86 and ARM.
Put more basically, it's a matter of national security. China can have an entire RISC-V ecosystem indigenously, unlike x86 and ARM.
If the US and/or UK place sanctions on exporting microprocessor technologies to China then that's that. Intel/AMD and ARM are subject to US and UK laws and regulations respectively.
RISC-V by contrast is much, much harder for any given country to regulate because of its free and open nature. At most the US and UK can embargo individual developments made within their jurisdictions, but they can't regulate the entire architecture. RISC-V doesn't have a kill switch named Intel/AMD or ARM.
ARM China is a wildly different animal than ARM. They went rogue a few years back and though SoftBank/ARM did a lot to get things back in line, it still shows up like this:
I'm loving it. I used cheap RISC-V boards for several of my projects, most notably a GD32V in my keyboard. The equivalent STM boards weren't too expensive, mostly in the $10-20 range, but weren't as easily available (and $10 is still 3x the price of the Chinese RISC-V board).
Though the RP2040 has largely ended my cheap RISC-V addiction.
As an experiment quite a few years ago I got a laptop with a special version of the Intel CPU that was not as fast but much more power efficient.
ASUS UL30A-X5
Really an excellent computer: it ran Linux great (games didn't really exist yet though), and with tuning it was coming in under 10W if the display brightness was turned down. First time I was able to get through flights without the system running dead.
I think in this case what's going on is that temperature rises increase resistance in a chip and therefore cause lower efficiency. If you can keep it cool, you can keep it more efficient. The move seems like a necessary one: a computer as powerful as that UL30A is probably inside a phone, once you discount the radio and display, and that thing still had a giant battery and only lasted 10-12 hours.
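For anyone trying to reproduce that kind of sub-10W figure on Linux, here is a minimal sketch that reads the battery discharge rate from sysfs. The exact path and available files vary by machine, so treat BAT0 and power_now as assumptions to check against your own /sys tree.

```python
from pathlib import Path

# Many (not all) laptops expose instantaneous discharge power here, in microwatts.
# Some expose current_now/voltage_now instead -- check your own hardware.
BAT = Path("/sys/class/power_supply/BAT0")

def battery_watts() -> float:
    power_now = BAT / "power_now"
    if power_now.exists():
        return int(power_now.read_text()) / 1e6
    # Fallback: P = V * I, from microvolts and microamps.
    volts = int((BAT / "voltage_now").read_text()) / 1e6
    amps = int((BAT / "current_now").read_text()) / 1e6
    return volts * amps

if __name__ == "__main__":
    print(f"Battery discharge: {battery_watts():.1f} W")
```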
I've seen AMD do some pretty impressive things, I wouldn't count them out. They're at least willing to attempt to compete on price.
From what I've seen, whether the CPU/ALU/decode at the center is ARM or x86 may make less difference than you think. The amount of circuitry and silicon area (correlated with power) outside the core is significant: MMUs, vector units, a complex cache hierarchy, high-speed IO (DDR, PCIe, you name it), and an extremely complex network on chip (Infinity Fabric) to enable CPU interconnectivity. Look at the IO die size vs the CCD size. As one poster pointed out, using chiplets has great advantages, but there is a power hit. Thankfully newer tech is bringing that power down too. I'd love to see a power breakdown of a full chip to see what % is attributed to the CPU core itself.
In my own experience, the supposed ARM chip superiority claims are almost entirely marketing. I get significantly better performance (15-50%) from nearly all of my CPU workloads on modern Intel/AMD hardware vs the ARM Apple devices.