You mean you're not surprised that a machine with 8 GPUs, apparently costing $129k USD (from comment below), can outperform a single CPU? :)
(Of course, a better metric is that it's getting ~56x the performance at probably ~10x the TDP, but that's not surprising for a GPU with the current state of deep learning code.)
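The perf/watt point above is simple arithmetic; here's a quick sketch using the commenter's own estimates (both the ~56x and ~10x figures are guesses from the thread, not measured values):

```python
# Rough performance-per-watt comparison from the estimates above.
speedup = 56      # ~56x throughput vs. the CPU baseline (estimate)
tdp_ratio = 10    # ~10x the TDP of the CPU baseline (estimate)

perf_per_watt_advantage = speedup / tdp_ratio
print(f"~{perf_per_watt_advantage:.1f}x better performance per watt")  # ~5.6x
```

So even granting the unflattering TDP, the GPU box still comes out several times ahead per watt.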
To their credit, the thermal and power engineering needed to get that dense a compute deployment is challenging. (bt, dt, have the corpses of power supplies to show for it.) But at that price it's going to be limited to hyper-dense HPC deployments by companies that lack the resources to engineer their own for substantially less money, the way Facebook did with its Big Sur design: https://code.facebook.com/posts/1687861518126048/facebook-to... . And, of course, academics and hobbyists will continue to use consumer GPUs, which give much better performance/$ but aren't nearly as HPC-friendly.
To be fair, they are comparing it to a dual-socket CPU, which is twice as fair as comparing it to a single socket!!
What I was really getting at: I want to know the relative performance compared to another 8-Tesla box. I know comparing apples to apples isn't good marketing, but c'mon.
What kind of server pricing are you getting? Base servers are cheap, but add high-end Xeons and memory, not to mention interconnect, and I get something like seven decently configured 1U servers for $129K (dual 20-core with lots of RAM, 10GbE NICs, and mirrored boot/swap). No interconnect switching. That's for 20-core Haswell, because I don't yet have discount pricing for Broadwell Xeons. I'm sure one could do better at hyperscaler discount, but this is startup low-ish quantity.
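For comparison's sake, a back-of-the-envelope version of that pricing claim (the per-server figure is just the $129K budget divided by the seven servers quoted above; actual quotes will vary with discounts):

```python
# Back-of-the-envelope check: what does $129K buy in 1U Xeon servers,
# per the configuration described above (dual 20-core, lots of RAM,
# 10GbE NICs, mirrored boot/swap, no interconnect switching)?
budget = 129_000           # DGX-1 list price, USD
num_servers = 7            # commenter's estimate at startup quantities
per_server = budget / num_servers
print(f"~${per_server:,.0f} per 1U server")  # ~$18,429
```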
It looks like it uses a separate daughterboard that houses the GPUs + NVLink, connected to the main motherboard using quad InfiniBand EDR (400 Gbps) + RDMA. http://images.anandtech.com/doci/10225/SSP_85.JPG
The diagram is confusing, but the GPUs are connected to the NVLink matrix, which is connected to the motherboard via the PLX PCIe switches. The quad IB/dual 10GbE are separate I/O attached to the motherboard.
Though I'm most curious about what motherboard is in there to support NVLink and NVHS.
Good overview of Pascal here: https://devblogs.nvidia.com/parallelforall/inside-pascal/
1 question: will we see NVLink become an open standard for use in/with other coprocessors?
1 gripe: they give relative performance data as compared to a CPU -- of course it's faster than a CPU