The P100s have full support for half-precision (i.e. 16-bit) floating point ops. This can mean ~2x improvements in speed and memory usage compared to the Pascal Titan X, which is the top "consumer" card. This difference is significant for almost any machine learning workload, which is what a lot of these cards will be used for.
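For the curious, the ~2x speed comes from packed half2 math: one instruction operates on two FP16 values at once. A minimal CUDA sketch of the idea (the kernel is my own toy, not anything NVIDIA ships; __hfma2 and the __half2 type come from cuda_fp16.h and need sm_53 or newer, so the P100 at sm_60 qualifies):

    #include <cuda_fp16.h>

    // Each array element is a half2, i.e. two packed FP16 values, so every
    // fused multiply-add below performs two half-precision FLOPs at once.
    __global__ void axpy_fp16(int n2, __half2 a, const __half2 *x, __half2 *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n2)
            y[i] = __hfma2(a, x[i], y[i]);  // two FP16 FMAs per instruction
    }

The memory savings fall out for the same reason: each value is 2 bytes instead of 4, so the same model fits in half the VRAM.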

NVIDIA gimped half-precision on the consumer cards to drive datacenters, hedge funds, machine learning companies, etc. towards the "professional" cards (and their huge markup).



FP16 performance is only relevant until people figure out how to train NNs using INT8. See, for example, [1] for recent advances in that direction.

After that, it's going to be mostly about memory size and bandwidth.

[1] https://arxiv.org/abs/1603.01025
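The basic trick in that line of work is to keep a per-tensor scale factor and store the values themselves as 8-bit integers. A rough sketch of symmetric quantization in CUDA (names and details are my own illustration, not the paper's exact scheme):

    #include <cstdint>

    // Illustrative symmetric quantization: map floats onto [-127, 127]
    // using a per-tensor scale. inv_scale = 127.0f / max_abs(x), computed
    // on the host beforehand from the tensor's observed range.
    __global__ void quantize_int8(int n, float inv_scale,
                                  const float *x, int8_t *q) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float v = fminf(fmaxf(x[i] * inv_scale, -127.0f), 127.0f);
            q[i] = static_cast<int8_t>(__float2int_rn(v));  // round to nearest
        }
    }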


First NVIDIA solidified their monopoly by forcing CUDA... then they gimped half-precision on consumer cards.

We really need more frameworks that work with OpenCL, so that we can have some competition from AMD, whose consumer cards are not gimped.


"Gimping", in this case, actually means adding hardware that costs quite a bit of silicon area to one chip that will probably never be sold as a consumer GPU.

I don't see the issue with a company making a very high-end product, adding features that aren't of much use to consumers, and asking extra money for the effort.

AMD doesn't have double-rate FP16 on its current GPUs either. The latest generation runs FP16 at the same speed as FP32, but at that point you might as well just use FP32 everywhere.

And let's not forget: the Nvidia consumer GPUs have the quad-INT8 deep learning operations (dp4a) enabled at all times. They didn't need to do that and could have reserved it for their Tesla product line only.
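For reference, that's exposed as the __dp4a intrinsic on sm_61+ parts, i.e. consumer Pascal like the GTX 1080. A toy dot-product kernel to show the shape of it (the kernel and its names are mine, just for illustration; out must be zeroed beforehand):

    // Each int packs four signed 8-bit values; __dp4a computes a 4-way
    // int8 dot product and accumulates into a 32-bit integer, all in one
    // instruction.
    __global__ void dot_int8(int n4, const int *a, const int *b, int *out) {
        int acc = 0;
        for (int i = threadIdx.x; i < n4; i += blockDim.x)
            acc = __dp4a(a[i], b[i], acc);
        atomicAdd(out, acc);  // combine per-thread partial sums
    }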



