Great observations. I see this paper as the first in a three-part series. The second part is specializing it for convolution (which has additional structure to exploit), and the third is hooking these approximate ops into deep neural nets the way people currently do with scalar quantization / pruning.

I'm not optimistic about beating tensor cores when running on GPUs, at least until/unless we get similar hardware support.*

Barring better hardware support, the killer app is probably CPU inference--once there are Conv implementations and the necessary GPU kernels to train the network.

*Aside: this support would be pretty doable, since the kernels look almost identical to GEMM kernels--you just need a multiplex-add (table lookup + accumulate) rather than a multiply-add. On an x86 machine, all it would take is a fused vpshufb-add and a 4-bit unpack instruction.
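To make the multiplex-add idea concrete, here is a hedged NumPy sketch of a product-quantization-style approximate matvec. All sizes, variable names, and the random "prototypes" are illustrative assumptions, not the paper's actual algorithm or API: the point is only that the online inner loop gathers precomputed partial dot products by 4-bit code and accumulates them, with no scalar multiplies.

```python
import numpy as np

# Illustrative sketch (not the paper's method): lookup-accumulate matvec.
rng = np.random.default_rng(0)
D, C, K = 16, 4, 16              # dims, subspaces, prototypes per subspace (4-bit codes)
sub = D // C
b = rng.standard_normal(D)

# Prototypes per subspace (random here; a real scheme would learn them).
protos = rng.standard_normal((C, K, sub))

# Offline: one lookup table per subspace -- dot of each prototype with b's slice.
tables = np.stack([protos[c] @ b[c*sub:(c+1)*sub] for c in range(C)])    # (C, K)

# Encode a batch of rows: nearest prototype per subspace -> one 4-bit code each.
A = rng.standard_normal((8, D))
codes = np.stack([
    np.argmin(((A[:, c*sub:(c+1)*sub][:, None, :] - protos[c])**2).sum(-1), axis=1)
    for c in range(C)], axis=1)                                          # (8, C)

# Online "multiplex-add": gather table entries by code, then add. No multiplies.
approx = tables[np.arange(C), codes].sum(axis=1)                         # (8,)
exact = A @ b

# Sanity check: a row assembled exactly from prototypes is reproduced exactly.
x = np.concatenate([protos[c, 0] for c in range(C)])[None, :]
x_codes = np.stack([
    np.argmin(((x[:, c*sub:(c+1)*sub][:, None, :] - protos[c])**2).sum(-1), axis=1)
    for c in range(C)], axis=1)
x_approx = tables[np.arange(C), x_codes].sum(axis=1)
```

The gather step (`tables[np.arange(C), codes]`) is exactly what vpshufb would do in hardware: the 4-bit code selects one of 16 table entries per subspace, and the accumulate replaces the multiply in a multiply-add.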
