
An interesting work, with some to-be-addressed questions:

1. The paper only covers the GEMM part, with small-scale experiments (CIFAR-10/100). It does not cover convolution, nor the GEMM components of more popular networks such as Transformer/BERT.

2. It is still an approximation method, meaning potential accuracy loss. So I think this method is less attractive for training acceleration; it may be more useful as a complementary method for inference acceleration.

3. No results are evaluated on GPUs equipped with Tensor Cores. I am a bit curious: since modern AI accelerators (including NVIDIA GPUs) all incorporate Tensor Cores, which support GEMM acceleration by design, what is the added value of the approximation method described in this paper?


Great observations. I see this paper as the first in a three-part series. The second part is specializing it for convolution (which has additional structure to exploit), and the third is hooking these approximate ops into deep neural nets the way people currently do with scalar quantization / pruning.

I'm not optimistic about beating tensor cores when running on GPUs, at least until/unless we get similar hardware support.*

Barring better hardware support, the killer app is probably CPU inference--once there are Conv implementations and the necessary GPU kernels to train the network.

*Aside: this support would be pretty doable since the kernels look almost identical to GEMM kernels--you just need a multiplex-add rather than a multiply-add. On an x86 machine, all it would take is a vpshufb-add and a 4-bit unpack instruction.
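To make the multiplex-add structure concrete, here is a rough NumPy sketch of this style of approximate GEMM: encode each row of the input as small prototype indices, precompute per-prototype inner products with the weight matrix, then replace multiply-adds with lookup-adds. All sizes and names are illustrative, and an exact nearest-prototype encoder stands in for the paper's fast hash-based one:

```python
import numpy as np

rng = np.random.default_rng(0)

D, M, C, K = 32, 8, 4, 16        # input dim, output cols, codebooks, prototypes (4-bit codes)
d = D // C                       # dimension of each subspace

A = rng.standard_normal((100, D))        # tall input matrix
B = rng.standard_normal((D, M))          # small weight matrix
protos = rng.standard_normal((C, K, d))  # per-subspace prototypes (learned, in practice)

# Encode: replace each subvector of A with the index of its nearest prototype.
# (Exact nearest-neighbor here; the paper uses a fast hash-based encoder.)
codes = np.empty((A.shape[0], C), dtype=np.uint8)
for c in range(C):
    sub = A[:, c * d:(c + 1) * d]
    dists = ((sub[:, None, :] - protos[c][None]) ** 2).sum(-1)
    codes[:, c] = dists.argmin(1)

# Precompute lookup tables: inner product of every prototype with every column of B.
tables = np.einsum('ckd,cdm->ckm', protos, B.reshape(C, d, M))

# Approximate A @ B with table lookups and adds -- no multiplies in the inner loop.
approx = tables[np.arange(C), codes].sum(axis=1)
exact = A @ B
print("correlation with exact GEMM:", np.corrcoef(approx.ravel(), exact.ravel())[0, 1])
```

The final gather-and-sum is exactly the loop that a vpshufb-style shuffle-add would accelerate: with 4-bit codes, each lookup table row fits in a SIMD register and the "multiplex" is a byte shuffle.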


If it works well for inference, it could enable fast inference on devices that don't have good tensor cores or GPUs.


> I think this method is less attractive to training acceleration scenario

The proposed hash-based encoding function is not differentiable, so it doesn't appear this method can be used for training at all.

I’m not aware of any hash functions that are analytically differentiable, so to support efficient back-propagation I suspect that some fundamental changes to this method would be necessary.


You could still optimize the prototypes, so fine-tuning with this in place would be possible (see, e.g., [1]). But we don't yet have data on how well this would work using our exact method, how early in training you could do the op replacement, etc.
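A toy illustration of why prototype optimization remains possible (a hypothetical setup, not the paper's exact procedure): once the non-differentiable assignments are frozen, the reconstruction error is quadratic in the prototypes, so each prototype can be optimized in closed form (the k-means M-step) or by SGD during fine-tuning:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, K = 200, 8, 16
A_sub = rng.standard_normal((N, d))      # one subspace of the input data
codes = rng.integers(0, K, size=N)       # fixed (non-differentiable) assignments

# With assignments frozen, sum_n ||A_sub[n] - P[codes[n]]||^2 is quadratic in P,
# so each prototype's optimum is just the mean of its assigned rows.
P = np.zeros((K, d))
for k in range(K):
    mask = codes == k
    if mask.any():
        P[k] = A_sub[mask].mean(axis=0)

err_opt = ((A_sub - P[codes]) ** 2).sum()
err_rand = ((A_sub - rng.standard_normal((K, d))[codes]) ** 2).sum()
print("optimized prototypes:", err_opt, " random prototypes:", err_rand)
```

The same observation lets gradients flow to the prototypes during end-to-end fine-tuning, even though no gradient flows through the assignment step itself.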

[1] http://openaccess.thecvf.com/content_ECCV_2018/html/Sanghyun...



