I did my thesis on this topic (at the time we were searching for the lower bound of ALU precision needed to run NNs on zero-power devices).
It's interesting: NNs degrade at about 6 bits, and that's mostly because the transfer function becomes stable and the training gets stuck in local minima more often.
We built a two-step training methodology: first you train at 16-bit precision, finding the absolute minimum, then retrain at 6-bit precision, and the NN basically learns to cope with the precision loss on its own.
The funny part is, the fewer bits you had, the more robust the network became, because error correction became a normal part of its transfer function.
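A minimal sketch of that two-step idea (all names and the toy XOR task are mine, not from the original work): train a tiny net at full precision first, then continue training from that solution while snapping the weights to a coarse uniform grid after every update, so the network adapts to the quantized representation.

```python
import numpy as np

def quantize(w, bits=6, w_max=4.0):
    # Uniform quantizer: clip to [-w_max, w_max] and round to 2**bits - 1 levels.
    step = 2.0 * w_max / (2 ** bits - 1)
    return np.clip(np.round(w / step) * step, -w_max, w_max)

def train_xor(bits=None, epochs=5000, lr=1.0, seed=0, w_init=None):
    # Tiny 3-4-1 MLP (inputs + bias column) on XOR. If `bits` is set,
    # weights are re-quantized after every update; gradients are still
    # computed in full precision (a straight-through-style scheme).
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
    y = np.array([[0], [1], [1], [0]], float)
    Xb = np.hstack([X, np.ones((4, 1))])            # input + bias
    if w_init is None:
        W1 = rng.normal(0, 1, (3, 4))               # 2 inputs + bias -> 4 hidden
        W2 = rng.normal(0, 1, (5, 1))               # 4 hidden + bias -> 1 output
    else:
        W1, W2 = (w.copy() for w in w_init)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        h = sig(Xb @ W1)
        hb = np.hstack([h, np.ones((4, 1))])
        out = sig(hb @ W2)
        d2 = (out - y) * out * (1 - out)            # output-layer delta
        d1 = (d2 @ W2[:-1].T) * h * (1 - h)         # hidden-layer delta
        W2 -= lr * hb.T @ d2
        W1 -= lr * Xb.T @ d1
        if bits is not None:
            W1, W2 = quantize(W1, bits), quantize(W2, bits)
    return (W1, W2), float(np.mean((out - y) ** 2))

# Step 1: find a good minimum at full precision.
weights, mse_full = train_xor()
# Step 2: retrain from that solution with 6-bit weights.
_, mse6 = train_xor(bits=6, w_init=weights)
```

The key detail is that step 2 starts from the full-precision solution rather than from scratch, so the coarse grid only has to hold a minimum that already exists.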
We couldn't make the network converge at 4 bits, however. We tried different transfer functions, but ran out of time before getting meaningful results (each function needs its own back-propagation adjustment, and things like that take time; I'm not a mathematician :D)
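The "back-propagation adjustment" each transfer function needs is just its own derivative in the delta rule. A small sketch (function names are mine) pairing an atan-based transfer function with the derivative backprop would use:

```python
import numpy as np

# atan squashed into (0, 1), usable as a sigmoid-like transfer function.
def atan_transfer(z):
    return np.arctan(z) / np.pi + 0.5

# Its derivative, needed by backprop: d/dz [atan(z)/pi + 1/2] = 1 / (pi * (1 + z^2)).
def atan_transfer_deriv(z):
    return 1.0 / (np.pi * (1.0 + z * z))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Generic per-layer backprop delta: error signal times f'(pre-activation).
# Swapping transfer functions means swapping both entries of the pair.
def layer_delta(err, z, deriv):
    return err * deriv(z)
```

Swapping in a new transfer function without also swapping in its derivative is exactly the kind of thing that silently breaks training, which is why each one takes time to validate.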
I had similar empirical results on one of my PhD projects for medical image classification. With small data sets, we got better results on 8-bit data sets compared to 16-bit. We viewed it as a form of regularization that was extremely effective on smaller data sets with a lot of noise (x-rays in this case).
When using 8-bit weights, what kind of mapping do you do? Do you map the 8-bit range into -10 to 10? Do you have more precision near zero or is it a linear mapping?
Don't know about him, but I was working with −8 to 8 for inputs and −4 to 4 for weights. The atan transfer function maps quite well, and there's no need to oversaturate the next layer.
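For the linear case the question above asks about, the mapping is just a fixed step size; a sketch of mapping signed 8-bit codes onto the −4 to 4 weight range described here (function names are mine, and a log/µ-law-style mapping would be the alternative that gives more precision near zero):

```python
import numpy as np

# Linear (uniform) mapping between signed 8-bit codes and weights in
# [-4, 4]: step = 4 / 128 = 0.03125 per code.
W_MAX = 4.0
STEP = W_MAX / 128.0

def code_to_weight(code):
    # code: int in [-128, 127] -> weight in [-4.0, ~3.97]
    return code * STEP

def weight_to_code(w):
    # Nearest-code rounding, clipped to the representable range.
    return int(np.clip(round(w / STEP), -128, 127))
```

With a uniform mapping the resolution is the same everywhere in the range, which is why the choice of weight range (−4 to 4 vs. −10 to 10) matters: a wider range buys headroom at the cost of a coarser step.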