> you can deal with exponentially larger numbers with roughly linearly (or polynomially) increasing memory, while if you use analog circuits you have to pay a quadratic cost on the exponential
This does not make sense to me. Can you explain?
I think there might be a misunderstanding of how analog computing is used to build a neural network. First, a weight is stored as some analog physical property, typically as charge on a floating gate or on a capacitor in a DRAM-type cell. Second, the multiplication is performed by modulating the analog input signal going through the floating-gate transistor with the charge on the floating gate (the weight). Third, the summation is done by simply summing the currents. Finally, the activation function is implemented with an op-amp.
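To make that dataflow concrete, here is a minimal behavioral sketch in Python. This is not a circuit simulation: the function name is mine, and modeling the op-amp activation stage as a tanh is an illustrative assumption.

```python
import math

def analog_neuron(inputs, weights, v_ref=1.0):
    """Behavioral model of the analog neuron described above.

    Each weight stands in for a stored charge; multiplication is the
    modulation of an input signal by that charge; summation is just
    currents meeting at a node (Kirchhoff's current law); the op-amp
    activation stage is modeled here as a tanh squashing function.
    """
    # per-transistor multiply: input signal modulated by stored charge
    currents = [x * w for x, w in zip(inputs, weights)]
    # summation: currents on a shared wire simply add
    total = sum(currents)
    # op-amp activation stage (tanh is an assumption, not the circuit)
    return math.tanh(total / v_ref)

print(analog_neuron([0.5, -0.2, 0.8], [0.3, 0.7, -0.1]))
```

The point is that each of the four steps maps onto one physical mechanism, with no instruction fetch or memory transfer in between.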
Regarding power consumption:
1. A digital computer needs thousands of transistors to perform a multiplication; an analog circuit can do it with a single one.
2. An analog NN stores its parameters (weights) locally, right where they are needed to perform the computation. A digital NN needs lots of memory transfers to bring weights from RAM to the ALU and to store intermediate results.
That's why a properly implemented analog NN will always consume much less power.
> 1. A digital computer needs thousands of transistors to perform a multiplication; an analog circuit can do it with a single one.
That's interesting. What would the circuit be?
> Digital NN will need lots of memory transfers to bring weights from RAM to ALU, and to store intermediate results.
That's not necessarily the case. Cellular neural networks were proposed long ago, for example, and they're digital -- how the multiplication happens is independent of the dataflow architecture.
> That's why a properly implemented analog NN will always consume much less power.
How do you know that the I^2 cost of operating in the linear regime isn't excessive? I'm totally ignorant on the matter -- I'd love to see a ballpark calculation to understand why it isn't important.
As I described above: "the multiplication operation is performed by modulating the analog input signal going through the floating gate transistor by the charge on the floating gate (weight)." The circuit is the single transistor in this case.
> Cellular neural networks were proposed long ago, for example, and they're digital
What is so inherently digital about cellular networks? Can you provide a link to an implementation of a cellular net in digital hardware? How are the weights stored? Where does the multiplication happen?
> This does not make sense to me. Can you explain?
I understood the reasoning to be that to increase the range of accurately representable values in a circuit, you either need to increase the voltage or current in an analog circuit (to maintain a certain accuracy against a noise baseline), or devote more bits in a digital circuit. The first gives a linear dependence of power on range (or quadratic with I^2 losses), the second a logarithmic one.
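A toy calculation of those two scaling laws, under the assumptions just stated. The unit constants (power per bit, noise-floor power) are made up for illustration; only the growth rates matter.

```python
import math

def digital_power(range_needed, p_per_bit=1.0):
    # power grows with the bit count, i.e. logarithmically in the
    # represented range (p_per_bit is an arbitrary illustrative unit)
    bits = math.ceil(math.log2(range_needed))
    return bits * p_per_bit

def analog_power(range_needed, p_noise_floor=1.0):
    # the signal must sit range_needed above the noise floor; with
    # I^2-type losses, power grows with the square of the signal level
    return p_noise_floor * range_needed ** 2

# compare how the two grow as the required range widens
for r in (2 ** 8, 2 ** 16, 2 ** 32):
    print(r, digital_power(r), analog_power(r))
```

The digital column creeps up by a few units per doubling of range; the analog column explodes quadratically.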
Ah, I see. Well, remember, with analog circuits we are talking about subthreshold currents. These are orders of magnitude smaller than the currents in a digital circuit (nA vs. uA). Correspondingly, the power consumption will be negligible in comparison, even if you expand the current range. And that is only a fraction of the total power consumption: adding more bits in a digital circuit increases total power linearly, dominated by interconnect capacitance.
That was an important observation. Fighting noise is one of the primary reasons the first digital computers were invented.
To give a dramatic illustration: if your circuit has on the order of 1 nV of thermal noise and you wanted to do the linear analog equivalent of 64-bit integer arithmetic, you would need a signal on the order of 10,000,000,000 V to have enough precision. In terms of power consumption it's even worse: if the 1 nV signal consumes something like 1 pW, you would need something like the total power output of the Sun (on the order of 10^26 W). A bit of an expensive multiplication, no? :) That's how crazy it is!
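To check the arithmetic, here is the back-of-the-envelope in Python, assuming the 1 nV noise floor and 1 pW figures above and power scaling with voltage squared:

```python
noise_v = 1e-9            # 1 nV thermal noise floor
levels = 2 ** 64          # distinct levels for 64-bit precision
signal_v = noise_v * levels
print(f"required signal: {signal_v:.1e} V")   # ~1.8e10 V

p_noise = 1e-12           # assume the 1 nV signal dissipates ~1 pW
p_signal = p_noise * levels ** 2              # power ~ voltage squared
print(f"required power:  {p_signal:.1e} W")   # ~3.4e26 W
```

The Sun's total output is around 3.8e26 W, so a single 64-bit analog multiply at this noise floor would indeed cost roughly one Sun.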
Again, if you can get away with less than 8 bits of precision and imperfect linearity the picture changes, but I wouldn't declare it superior a priori without looking at the numbers.
Or, you could split your 64 bit computation into 8 bit computations, which could be done with analog circuits, and still save a lot of power! :-)
But yes, I understand your point. Both analog and digital implementations have their strengths and weaknesses. If you value power over precision, go with analog. If the opposite - go with digital.
Right, but note you can't even split it, if you are thinking of linear circuits. Precision necessarily means how your signal compares to the thermal noise floor. It is possible to show you can't compose 8-bit-precision linear units to get a >8-bit-precision value. What actually happens is the opposite: if the noise of the units is uncorrelated, it propagates and grows on the order of sqrt(number of operations). Avoiding error propagation is another advantage of digital operations.
The reason NNs don't exhibit strong error propagation is the non-linearities between the linear layers, which perform operations analogous to thresholding or majority voting and have error-correcting properties.
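The sqrt(N) growth is easy to verify with a small Monte Carlo, under the simplifying assumption that each unit contributes independent Gaussian noise:

```python
import math
import random

def chain_noise(n_ops, sigma=1.0, trials=20000):
    """Sum n_ops independent noise samples and measure the standard
    deviation of the total: it grows like sqrt(n_ops)."""
    totals = [
        sum(random.gauss(0.0, sigma) for _ in range(n_ops))
        for _ in range(trials)
    ]
    mean = sum(totals) / trials
    var = sum((t - mean) ** 2 for t in totals) / trials
    return math.sqrt(var)

# measured accumulated noise vs. the sqrt(n) prediction
for n in (1, 4, 16, 64):
    print(n, round(chain_noise(n), 3), math.sqrt(n))
```

Chaining 64 unit operations roughly octuples the noise floor, which is exactly the composition problem described above.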
Interesting, but then how do you explain that rectified linear operations between layers work better than sigmoids?
According to your logic, shouldn't ReLU have worse error-propagation properties than squashing functions?
I'm going to reply to your question below here since HN is preventing a reply (anti-flaming/long threads I guess).
Be careful with jumping to conclusions: I never even cited ReLUs or Sigmoids in my post! I don't have any opinion on which non-linearity is better, I only know both are dramatic non-linearities. My claims were about linear circuits. You should use whatever nonlinear element works best in your Neural Network, of course (and I've heard ReLUs have good advantages).