The result is mildly interesting but doesn't strike me as remarkable. We've always known that overparameterization with regularization can improve performance in certain classes of problems. The comparison with using an n-th order polynomial to fit n points is also a non sequitur.
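To make that first point concrete, here is a minimal sketch of the well-known phenomenon (plain NumPy, a toy 1-D problem, with a feature map and ridge penalty of my own choosing, nothing taken from the paper): exact degree-(n-1) polynomial interpolation uses exactly n parameters and hits every sample, while a heavily overparameterized but regularized fit uses far more parameters and typically does better on noisy data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem: n noisy samples of a smooth target.
n = 20
x = np.sort(rng.uniform(-1, 1, n))
y = np.sin(3 * x) + 0.1 * rng.normal(size=n)

# (a) Exact interpolation: degree-(n-1) polynomial, exactly n parameters.
interp_coeffs = np.polyfit(x, y, deg=n - 1)

# (b) Overparameterized + regularized: 500 random cosine features (>> n
#     parameters) fit with ridge regression.
def features(t, num_features=500, scale=5.0, seed=1):
    r = np.random.default_rng(seed)
    w = r.normal(scale=scale, size=num_features)
    b = r.uniform(0, 2 * np.pi, size=num_features)
    return np.cos(np.outer(t, w) + b)

Phi = features(x)
lam = 1e-2  # the ridge penalty is what keeps the overparameterized fit tame
ridge_coeffs = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

# Compare test error on fresh points from the same target.
x_test = np.linspace(-1, 1, 200)
y_test = np.sin(3 * x_test)
err_interp = np.mean((np.polyval(interp_coeffs, x_test) - y_test) ** 2)
err_ridge = np.mean((features(x_test) @ ridge_coeffs - y_test) ** 2)
print(f"n-parameter interpolation MSE: {err_interp:.3f}")
print(f"overparameterized ridge MSE:   {err_ridge:.3f}")
```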
So there's a lower bound on the number of parameters required to produce a good interpolation of a broad class of "smooth" functions, and that's larger than the data size. Ok. I'm guessing one could find an even larger lower bound to approximate a more exotic class of functions that are still "smooth" in some intuitive sense, and it wouldn't make me any more excited.
The main problem is: what does it say about how well a neural network can approximate specific types of functions, compared to anything else with the same or more parameters and the same huge amount of data? That has always been the real mystery here.
Can we analytically define a class of functions that neural networks can fit well from n data points using fewer parameters than any other known method, and map these functions to real-world applications? Or, in other words, given a class of functions representing the general real-world data sets that neural networks tend to do well on (dogs and cats), can we characterize a tractable algorithm that computes the nd parameters needed from the n sample points, with high probability of producing a good interpolation?
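For reference, the "nd parameters" figure is just the usual parameter count of a width-n one-hidden-layer network on d-dimensional inputs. A back-of-the-envelope sketch (my own illustration with made-up toy numbers, not anything from the paper):

```python
# Parameter count for a one-hidden-layer ReLU network with one hidden unit
# per sample (width n) on d-dimensional inputs.
def param_count(n_samples: int, d: int) -> int:
    hidden = n_samples                       # one unit per data point
    return hidden * d + hidden + hidden + 1  # input weights + hidden biases
                                             # + output weights + output bias

# Toy ImageNet-scale numbers: n = 1.2M samples, d ~ 150k input dimensions.
n, d = 1_200_000, 150_528
print(param_count(n, d))  # ~ n*d, i.e. vastly more than the n sample points
```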
If not, then this paper doesn't explain anything better for neural networks than for, say, kernel methods with polynomial kernels or manifold reconstruction using restricted classes of spline functions. The sensationalist headline, along with the "books need to be written" rhetoric, wants to make us think the result or the techniques it presents could get us closer to answering the above questions, but it seems to me there's zero truth in that.