Backpropagation algorithm visual explanation (google-developers.appspot.com)
336 points by m1245 on June 27, 2018 | 60 comments


I find explanations like these are great for people who understand mathematical notation, sigmoid functions, derivatives, etc. These people, however, typically understand what's going on from a simple text description of the process.

For those without a math background, the notation is very opaque. A far better explanation is to explain it numerically with simple examples.

For example, have two bits of training data:

input -> output

1 -> 0

0 -> 1

And a simple network with zero hidden nodes, and train it. By hand...

Then add another bit of training data:

0.5 -> 1.5

Notice that it is now impossible to fit the training data exactly, no matter how many training iterations we do. Now add a hidden layer with one or two nodes. Now we can fit the data perfectly, but show that, depending on the initial weights, we might never get there through gradient descent. Now's the time to mention different types of optimizers, momentum, etc.
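
To make that concrete, here is a minimal sketch of the zero-hidden-node case (my own illustrative Python, assuming an identity activation and squared error, not anything from the linked article):

    # Train y = w*x + b on the two points above by plain gradient descent.
    data = [(1.0, 0.0), (0.0, 1.0)]   # (input, target)
    w, b, lr = 0.5, 0.0, 0.1          # arbitrary starting weights

    for _ in range(1000):
        for x, t in data:
            y = w * x + b             # forward pass
            err = y - t               # dL/dy for L = 0.5 * (y - t)**2
            w -= lr * err * x         # dL/dw
            b -= lr * err             # dL/db

    print(w, b)                       # -> roughly w = -1, b = 1: an exact fit

Once (0.5, 1.5) is added, no straight line passes through (0, 1), (0.5, 1.5) and (1, 0), so the loss can never reach zero; a hidden node with a nonlinearity removes that limitation, at the cost of a non-convex loss surface.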


Just to add a slightly different perspective: I'm comfortable with the notation and calculus involved, but had not known how backpropagation worked until now.

I'm not sure if it's the same for others, but I don't find bare text descriptions with formulas particularly useful. Mathematical notation on a page is great for rote application of rules and computation, but by itself does not easily communicate an intuitive understanding of the system the math represents. I have to work very hard to build up mental pictures of systems described by just notation, and those mental pictures often have to move in complicated ways as well.

The relationship between maths on the page and the system it describes is like that between musical notation on a page and hearing a full orchestra. One is a dry accounting of the facts involved. The other is moving and powerful in its richness and immediacy, a living thing that defies easy communication beyond the experience itself.

Demonstrations like this show you the maths _and_ build up a picture for you at the same time. The result of that is that you can communicate a very powerful idea (e.g. backpropagation) very precisely, intuitively and quickly.

Very much worth a five minute scroll for me; YMMV!


I half agree with you. The notation alone, although "understandable", doesn't provide a "deep" understanding.

I think the intuition is only half-transmitted with just the notation.

I do think working through an example sort of completes the intuition.


Kind of tired of people in the programming community proudly complaining about not knowing simple math notation. Educate yourself. Everyone else in the engineering world knows these basic math notations.


Math notation is simple because it's heavily overloaded. For example, it does me no good to know about exponentials when a superscript is used in a different context. Reading any nontrivial math requires knowing what the notation means in the particular context of the work. IME mathematicians typically assume the reader is familiar already and usually don't explain.

Anyone targeting their work at non-experts should explain even what seems like trivial notation to them since they can't know what other meanings the reader may think the notation holds.


But the fact is that math notation is the most widely known notation for writing down equations (let's make things specific to the OP case and say summation equations and partial derivative equations).

Specifically for the superscript case, in the vast majority of cases it will be obvious whether the superscript notation means exponentiation or indexing. When there are ambiguities, the person explaining the equation should clarify them.

It doesn't do the world any good to make the entire engineering world learn (relatively) obscure programming languages just to be slightly clearer (and, let's not forget, much more verbose) when writing down simple equations, when all one has to do is clarify a couple of ambiguities.

Let me make it very clear and say this: equations are meant to explain things. They are not standalone pieces of code that you can copy and paste into a REPL. They are tools to explain how something works. You should always have accompanying text that explains what all the variables in the equation mean and clarifies any unclear notation.


I fully agree with your last sentence. Sadly I don't often see that actually happen in practice. And when the notation isn't explained I personally tend to waste a lot of brain cycles trying to decide what the notation means because I'm reading material that I don't already know (of course - if I already knew the material I probably wouldn't be reading it).


Right but the thing is, people should be complaining about equations lacking explanations. Not about math notation in general.


I agree. The superscript example is a good one. In most contexts it denotes an exponent, but in the context of, say, the cost function minimized in linear regression, it indicates the index of the training example.
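
For a concrete instance of that indexing convention (a standard textbook least-squares cost, given as my own example rather than anything from the article):

    J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

Here x^{(i)} is the i-th training example and y^{(i)} its label; neither superscript is an exponent, while the trailing ^2 is one.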

Computer languages benefit from the fact that poorly designed syntax can be deprecated (not in all cases, e.g. C++) by introducing new features to the language.

Notation in math never advances in the same way for some reason.


That's the point I'm making... If you've mastered this "simple" notation, then you've probably already mastered simple neural networks like this one, so you are not the target for this tutorial.


Summation and derivatives are used quite broadly outside of neural networks.


I've derived backpropagation by hand many times and diagrams like these often just confuse me more.

For me it depends on whether I'm in a passive or active state of learning.

If I'm sitting down on a Sunday afternoon reading the news, backpropagation is going to make zero sense to me.

But, if I'm actively working on a problem, it's much more useful to realize that this is no different than using gradient descent for linear regression or even minimizing a quadratic.

At that point, it becomes just a mechanical calculation (the fact that the resulting gradient looks more intimidating is irrelevant).

And for me, realizing that it's no different than taking the derivative of a quadratic actually makes it more digestible than these fancy animated tutorials.
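
As a minimal illustration of that analogy (my own sketch, not from the article): gradient descent on a one-variable quadratic uses exactly the same update step that backpropagation feeds.

    # Minimize f(w) = (w - 3)**2 by gradient descent.
    w, lr = 0.0, 0.1
    for _ in range(100):
        grad = 2 * (w - 3)   # df/dw, computed by hand
        w -= lr * grad       # identical update rule to training a network
    print(w)                 # ~3.0, the minimizer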


It is different though. The derivative of the L2 loss function w.r.t. the linear regression parameters is a "flat" function that is easy to derive by hand. With neural networks you have deeply nested vector-valued functions. If you write down the chain rule, it suggests that you should compute the Jacobians of all the nested vector-valued functions and then multiply them together. This would be computationally expensive.

The key idea of backpropagation is that at each layer you only ever need the derivatives of the loss function w.r.t. the parameters of the layer, and the Jacobian-vector product with the derivatives of the loss function w.r.t. the layer outputs. You never need to compute the Jacobians explicitly, and you never need to do those high-dimensional matrix-matrix multiplications.
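
A rough sketch of that idea for a single dense layer (my own illustrative NumPy code, not from the article): given the gradient g = dL/dy arriving from above, every quantity backprop needs is a small product, and the full Jacobian of y w.r.t. W is never formed.

    import numpy as np

    def dense_backward(W, x, g):
        # y = W @ x + b; g is dL/dy coming from the layers above.
        dW = np.outer(g, x)   # dL/dW  (m x n), no (m x m*n) Jacobian needed
        db = g                # dL/db
        dx = W.T @ g          # dL/dx, the vector passed to the layer below
        return dW, db, dx

    rng = np.random.default_rng(0)
    W, x, g = rng.normal(size=(3, 5)), rng.normal(size=5), rng.normal(size=3)
    dW, db, dx = dense_backward(W, x, g)
    print(dW.shape, db.shape, dx.shape)   # (3, 5) (3,) (5,)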

These are not complicated ideas but they involve a combination of software design, calculus, and linear algebra that would probably not be obvious to the average CS undergrad.


Here's a very good explanation of the whole thing, that should be accessible to any average high school student who bothered to take a calculus course: http://neuralnetworksanddeeplearning.com/


> no different than using gradient descent for linear regression

If you understand that already, then you are really not the target audience of blog posts like that.


You don't need a 'math background' to understand the notation. This is high school math. Maybe check some basics you have forgotten on Wikipedia.

I'm relatively sure that they teach basics derivatives, functions etc. in high school in every country.


Usually only if the student opts for the advanced maths course/class. (Usually known as AP [advanced placement] in the US.)


In the US it varies by state and even city, but they did not teach calculus in high school in Philadelphia, at least. Students could choose whether to take statistics or pre-calculus, and pre-calc was basically just trigonometry.


You learn something new every day.

Here in Finland the students are divided into "long" and "short" math. Both learn at least the basics of calculus and derivatives.


> I'm relatively sure that they teach basics derivatives, functions etc. in high school in every country.

You're wrong, they don't teach derivatives in the HS curriculum in Poland, even when you take advanced math in HS.


A picture of a function and its derivative, plus the notion that one function gives the slopes of the other's tangents, is all that's needed, and that should fit in a high-school curriculum, especially in physics.

But you are going to encounter it anyway if you go on to study.


I didn't get taught derivatives, functions or summation in the UK at GCSE-level (16 year old).

I believe it was covered at A-level (18 year old) but you could only pick three or four subjects for A-Levels at the time, so you had to be selective about the subjects you picked depending on what you "want to be when you grow up", and what you thought you were good at so that you got good enough grades to go to a uni you liked.


May I ask where you're from, if you can't understand what derivatives are? Hell, it's high school math.


I understand what derivatives are. I had absolutely no clue what they were being used for in this animation.


Is it only me who gets frustrated by networks drawn upside-down (i.e. data flowing from bottom to top)? IMHO it is a poor convention, mindlessly repeated.

In English we read from top to bottom. Data flows (be it in equations or flow charts) typically follow the same convention, so we can read articles in a coherent way. Even trees (both data structures and decision trees) grow from their roots downwards (so, against their original biological metaphor). At least most researchers draw neural networks from left to right, consistent with English.

More of this point: https://www.reddit.com/r/MachineLearning/comments/6j28t9/d_w...


(One could probably make a comedic genre out of the type of comments that get voted to the top of the average hn thread...)



I generally prefer to see neural networks depicted so that the input is on the left and the output is on the right. This is probably because I think of time on an axis as flowing left to right.


Data flowing down seems upside down to me. Don't know why; maybe I just got used to it.


I wonder if it could be from 2D graphs, where the origin (0,0) is always at the bottom left. Or it could be that in the physical world we tend to start things at the bottom and work up (like a building, or stairs). Or maybe Microsoft Windows, where you literally start at the bottom of the screen and work your way up.


Except that the first thing I do with my new Windows images is drag that damn thing to the top of the screen so that menus drop -down- like they're supposed to dammit lol


There is also the convention of the pyramid, with greater order as you go up and chaos at the bottom. Not exactly parallel to this network, but I do picture the tip of the pyramid as the output.

Another way that this "upside down" way works for me is that this isn't water flowing downhill, it's being pushed up with every layer of the network adding energy or input.

Finally there's the metaphor of the roots of a plant being under the fruit of the plant.

(I didn't read the reddit post, apologies if it's a duplicate or these examples are addressed there.)


There are probably others who get frustrated, but I don't think most people do.

I think it's fine, and I haven't heard others complain about it over the years.


> their original biological metaphor

say root network


forward and back usually refer to right- and leftwards in 2D so it should probably be drawn left-to-right


It's not clear what you mean.

- top->bottom is not compatible with left->right
- "back" propagation is "back" for a reason, so it should go against the normal (forward) direction


E.g. in Feynman diagrams time flows forwards from left to right.


Only in cultures whose writing system reads left-to-right. :)


math is generally written left-to-right


While we're at this, I found the explanations by 3Blue1Brown to be very intuitive when it comes to neural networks, especially for folks who're new and don't necessarily grasp concepts when explained primarily through math.

What is backpropagation really doing? https://youtu.be/Ilg3gGewQ5U

His other videos on this topic are just as good.


Unfortunately there is no attribution, but this tool was created by Daniel Smilkov, who also built TensorFlow Playground and who is a cocreator of TensorFlow.js.

https://twitter.com/dsmilkov


Also, going up one level in the URL leads you to the Google ML crash course, which I guess this is part of:

https://google-developers.appspot.com/machine-learning/crash...

Definitely worth checking out.


Sorry, this is not "visual explanation", this is "math heavy explanation with some drawings next to it".


Love the animation and the math explanation.

Only one, I hope constructive, criticism: too many formulas without numbers. It would help the explanation if you included numbers and showed how the results are calculated. Not everybody is comfortable with using the chain rule to distribute the error across the individual weights.
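
For what that might look like, here is a tiny worked example with actual numbers (my own two-weight chain in Python, not the article's network):

    # Chain: h = w1 * x, y = w2 * h, loss L = 0.5 * (y - t)**2
    x, t = 1.0, 0.0
    w1, w2 = 0.5, 2.0

    h = w1 * x            # 0.5
    y = w2 * h            # 1.0
    err = y - t           # dL/dy = 1.0

    dw2 = err * h         # dL/dw2 = 1.0 * 0.5 = 0.5
    dw1 = err * w2 * x    # dL/dw1 = 1.0 * 2.0 * 1.0 = 2.0 (chain rule)
    print(dw1, dw2)       # 2.0 0.5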


Agree. Although I've been putting effort into learning more Calculus (former art student!), Linear Algebra, and the like, a version of this with actual numbers would go a long way.


Nice demonstration, but it fails to explain the bias value in the forward propagation step. While quite important, this value is often left out when demonstrating the propagation function, so having it in warrants a short description, in my opinion.

It also skips over the bias value in the back propagation step.


The bias is just another input node whose value happens to be constant (granted, with full connectivity), right? So the motivating idea/derivation of backpropagation doesn't change.


The bias is often updated/corrected in the back propagation step as well. The purpose of the value is basically to shift the activation function along the x-axis, while the weights define the slope of the activation function.
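
A minimal sketch of both points, assuming a single sigmoid neuron and squared error (my own example, not the article's): in the forward step the bias shifts the pre-activation along the x-axis, and in the backward step it gets its own gradient, just like the weight.

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    x, t = 1.0, 0.0                # one training example (input, target)
    w, b, lr = 0.8, 0.1, 0.5

    z = w * x + b                  # b shifts z left/right; w scales the input
    a = sigmoid(z)                 # forward pass
    delta = (a - t) * a * (1 - a)  # dL/dz for L = 0.5 * (a - t)**2, via the chain rule
    w -= lr * delta * x            # dL/dw = delta * x
    b -= lr * delta                # dL/db = delta (bias acts like an input fixed at 1)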


Great Viz. Although my favorite is still the video of Andrej Karpathy's Stanford lecture explaining it.

He sort of goes through some other implications which, from an intuition-massaging POV, is great.

https://www.youtube.com/watch?v=i94OvYb6noo

For those who are impressed by such things -- as I am -- he is now head of AI or ML or something at Tesla.


What a nice website. No weird flashy banners, no external scripts, zoomable, no ads, no tracking.

Just text accompanied by great visualisations.

Kudos!


I found that in Firefox, scrolling down left a lot of things greyed out; I had to highlight and unhighlight things to get them to show properly.


question about this part:

> f(x) has to be a non-linear function, otherwise the neural network will only be able to learn linear models.

I thought one of the most common activation functions was ReLU, which is linear (but cuts off to 0 for x values below 0).

https://en.wikipedia.org/wiki/Rectifier_(neural_networks)


It’s exactly the cutoff at zero that puts the "kink" into the function which makes ReLU nonlinear. :)
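
A quick way to see it (my own two-line check in Python): ReLU violates additivity, f(a + b) == f(a) + f(b), so it is piecewise linear but not a linear function.

    def relu(x):
        return max(0.0, x)

    print(relu(-1.0) + relu(1.0))   # 1.0
    print(relu(-1.0 + 1.0))         # 0.0 -- not equal, hence nonlinear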


Am I the only one who starts seeing some LaTeX creep in halfway through instead of the rendered formulas?


Does anyone know if there is a tool that helps to create these kind of presentations?


This just uses http://scrollerjs.com and https://d3js.org.

Any tool that combines these things is by necessity going to limit your creativity with them. Using the tools themselves is your best bet.


https://idyll-lang.org/ is built for this. You'll still have to write code for your custom graphics but it will help you get things up and running quickly.

Check out https://mathisonian.github.io/idyll/scaffolding-interactives... (scroll example is towards the bottom)


this is so freaking cool.


underwhelming


You don't need to sign your comments on HN.



