I find explanations like these are great for people who understand mathematical notation, sigmoid functions, derivatives, etc. Those people, however, typically understand what's going on from a simple text description of the process.
For those without a math background, the notation is very opaque. A far better explanation is to explain it numerically with simple examples.
For example, take two bits of training data:
input -> output
1 -> 0
0 -> 1
And a simple network with zero hidden nodes and train it. By hand...
Then add another bit of training data:
0.5 -> 1.5
Notice that it is now impossible to fit the training data exactly, however many training iterations we do. Now add a hidden layer with one or two nodes. Now we can fit the data perfectly, but show that depending on the initial weights we might never get there through gradient descent. Now's the time to mention different types of optimizers, momentum, etc.
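To make the by-hand part concrete, here is a minimal sketch of that first exercise, assuming the simplest possible "network" (one weight and one bias, no activation), squared error, and plain gradient descent; the learning rate and iteration count are just illustrative:

    # Two training points from above: 1 -> 0 and 0 -> 1.
    data = [(1.0, 0.0), (0.0, 1.0)]   # (input, target)
    w, b = 0.5, 0.0                   # arbitrary starting weights
    lr = 0.1                          # learning rate (illustrative)

    for step in range(1000):
        for x, t in data:
            y = w * x + b             # forward pass
            err = y - t
            w -= lr * err * x         # dE/dw = (y - t) * x for E = 0.5*(y - t)^2
            b -= lr * err             # dE/db = (y - t)
    print(w, b)                       # converges near w = -1, b = 1: both points fit exactly

    # Add the third point (0.5 -> 1.5): no straight line passes through all
    # three, so the loss can never reach zero no matter how long we train.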
Just to add a slightly different perspective: I'm comfortable with the notation and calculus involved, but had not known how backpropagation worked until now.
I'm not sure if it's the same for others, but I don't find bare text descriptions with formulas particularly useful. Mathematical notation on a page is great for rote application of rules and computation, but by itself does not easily communicate an intuitive understanding of the system the math represents. I have to work very hard to build up mental pictures of systems described by just notation, and those mental pictures often have to move in complicated ways as well.
The relationship between maths on the page and the system it describes is like the relationship between seeing musical notation on a page and hearing a full orchestra. One is a dry accounting of the facts involved. The other is moving and powerful in its richness and immediacy, a living thing that defies easy communication beyond the experience itself.
Demonstrations like this show you the maths _and_ build up a picture for you at the same time. The result of that is that you can communicate a very powerful idea (e.g. backpropagation) very precisely, intuitively and quickly.
Very much worth a five minute scroll for me; YMMV!
Kind of tired of people in the programming community proudly complaining about not knowing simple math notation. Educate yourself. Everyone else in the engineering world knows these basic math notations.
Math notation is simple because it's heavily overloaded. For example, it does me no good to know about exponentials when a superscript is used in a different context. Reading any nontrivial math requires knowing what the notation means in the particular context of the work. IME mathematicians typically assume the reader is familiar already and usually don't explain.
Anyone targeting their work at non-experts should explain even what seems like trivial notation to them since they can't know what other meanings the reader may think the notation holds.
But the fact is that math notation is the most widely known notation for writing down equations (let's make things specific to the OP case and say summation equations and partial derivative equations).
Specifically for the superscript case, the vast vast vast majority of the cases, it will be obvious whether the superscript notation means exponentiation or indexing. When there are ambiguities, what the person explaining the equation should do is clarify the ambiguity.
It doesn't do the world any good to make the entire engineering world learn (relatively) obscure programming languages in order to be slightly more clear (and let's not forget, much more verbose) when writing down simple equations, when all one has to do is clarify a couple of ambiguities when writing down equations.
Let me make it very clear and say this: equations are meant to explain things. They are not standalone pieces of code that you can copy and paste into a REPL. They are tools to explain how something works. You should always have accompanying text that explains what all the variables in the equation mean and clarify unclear notations.
I fully agree with your last sentence. Sadly I don't often see that actually happen in practice. And when the notation isn't explained I personally tend to waste a lot of brain cycles trying to decide what the notation means because I'm reading material that I don't already know (of course - if I already knew the material I probably wouldn't be reading it).
I agree. The superscript example is a good one. In most contexts it refers to the exponent, but in the context of a cost function that minimizes a linear regression (for example), it indicates the index of the set.
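For instance, in one common way of writing the least-squares cost for linear regression (a standard textbook form, not taken from the linked demo), the parenthesized superscript indexes training examples while the outer 2 is a genuine exponent:

    J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2

Here x^{(i)} and y^{(i)} are the i-th training input and target, and only the final squaring is exponentiation.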
Computer languages benefit from the fact that poorly designed syntax can be deprecated (not in all cases, e.g. C++) by introducing new features to the language.
Notation in math never advances in the same way for some reason.
That's the point I'm making... If you've mastered this "simple" notation, then you've probably already mastered simple neural networks like this, so you are not the target for this tutorial.
I've derived backpropagation by hand many times and diagrams like these often just confuse me more.
For me it depends on whether I'm in a passive or active state of learning.
If I'm sitting down on a Sunday afternoon reading the news, backpropagation is going to make zero sense to me.
But, if I'm actively working on a problem, it's much more useful to realize that this is no different than using gradient descent for linear regression or even minimizing a quadratic.
At that point, it becomes just a mechanical calculation (the fact that the resulting gradient looks more intimidating is irrelevant).
And for me, realizing that it's no different than taking the derivative of a quadratic actually makes it more digestible than these fancy animated tutorials.
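To illustrate what that mechanical calculation looks like on the simplest possible case, here is gradient descent minimizing a quadratic (a throwaway sketch, not tied to the article):

    # Gradient descent on f(x) = (x - 3)^2. The loop is the same one used to
    # train a network; only the gradient computation gets more elaborate.
    x = 0.0
    lr = 0.1
    for _ in range(100):
        grad = 2.0 * (x - 3.0)   # f'(x)
        x -= lr * grad
    print(x)                     # approaches 3, the minimizer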
It is different though. The derivative of the L2 loss function w.r.t. the linear regression parameters is a "flat" function that is easy to derive by hand. With neural networks you have deeply nested vector-valued functions. If you write down the chain rule, it suggests that you should compute the Jacobians of all the nested vector-valued functions and then multiply them together. This would be computationally expensive.
The key idea of backpropagation is that at each layer, you only ever need the derivatives of the loss function w.r.t. the parameters of the layer, and the Jacobian-vector product with the derivatives of the loss function w.r.t. the layer outputs. You never need to compute the Jacobians explicitly and you never need to do those high-dimensional matrix-matrix multiplications.
These are not complicated ideas but they involve a combination of software design, calculus, and linear algebra that would probably not be obvious to the average CS undergrad.
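Here is a minimal numpy sketch of that idea for a single dense layer with an elementwise activation, assuming we are handed the gradient g of the loss w.r.t. this layer's output from the layers above (names and signatures are illustrative, not from the linked post):

    import numpy as np

    def dense_backward(x, W, z, g, activation_grad):
        # x: layer input, W: weights, z = W @ x + b (pre-activation), g: dLoss/dOutput.
        # The activation's Jacobian is diagonal, so multiplying by it is just
        # an elementwise product -- no Jacobian matrix is ever built.
        gz = g * activation_grad(z)
        dW = np.outer(gz, x)   # dLoss/dW, used to update this layer's weights
        db = gz                # dLoss/db
        dx = W.T @ gz          # Jacobian-vector product, passed back to the previous layer
        return dW, db, dx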
Here's a very good explanation of the whole thing, that should be accessible to any average high school student who bothered to take a calculus course: http://neuralnetworksanddeeplearning.com/
In the US it varies by state and even city, but they did not teach calculus in high school in Philadelphia at least. Students could choose whether to take statistics or pre-calculus, and pre-calc was basically just trigonometry.
A picture of a function and its derivative, and the notion that one function gives the slopes of the other's tangents, is all that's needed; that should fit in a high-school curriculum, especially in physics.
But you are going to hear it anyway if you are going to study.
I didn't get taught derivatives, functions or summation in the UK at GCSE-level (16 year old).
I believe it was covered at A-level (18 year old) but you could only pick three or four subjects for A-Levels at the time, so you had to be selective about the subjects you picked depending on what you "want to be when you grow up", and what you thought you were good at so that you got good enough grades to go to a uni you liked.
Is it only me who gets frustrated by networks drawn upside down (i.e. with data flowing from bottom to top)? IMHO it is a poor convention, mindlessly repeated.
In English we read from top to bottom. Data flows (be it equations or flow charts) typically follow the same convention, so we can read articles in a coherent way. Even trees (both data structures and decision trees) grow from their roots downwards (so, against their original biological metaphor). At least most researchers draw neural networks from left to right, consistent with English.
I generally prefer to see neural networks depicted so that the input is on the left and the output is on the right. This is probably because I think of time on an axis as flowing left to right.
I wonder if it could be from 2D graphs, where the origin (0,0) sits at the bottom left. Or it could be that in the physical world we tend to start things at the bottom and work up (like a building, or stairs). Or maybe Microsoft Windows, where you literally start at the bottom of the screen and work your way up.
Except that the first thing I do with my new Windows images is drag that damn thing to the top of the screen so that menus drop -down- like they're supposed to dammit lol
There is also the convention of the pyramid, with greater order as you go up and chaos at the bottom. Not exactly parallel to this network, but I do picture the tip of the pyramid as the output.
Another way this "upside down" convention works for me is that this isn't water flowing downhill; it's being pushed up, with every layer of the network adding energy or input.
Finally there's the metaphor of the roots of a plant being under the fruit of the plant.
(I didn't read the reddit post, apologies if it's a duplicate or these examples are addressed there.)
While we're at it, I found the explanations by 3Blue1Brown to be very intuitive when it comes to neural networks, especially for folks who are new and don't necessarily grasp concepts when explained primarily through math.
Unfortunately there is no attribution, but this tool was created by Daniel Smilkov, who also built TensorFlow Playground and is a co-creator of TensorFlow.js.
Only one, I hope constructive, criticism: too many formulas without numbers. It would help the explanation to include numbers and show how the results are calculated. Not everybody is comfortable with using the chain rule to distribute the error across the individual weights.
Agree. Although I've been putting effort into learning more Calculus (former art student!), Linear Algebra, and the like, a version of this with actual numbers would go a long way.
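Along those lines, here is the kind of worked number a reader might want, a minimal sketch with made-up values for a single weight (not taken from the linked demo):

    x = 2.0                        # input
    w = 0.5                        # weight we want to update
    t = 3.0                        # target output

    y = w * x                      # forward pass: y = 1.0
    E = 0.5 * (y - t) ** 2         # squared error: 0.5 * (1.0 - 3.0)^2 = 2.0

    # Chain rule: dE/dw = dE/dy * dy/dw
    dE_dy = y - t                  # = -2.0
    dy_dw = x                      # =  2.0
    dE_dw = dE_dy * dy_dw          # = -4.0

    w = w - 0.1 * dE_dw            # gradient step: w becomes 0.9 and y moves toward t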
Nice demonstration, but it skips the bias value in the forward propagation step. While quite important, this value is often left out when demonstrating the propagation function, so having it in warrants a short description in my opinion.
It also skips over the bias value in the back propagation step.
The bias is just another input node whose value happens to be constant (granted, with full connectivity), right? So the motivating idea/derivation of backpropagation doesn't change.
The bias is often updated/corrected in the back propagation step as well. Its purpose is basically to shift the activation function along the x-axis, while the weights define the slope of the activation function.
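A small sketch of how the bias typically shows up in both passes for one sigmoid layer (illustrative only, not code from the linked demo):

    import numpy as np

    def forward(x, W, b):
        z = W @ x + b                      # the bias shifts the activation's input
        return 1.0 / (1.0 + np.exp(-z))    # sigmoid activation

    def backward(x, W, b, y, g, lr=0.1):
        # g = dLoss/dy flowing back from the next layer; y = forward(x, W, b).
        gz = g * y * (1.0 - y)             # sigmoid derivative
        W -= lr * np.outer(gz, x)          # weight update
        b -= lr * gz                       # bias update: same rule, its "input" is a constant 1
        return W, b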
https://idyll-lang.org/ is built for this. You'll still have to write code for your custom graphics but it will help you get things up and running quickly.