
Probability estimates are not the same thing as uncertainty.

Consider tossing a coin. If I see 2 heads and 2 tails, I might report "the probability of heads is 50%". If you see 2000 heads and 2000 tails you'd also report the SAME probability estimate -- but you'd be more certain than me.
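To make that concrete, here's a small sketch assuming a uniform Beta(1, 1) prior over the coin's bias: both observers report the same posterior mean, but the posterior standard deviations differ by more than an order of magnitude.

```python
from math import sqrt

def beta_posterior(heads, tails, a=1.0, b=1.0):
    """Posterior over P(heads) after observing the data, given a Beta(a, b) prior."""
    a_post, b_post = a + heads, b + tails
    mean = a_post / (a_post + b_post)
    var = (a_post * b_post) / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return mean, sqrt(var)

m1, s1 = beta_posterior(2, 2)        # mean 0.5, std ~0.19
m2, s2 = beta_posterior(2000, 2000)  # mean 0.5, std ~0.008
```

Same point estimate, very different certainty -- which is exactly the information a bare probability estimate throws away.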

Neural networks give probability estimates. Bayesian methods (and also frequentist methods) give us probability estimates AND uncertainty.

The literature on neural network calibration seems to me to have missed this distinction.



It is common for a network to output a distribution, so the output is both a mean and a variance rather than just the mean, as you pointed out. For example, check out variational autoencoders.


In my example, of predicting a coin toss, the naive output is a probability distribution: it's "Prob(heads)=0.5, Prob(tails)=0.5". This is the distribution that will be produced both by the person who sees 2 heads and 2 tails, and by the person who sees 2000 heads and 2000 tails.

Bayesians use the terms 'aleatoric' and 'epistemic' uncertainty. Aleatoric uncertainty is the part of uncertainty that says "I don't know the outcome, and I wouldn't know it even if I knew the exact model parameters", and epistemic uncertainty says "I don't even know the model".

Your example (outputting a mean and variance) is reporting a probability distribution, and it captures aleatoric uncertainty. When Bayesians talk about uncertainty or confidence, they're referring to model uncertainty -- how confident are you about the mean and the variance that you're reporting?
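One way to see the split (a hypothetical sketch, not tied to any particular library): if each of several models reports a predictive mean and variance, the law of total variance separates the aleatoric part (average predicted noise) from the epistemic part (disagreement between models about the mean).

```python
def decompose_uncertainty(predictions):
    """predictions: list of (mean, variance) pairs, one per model.
    Returns (aleatoric, epistemic) via the law of total variance."""
    n = len(predictions)
    means = [m for m, _ in predictions]
    aleatoric = sum(v for _, v in predictions) / n               # E[Var[y | theta]]
    grand_mean = sum(means) / n
    epistemic = sum((m - grand_mean) ** 2 for m in means) / n    # Var[E[y | theta]]
    return aleatoric, epistemic

# Three hypothetical models that agree on the noise level but
# disagree on the mean -- so epistemic uncertainty is nonzero.
a, e = decompose_uncertainty([(1.0, 0.5), (1.2, 0.5), (0.8, 0.5)])
```

A single network emitting (mean, variance) only ever gives you the first term; the second term needs a distribution over models.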


See e.g. Ian Osband's work (he calls it 'risk' vs 'uncertainty') for some good examples that help in differentiating the two: https://scholar.google.com/citations?view_op=view_citation&h...


The variational autoencoder is a Bayesian model. See [0] for instance.

[0] https://jeffreyling.github.io/2018/01/09/vaes-are-bayesian.h...


Right, the claim was that "Neural networks give probability estimates. Bayesian methods give us probability estimates AND uncertainty" which presents a false dichotomy. I think we agree.


Ah yes, got you. It is a false dichotomy because it neglects that there's such a thing as Bayesian neural networks. Also, taking ensembles of ordinary neural networks with random initializations approximates Bayesian inference in a sense, and I think this is relatively well known.


Indeed, there are Bayesian neural networks and there are non-Bayesian neural networks, and I shouldn't have implied that all neural networks are non-Bayesian.

I'm just trying to point out that there is a dichotomy between the Bayesian and the non-Bayesian, and that the standard neural network models are non-Bayesian, and that we need Bayesianism (or something like it) to talk about (epistemic) uncertainty.

Standard neural networks are non-Bayesian, because they do not treat the neural network parameters as random variables. This includes most of the examples that have been mentioned in this thread: classifiers (which output a probability distribution over labels), networks that estimate mean and variance, and VAEs (which use Bayes's rule for the latent variable but not for the model parameters). These networks all deal with probability distributions, but that's not enough for us to call them Bayesian.

Bayesian neural networks are easy, in principle -- if we treat the edge weights of a neural network as having a distribution, then the entire neural network is Bayesian. And as you say these can be approximated, e.g. by using dropout at inference time [0], or by careful use of ensemble methods [1].

[0] https://arxiv.org/abs/1506.02142

Quote: "Deep learning tools have gained tremendous attention in applied machine learning. However such tools for regression and classification do not capture model uncertainty."

[1] https://arxiv.org/abs/1810.05546

Quote: "Ensembling NNs provides an easily implementable, scalable method for uncertainty quantification, however, it has been criticised for not being Bayesian."
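For what it's worth, the dropout-at-inference-time idea in [0] fits in a few lines of plain Python. The "network" here is just a weighted sum, purely a hypothetical stand-in; the point is that keeping dropout on at test time and sampling many stochastic passes turns a point prediction into a distribution whose spread estimates epistemic uncertainty.

```python
import random

def dropout_forward(x, weights, p=0.5):
    """One stochastic forward pass: each weight is kept with probability 1 - p."""
    kept = [w if random.random() > p else 0.0 for w in weights]
    return sum(w * x for w in kept)

def mc_dropout_predict(x, weights, samples=1000):
    """Keep dropout ON at inference and average over stochastic passes.
    The spread of the samples estimates model (epistemic) uncertainty."""
    preds = [dropout_forward(x, weights) for _ in range(samples)]
    mean = sum(preds) / samples
    var = sum((y - mean) ** 2 for y in preds) / samples
    return mean, var

mean, var = mc_dropout_predict(1.0, [0.2, -0.1, 0.4])
```

The ensemble approach in [1] is the same idea with the randomness coming from independently trained networks instead of dropout masks.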


Right, and in my experience I haven't needed as many networks in the ensemble as I first assumed. This paper [1] suggests 5-10, but in practice I've found that only 3 is often sufficient.

[1] https://arxiv.org/abs/1811.12188


Very cool. I’ve never tried fewer than 5. Will definitely keep that in mind. Yes, I know that paper.


> The literature on neural network calibration seems to me to have missed this distinction.

I’d hazard a guess that analytical solutions are intractable and numerical solutions would be infeasible.




