
Probability estimates are not the same thing as uncertainty.

Consider tossing a coin. If I see 2 heads and 2 tails, I might report "the probability of heads is 50%". If you see 2000 heads and 2000 tails you'd also report the SAME probability estimate -- but you'd be more certain than me.
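To make that concrete, here's a small sketch assuming a uniform Beta(1, 1) prior over the coin's bias: both observers report the same posterior mean, but the posterior standard deviations differ by more than an order of magnitude.

```python
from math import sqrt

def beta_posterior(heads, tails, a=1.0, b=1.0):
    """Posterior over P(heads) after observing the data, given a Beta(a, b) prior."""
    a_post, b_post = a + heads, b + tails
    mean = a_post / (a_post + b_post)
    var = (a_post * b_post) / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return mean, sqrt(var)

m1, s1 = beta_posterior(2, 2)        # mean 0.5, std ~0.19
m2, s2 = beta_posterior(2000, 2000)  # mean 0.5, std ~0.008
```

Same point estimate, very different certainty -- which is exactly the information a bare probability estimate throws away.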

Neural networks give probability estimates. Bayesian methods (and also frequentist methods) give us probability estimates AND uncertainty.

The literature on neural network calibration seems to me to have missed this distinction.



It is common for a network to output a distribution, so the output is both a mean and a variance rather than just the mean, as you pointed out. For example, check out variational autoencoders.


In my example, of predicting a coin toss, the naive output is a probability distribution: it's "Prob(heads)=0.5, Prob(tails)=0.5". This is the distribution that will be produced both by the person who sees 2 heads and 2 tails, and by the person who sees 2000 heads and 2000 tails.

Bayesians use the terms 'aleatoric' and 'epistemic' uncertainty. Aleatoric uncertainty is the part of uncertainty that says "I don't know the outcome, and I wouldn't know it even if I knew the exact model parameters", and epistemic uncertainty says "I don't even know the model".

Your example (outputting a mean and variance) is reporting a probability distribution, and it captures aleatoric uncertainty. When Bayesians talk about uncertainty or confidence, they're referring to model uncertainty -- how confident are you about the mean and the variance that you're reporting?
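One way to see the split (a hypothetical sketch, not tied to any particular library): if each of several models reports a predictive mean and variance, the law of total variance separates the aleatoric part (average predicted noise) from the epistemic part (disagreement between models about the mean).

```python
def decompose_uncertainty(predictions):
    """predictions: list of (mean, variance) pairs, one per model.
    Returns (aleatoric, epistemic) via the law of total variance."""
    n = len(predictions)
    means = [m for m, _ in predictions]
    aleatoric = sum(v for _, v in predictions) / n               # E[Var[y | theta]]
    grand_mean = sum(means) / n
    epistemic = sum((m - grand_mean) ** 2 for m in means) / n    # Var[E[y | theta]]
    return aleatoric, epistemic

# Three hypothetical models that agree on the noise level but
# disagree on the mean -- so epistemic uncertainty is nonzero.
a, e = decompose_uncertainty([(1.0, 0.5), (1.2, 0.5), (0.8, 0.5)])
```

A single network emitting (mean, variance) only ever gives you the first term; the second term needs a distribution over models.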


See e.g. Ian Osband's work (he calls it 'risk' vs 'uncertainty') for some good examples that help in differentiating the two: https://scholar.google.com/citations?view_op=view_citation&h...


The variational autoencoder is a Bayesian model. See [0] for instance.

[0] https://jeffreyling.github.io/2018/01/09/vaes-are-bayesian.h...


Right, the claim was that "Neural networks give probability estimates. Bayesian methods give us probability estimates AND uncertainty" which presents a false dichotomy. I think we agree.


Ah yes, got you. It is a false dichotomy because it neglects that there's such a thing as Bayesian neural networks. Also, taking ensembles of ordinary neural networks with random initializations approximates Bayesian inference in a sense, and I think this is relatively well known.


Indeed, there are Bayesian neural networks and there are non-Bayesian neural networks, and I shouldn't have implied that all neural networks are non-Bayesian.

I'm just trying to point out that there is a dichotomy between the Bayesian and the non-Bayesian, and that the standard neural network models are non-Bayesian, and that we need Bayesianism (or something like it) to talk about (epistemic) uncertainty.

Standard neural networks are non-Bayesian, because they do not treat the neural network parameters as random variables. This includes most of the examples that have been mentioned in this thread: classifiers (which output a probability distribution over labels), networks that estimate mean and variance, and VAEs (which use Bayes's rule for the latent variable but not for the model parameters). These networks all deal with probability distributions, but that's not enough for us to call them Bayesian.

Bayesian neural networks are easy, in principle -- if we treat the edge weights of a neural network as having a distribution, then the entire neural network is Bayesian. And as you say these can be approximated, e.g. by using dropout at inference time [0], or by careful use of ensemble methods [1].

[0] https://arxiv.org/abs/1506.02142

Quote: "Deep learning tools have gained tremendous attention in applied machine learning. However such tools for regression and classification do not capture model uncertainty."

[1] https://arxiv.org/abs/1810.05546

Quote: "Ensembling NNs provides an easily implementable, scalable method for uncertainty quantification, however, it has been criticised for not being Bayesian."
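For what it's worth, the dropout-at-inference-time idea in [0] fits in a few lines of plain Python. The "network" here is just a weighted sum, purely a hypothetical stand-in; the point is that keeping dropout on at test time and sampling many stochastic passes turns a point prediction into a distribution whose spread estimates epistemic uncertainty.

```python
import random

def dropout_forward(x, weights, p=0.5):
    """One stochastic forward pass: each weight is kept with probability 1 - p."""
    kept = [w if random.random() > p else 0.0 for w in weights]
    return sum(w * x for w in kept)

def mc_dropout_predict(x, weights, samples=1000):
    """Keep dropout ON at inference and average over stochastic passes.
    The spread of the samples estimates model (epistemic) uncertainty."""
    preds = [dropout_forward(x, weights) for _ in range(samples)]
    mean = sum(preds) / samples
    var = sum((y - mean) ** 2 for y in preds) / samples
    return mean, var

mean, var = mc_dropout_predict(1.0, [0.2, -0.1, 0.4])
```

The ensemble approach in [1] is the same idea with the randomness coming from independently trained networks instead of dropout masks.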


Right, and in my experience I haven't needed as many networks in the ensemble as I first assumed. This paper [1] suggests 5-10, but in practice I've found that only 3 is often sufficient.

[1] https://arxiv.org/abs/1811.12188


Very cool. I’ve never tried fewer than 5. Will definitely keep that in mind. Yes, I know that paper.


> The literature on neural network calibration seems to me to have missed this distinction.

I’d hazard a guess that analytical solutions are intractable and numerical solutions would be infeasible.




