I have inputs $x_1, \ldots, x_n$ with known $1\sigma$ uncertainties $\epsilon_1, \ldots, \epsilon_n$. I am using them to predict outputs $y_1, \ldots, y_m$ with a trained neural network. How can I obtain $1\sigma$ uncertainties on my predictions?
My idea is to randomly perturb each input $x_i$ with normal noise of mean 0 and standard deviation $\epsilon_i$ a large number of times (say, 10000), and then take the median and standard deviation of each prediction $y_j$. Does this work?
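To make the idea concrete, here is a minimal NumPy sketch of that procedure (`model.predict` is a hypothetical batch-prediction method; `x` and `eps` are the measured inputs and their $1\sigma$ uncertainties):

```python
import numpy as np

def mc_predict(model, x, eps, n_samples=10000, rng=None):
    """Propagate Gaussian input noise through a trained model.

    `model.predict` is a hypothetical method mapping a (n_samples, n)
    batch of inputs to a (n_samples, m) array of predictions.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One perturbed copy of x per row: x_i = x + Gaussian noise with scale eps.
    X = x + rng.normal(0.0, eps, size=(n_samples, len(x)))
    Y = model.predict(X)  # shape (n_samples, m)
    return np.median(Y, axis=0), np.std(Y, axis=0)
```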
I fear that this only takes into account the "random" error (from the measurements) and not the "systematic" error (from the network), i.e., each prediction inherently carries some error that this approach does not consider. How can I properly obtain $1\sigma$ error bars on my predictions?
Best Answer
$\newcommand{\bx}{\mathbf{x}}$ $\newcommand{\by}{\mathbf{y}}$
I personally prefer the Monte Carlo approach because of its ease. There are alternatives (e.g. the unscented transform), but these are certainly biased.
Let me formalise your problem a bit. You are using a neural network to implement a conditional probability distribution over the outputs $\by$ given the inputs $\bx$, where the weights are collected in $\theta$:
$$ p_\theta(\by \mid \bx). $$
Let us not care about how you obtained the weights $\theta$ (probably via some kind of backprop) and just treat the network as a black box that has been handed to us.
As an additional property of your problem, you assume that you only have access to some "noisy version" $\tilde\bx$ of the actual input $\bx$, where $$\tilde\bx = \bx + \epsilon$$ with $\epsilon$ following some distribution, e.g. Gaussian. If $\epsilon \sim \mathcal{N}(0, \sigma^2_\epsilon)$, you can then write $$ p(\tilde\bx \mid \bx) = \mathcal{N}(\tilde\bx \mid \bx, \sigma^2_\epsilon). $$ Then what you want is the distribution $$ p(\by \mid \tilde\bx) = \int p(\by \mid \bx)\, p(\bx \mid \tilde\bx)\, d\bx, $$ i.e. the distribution over outputs given the noisy input and a model mapping clean inputs to outputs.
Now, if you can invert $p(\tilde\bx \mid \bx)$ to obtain $p(\bx \mid \tilde\bx)$ (which you can for a Gaussian and various other distributions; in the Gaussian case above, $p(\bx \mid \tilde\bx) = \mathcal{N}(\bx \mid \tilde\bx, \sigma^2_\epsilon)$ under a flat prior on $\bx$), you can approximate the above with plain Monte Carlo integration through sampling:
$$ p(\by \mid \tilde\bx) \approx \frac{1}{N} \sum_{i=1}^{N} p(\by \mid \bx_i), \quad \bx_i \sim p(\bx \mid \tilde\bx). $$
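As a sketch of this estimator, suppose the network outputs the mean and standard deviation of a Gaussian $p(y \mid \bx)$ for a scalar output (an assumption for illustration; any parametric output distribution works the same way, and `net` is a hypothetical callable). The Monte Carlo approximation is then a uniform mixture of those Gaussians over the sampled inputs:

```python
import numpy as np
from scipy.stats import norm

def predictive_density(y, x_tilde, sigma_eps, net, n_samples=10000, rng=None):
    """MC estimate of p(y | x_tilde) for a scalar output.

    `net` is a hypothetical callable: net(x) -> (mu, sigma) of the
    Gaussian p(y | x) the network implements.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample clean-input candidates x_i ~ p(x | x_tilde) = N(x_tilde, sigma_eps^2).
    xs = x_tilde + rng.normal(0.0, sigma_eps, size=(n_samples, len(x_tilde)))
    total = 0.0
    for x in xs:
        mu, sigma = net(x)
        total += norm.pdf(y, loc=mu, scale=sigma)
    return total / n_samples  # average density over the samples
```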
Note that this can also be used to calculate all other kinds of expectations of functions $f$ of $\by$:
$$ \mathbb{E}\left[f(\by) \mid \tilde\bx\right] \approx \frac{1}{N} \sum_{i=1}^{N} f(\by_i), \quad \bx_i \sim p(\bx \mid \tilde\bx), \quad \by_i \sim p(\by \mid \bx_i). $$
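In particular, the $1\sigma$ error bars asked for in the question fall out of this by sampling full input-output pairs and taking the sample mean and standard deviation of the $\by_i$. A sketch, under the same assumed Gaussian-output `net` as above:

```python
import numpy as np

def predictive_error_bars(x_tilde, sigma_eps, net, n_samples=10000, rng=None):
    """Mean and 1-sigma spread of p(y | x_tilde) by Monte Carlo."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample clean-input candidates x_i ~ N(x_tilde, sigma_eps^2).
    xs = x_tilde + rng.normal(0.0, sigma_eps, size=(n_samples, len(x_tilde)))
    ys = []
    for x in xs:
        mu, sigma = net(x)                # parameters of p(y | x_i)
        ys.append(rng.normal(mu, sigma))  # draw y_i ~ p(y | x_i)
    ys = np.asarray(ys)
    return ys.mean(axis=0), ys.std(axis=0)
```

Unlike the perturb-only scheme in the question, drawing $\by_i \sim p(\by \mid \bx_i)$ also propagates the output noise the network itself models on top of the input noise, which is the part of the error the perturb-only scheme misses (it does not, however, capture uncertainty in the weights $\theta$).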
Without further assumptions, alternatives to Monte Carlo give only biased approximations.