So I am learning Bayesian Optimization and came across expected improvement.
My question is: are we searching for the point in the Gaussian Process model whose expected value (determined by the mean and confidence) would decrease the most if we sampled at that point? That is, is the starting criterion to take the lowest point in the GP, and from there determine the next point whose expected value is lower than that of any other point in the GP?
How do we intuitively quantify the expected improvement distribution $\phi$ in the attached graph?
Best Answer
"My question is: are we searching for the point in the Gaussian Process model whose expected value (determined by the mean and confidence) would decrease the most if we sampled at that point?"
No. At any iteration, you've observed some inputs, and one of those inputs ($x^*$) is the current optimum with a function value $f(x^*)$.
In expected improvement, what we want to do is calculate, for every possible input, how much its function value can be expected to improve over our current optimum. This is expressed in your post by the equation:
$$I(x) = \max(f^* - Y, 0)$$
I think it's clearer to write this as:
$$I(x) = \max(f(x^*) - f(x), 0)$$
In words, the improvement for any input $x$ is how much lower its function value $f(x)$ is than the lowest function value found so far, $f(x^*)$. If $f(x)$ is greater than $f(x^*)$, then there's no improvement, so $I(x) = 0$.
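As a small illustration of this definition for a minimization problem, here is a sketch with made-up numbers. The helper name `improvement` and the value of `f_best` are assumptions for the example, not part of the original post.

```python
def improvement(f_x, f_best):
    """Improvement of a candidate value f_x over the current lowest value f_best."""
    # Only values below the current best count; anything higher is clipped to 0.
    return max(f_best - f_x, 0.0)

f_best = 1.5  # hypothetical current lowest observed value, f(x*)

lower_candidate = improvement(1.0, f_best)   # candidate is lower than f_best
higher_candidate = improvement(2.0, f_best)  # candidate is higher than f_best
print(lower_candidate, higher_candidate)
```

The first candidate improves on the current best by 0.5, while the second gives no improvement at all, so its value is clipped to 0.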
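Under a GP posterior, $f(x)$ is a random variable, which means that $I(x)$ is also a random variable, and so we want to calculate the expected value of $I(x)$. We do this for every possible $x$, and pick the one that gives the greatest expected improvement. Because this expectation depends on both the posterior mean and the posterior uncertainty at $x$, the chosen point is not necessarily the one with the lowest posterior mean. After observing that point, we update the GP posterior with the new observation and repeat.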
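When the GP posterior at $x$ is Gaussian with mean $\mu(x)$ and standard deviation $\sigma(x)$, this expectation has a standard closed form for minimization: $(f(x^*) - \mu)\,\Phi(z) + \sigma\,\varphi(z)$ with $z = (f(x^*) - \mu)/\sigma$, where $\Phi$ and $\varphi$ are the standard normal CDF and PDF. The sketch below evaluates it for a few candidates; the candidate names, the posterior means and standard deviations, and `f_best` are all made-up values for illustration, and in practice they would come from a fitted GP.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_best):
    """E[max(f_best - f(x), 0)] when f(x) ~ Normal(mu, sigma^2), for minimization."""
    if sigma <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    return (f_best - mu) * normal_cdf(z) + sigma * normal_pdf(z)

f_best = 1.5  # hypothetical current lowest observed value, f(x*)

# Hypothetical posterior (mean, standard deviation) at three candidate inputs.
candidates = {"a": (1.4, 0.05), "b": (1.6, 0.50), "c": (2.5, 0.10)}

ei = {name: expected_improvement(mu, sigma, f_best)
      for name, (mu, sigma) in candidates.items()}
next_point = max(ei, key=ei.get)  # candidate with the greatest expected improvement
print(ei, next_point)
```

With these numbers, candidate `b` is selected even though its posterior mean is above the current best: its large uncertainty leaves a substantial chance of a much lower value. Candidate `a` has the lowest mean but little uncertainty, so its expected improvement is smaller.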