If Hessians are so good for optimization (see e.g. Newton's method), why stop there? Why not use the third, fourth, fifth, and sixth derivatives as well?
Best Answer
I am interpreting the question as being "Why does Newton's method only use first and second derivatives, not third or higher derivatives?"
Actually, in many cases going to the third derivative does help; I've done it with custom stuff before. However, in general, going to higher derivatives adds computational complexity that far outweighs the savings in step count you get, if any: you have to find and calculate all those derivatives, and for multivariate problems there are far more third derivatives than first derivatives. For example, in a 3-dimensional problem I have 3 first derivatives, 6 distinct second derivatives, and 10 distinct third derivatives, so going to a third-order version more than doubles the number of evaluations per iteration (from 9 to 19) and makes computing the step direction and size from those evaluations more involved, yet it will almost certainly not cut the number of iterations in half.
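To make the trade-off concrete, here is a minimal 1D sketch (my addition, not part of the original answer). In one dimension, a natural third-derivative analogue of Newton's method is Halley's method applied to $f'$. The test function $f(x) = e^x - 2x$ (minimum at $x = \ln 2$) and the starting point are illustrative assumptions:

```python
import math

# Minimize f(x) = exp(x) - 2x, whose unique minimum is at x = ln(2).
f1 = lambda x: math.exp(x) - 2.0   # f'
f2 = lambda x: math.exp(x)         # f''
f3 = lambda x: math.exp(x)         # f'''

def newton(x, tol=1e-12):
    """Newton's method on f': uses first and second derivatives."""
    steps = 0
    while abs(f1(x)) > tol:
        x -= f1(x) / f2(x)
        steps += 1
    return x, steps

def halley(x, tol=1e-12):
    """Halley's method on f': additionally uses the third derivative."""
    steps = 0
    while abs(f1(x)) > tol:
        g, dg, d2g = f1(x), f2(x), f3(x)
        x -= 2 * g * dg / (2 * dg**2 - g * d2g)
        steps += 1
    return x, steps

print(newton(5.0))  # converges to ~0.6931 in roughly ten steps
print(halley(5.0))  # same minimizer, somewhat fewer steps
```

Halley does shave off some iterations, but each iteration now needs $f'''$ as well; in $k$ dimensions that third-derivative object is a whole tensor of $\binom{k+2}{3}$ distinct entries, which is where the cost argument below comes in.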
Now, in the general case with $k$ variables, the number of distinct $n^{th}$ partial derivatives is $\binom{k+n-1}{k-1}$, so for a problem with five variables the total number of third, fourth, and fifth partial derivatives is $35 + 70 + 126 = 231$, a more than 10-fold increase over the number of first and second partial derivatives ($5 + 15 = 20$). You would have to have a problem that is very, very close to a fifth-order polynomial in the variables to see a large enough reduction in iteration count to make up for that extra computational burden.
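If you want to verify the counting, here is a short sketch (my addition) that evaluates the stars-and-bars count $\binom{k+n-1}{k-1}$ with Python's `math.comb`:

```python
from math import comb

def num_partials(k, n):
    """Number of distinct n-th order partial derivatives in k variables."""
    return comb(k + n - 1, k - 1)

k = 5
low = sum(num_partials(k, n) for n in (1, 2))      # 5 + 15 = 20
high = sum(num_partials(k, n) for n in (3, 4, 5))  # 35 + 70 + 126 = 231
print(low, high)  # 20 231
```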