This is to extend the discussion of the derivative of the GP. The formulation in the previous post describes the gradient of the GP in terms of the mixed partial derivative of the kernel function with respect to $(x^*, x)$: $$K'(x^*, x)=\frac{\partial^2 K}{\partial x^* \, \partial x}(x^*, x)$$

However, the kernel derivative as implemented in scikit-learn is documented as:

> `K_gradient` : array (opt.), shape (n_samples_X, n_samples_X, n_dims)
>
> The gradient of the kernel k(X, X) with respect to the hyperparameter of the kernel. Only returned when eval_gradient is True.

Which, in my opinion, is:

$$K'(x^*, x)=\frac{\partial K}{\partial \theta}(x^*, x)$$

Are these two essentially different things or the same? I am looking for the derivative of the GP function at some evaluation point $x^*$.
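To see which object scikit-learn returns, you can call a kernel directly with `eval_gradient=True`. A minimal sketch (note that scikit-learn evaluates this gradient with respect to the log-transformed hyperparameters, `kernel.theta`):

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

X = np.array([[0.0], [1.0], [2.0]])
kernel = RBF(length_scale=1.0)

# K has shape (n_samples_X, n_samples_X); K_gradient has an extra
# trailing axis of size n_dims, one slice per kernel hyperparameter.
K, K_gradient = kernel(X, eval_gradient=True)
print(K.shape)           # (3, 3)
print(K_gradient.shape)  # (3, 3, 1)
```

Nothing here involves the input locations' derivatives; the extra axis of `K_gradient` indexes hyperparameters only.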


#### Best Answer

The two objects are fundamentally different things. One extreme case to illustrate this difference is given by the kernel on $\mathbb R$ $$K(x, x') = x x' + \theta^2.$$ Samples $f \sim \mathcal{GP}(0, K)$ will be linear functions, with $f(0) \sim \mathcal N(0, \theta^2)$ and slope \begin{align} f(1) - f(0) &= \begin{bmatrix}-1 & 1\end{bmatrix} \mathcal N\left( \begin{bmatrix}0 \\ 0\end{bmatrix}, \begin{bmatrix} \theta^2 & \theta^2 \\ \theta^2 & 1 + \theta^2\end{bmatrix} \right) \\&= \mathcal N\left( \begin{bmatrix}-1 & 1\end{bmatrix} \begin{bmatrix}0 \\ 0\end{bmatrix}, \begin{bmatrix}-1 & 1\end{bmatrix} \begin{bmatrix} \theta^2 & \theta^2 \\ \theta^2 & 1 + \theta^2\end{bmatrix} \begin{bmatrix}-1 \\ 1\end{bmatrix} \right) \\&= \mathcal N\left( 0, \begin{bmatrix}-1 & 1\end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right) \\&= \mathcal N\left( 0, 1 \right). \end{align}
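The claim that the slope is standard normal regardless of $\theta$ is easy to check numerically; a small sketch sampling from this kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 3.0  # any value; the slope distribution should not depend on it
x = np.array([0.0, 1.0])

# Covariance of (f(0), f(1)) under K(x, x') = x x' + theta^2
K = np.outer(x, x) + theta**2

samples = rng.multivariate_normal(np.zeros(2), K, size=200_000)
slopes = samples[:, 1] - samples[:, 0]
print(slopes.mean(), slopes.var())  # ≈ 0, ≈ 1
```

Changing `theta` changes the marginal variance of each $f(x)$ but leaves the empirical distribution of the slopes essentially unchanged.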

The previous post you link to discusses the random function $f'$; for this choice of kernel, $f'$ will be simply a constant function equal to the slope, which is standard normal (and totally independent of $\theta$, for this kernel).

What scikit-learn computes is, in this case, $$\frac{\partial K}{\partial \theta} = 2 \theta.$$ This is quite useful in, e.g., finding the kernel parameters which maximize the likelihood of some dataset. But it's not at all related to what you seem to want, "the derivative of the GP function at some evaluation point"; I don't think scikit-learn implements that directly.
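If you do want the derivative of a fitted scikit-learn model anyway, one workaround (a sketch, not a built-in API) is to finite-difference the posterior mean:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy data: noise-free samples of f(x) = sin(x)
X = np.linspace(0.0, 2.0 * np.pi, 20)[:, None]
y = np.sin(X).ravel()
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)

# Central difference of the posterior mean at x* = 1.0
x_star, h = 1.0, 1e-4
f_prime = (gpr.predict([[x_star + h]])[0]
           - gpr.predict([[x_star - h]])[0]) / (2.0 * h)
print(f_prime)  # ≈ cos(1.0) ≈ 0.54
```

Note this only approximates the derivative of the posterior *mean*; it says nothing about the distribution of $f'(x^*)$, which would require the mixed-partial kernel machinery from the previous post.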

You might be interested instead in GPflow or GPyTorch. Both are modern, full-featured, actively developed GP implementations in TensorFlow and PyTorch respectively; either should, I think, make it straightforward to find the derivative you're looking for.
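As an illustration of why automatic differentiation makes this easy, here is a hedged sketch in plain PyTorch (not GPyTorch's own API): compute the noise-free GP posterior mean $k(x^*, X)\,K^{-1}y$ by hand, and let autograd differentiate it with respect to $x^*$:

```python
import math
import torch

# Toy training data: noise-free samples of f(x) = sin(x)
X = torch.linspace(0.0, 2.0 * math.pi, 20, dtype=torch.float64)
y = torch.sin(X)

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between 1-D inputs a and b."""
    return torch.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

K = rbf(X, X) + 1e-8 * torch.eye(len(X), dtype=torch.float64)  # jitter
alpha = torch.linalg.solve(K, y)

x_star = torch.tensor([1.0], dtype=torch.float64, requires_grad=True)
mean = (rbf(x_star, X) @ alpha)[0]  # posterior mean at x_star
mean.backward()                     # d(mean)/d(x_star) via autograd
print(x_star.grad.item())  # ≈ cos(1.0)
```

GPflow and GPyTorch build on exactly this mechanism, so once a model is fitted, gradients of its predictions with respect to the inputs come essentially for free.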