In partial least squares regression, what is the difference between the regression coefficients and the loadings for each independent variable in each component? Specifically, I understand that in every component, each of the independent variables has a corresponding loading. Does each variable also have a regression coefficient? What is the relationship between the loading vector and the coefficients?


Assume your independent variable matrix is $$m \times n$$, i.e. you have $$m$$ observations and $$n$$ variables.

For each PLS component (AKA latent variable), you get a loading vector ($$n \times 1$$), so for $$h$$ components the size of the loading matrix ($$P$$) is $$n \times h$$. These loadings are calculated for both interpretation and algorithmic purposes, but they are not used directly for prediction.

On the other hand, the SIMPLS algorithm (I believe the most popular PLS flavor) also involves the calculation of a weight matrix ($$W$$), which has the same size as the loading matrix. This matrix $$W$$ is used to calculate the $$X$$ scores ($$T$$):

$$T = X \cdot W$$

which are then multiplied by the $$Y$$ loadings ($$Q$$) for prediction:

$$\hat{Y} = T \cdot Q'$$

Therefore, the regression coefficients ($$\hat{B}$$, which is $$n \times 1$$ for a single dependent variable) that can be used to predict $$Y$$ directly from $$X$$ can be calculated as:

$$\hat{B} = W \cdot Q'$$
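The relationships above can be checked numerically. The following is a minimal NumPy sketch of a SIMPLS-style PLS1 fit (single $$y$$, mean-centered data); the variable names mirror the symbols used here, and the final assertion confirms that predicting via the scores, $$T \cdot Q'$$, is identical to predicting via the coefficients, $$X \cdot \hat{B}$$ with $$\hat{B} = W \cdot Q'$$. The synthetic data and the specific deflation details are illustrative assumptions, not taken from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, h = 50, 5, 2                         # observations, variables, components
X = rng.normal(size=(m, n))
y = X @ rng.normal(size=(n, 1)) + 0.1 * rng.normal(size=(m, 1))

# PLS is normally run on mean-centered data
X = X - X.mean(axis=0)
y = y - y.mean(axis=0)

W = np.zeros((n, h))   # weight matrix: T = X @ W
P = np.zeros((n, h))   # X loadings (interpretation, deflation)
Q = np.zeros((1, h))   # Y loadings
V = np.zeros((n, h))   # orthonormal basis of the loadings, used to deflate S

S = X.T @ y            # cross-product matrix, deflated each iteration
for a in range(h):
    r = S[:, 0]                        # single y: dominant direction is S itself
    t = X @ r
    norm_t = np.linalg.norm(t)
    t, r = t / norm_t, r / norm_t      # scale so each score vector has unit norm
    p = X.T @ t                        # X loading for this component
    q = y.T @ t                        # Y loading for this component
    v = p.copy()
    if a > 0:                          # orthogonalize against previous loadings
        v -= V[:, :a] @ (V[:, :a].T @ p)
    v /= np.linalg.norm(v)
    S = S - np.outer(v, v @ S)         # deflate the cross-product matrix
    W[:, a], P[:, a], Q[:, a], V[:, a] = r, p, q, v

T = X @ W              # scores
B = W @ Q.T            # regression coefficients, n x 1

# Prediction via scores equals prediction via regression coefficients
assert np.allclose(T @ Q.T, X @ B)
```

Note that the final check holds by construction ($$X \cdot W \cdot Q' = T \cdot Q'$$): the coefficient vector $$\hat{B}$$ is simply the composition of the weights and the $$Y$$ loadings collapsed into a single vector, which is why it changes whenever the number of components changes.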

All in all, you obtain one loading vector per component, whereas the regression coefficient vector always has the same size ($$n \times 1$$); fitting with a different number of components produces a same-sized yet different coefficient vector.

As far as I know, a similar logic applies to other PLS algorithms too.
