Pretty much a complete newbie with PLS, Python, stats, (and stackexchange), sorry:
When using sklearn for PLSRegression, why is the resulting scores matrix not given by the product of (scaled) input and the weights matrix?
I.e. T = X0 * W
Minimum working example showing this is given below.
I have figured out in the meantime that the scores can be calculated according to an algorithm shown, e.g., here http://www.sciencedirect.com/science/article/pii/0003267086800289?via%3Dihub and that the scores can be calculated via the '.transform' method. But I struggle to understand why this choice was made. Can anyone tell me what's the benefit of having it this way?
Thanks a lot!
Cheers
    import numpy as np
    # PLS tools
    from sklearn.preprocessing import scale
    from sklearn.cross_decomposition import PLSRegression

    # just some numbers
    X = np.random.multivariate_normal(np.array([3,4,5]),np.diag([5,4,1]),100)
    y = np.dot(X,np.array([1,2,3]))+np.random.random(size=(100,))

    pls = PLSRegression(n_components=2)
    pls.fit(scale(X),y)

    (pls.x_scores_ - np.dot(scale(X),pls.x_weights_)) / pls.x_scores_
    # differ significantly from second column on forward
Best Answer
What you are seeing is a consequence of the NIPALS algorithm. In that algorithm, as the paper you linked shows, the $X$ block is deflated after each component while the model for the $Y$ block is built up.
So there is no single $W$ matrix that can be applied to $X$ directly. Instead, the scores are calculated in the following steps:
start with
$E = X$
for the first component (or latent variable, LV)
$t_1 = E w_1$
$E = E - (t_1 p_1')$
for the second component
$t_2 = Ew_2$
$E = E - (t_2 p_2')$
and so on…
where $t_h$ is the $h$-th score vector, $w_h$ is the $h$-th weight vector, and $p_h$ is the $h$-th loading vector of $X$.
There is, however, another algorithm called SIMPLS, which provides exactly what you need: a single weights matrix that can be applied directly to $X$. In that respect, I personally find NIPALS confusing and SIMPLS superior.
TL;DR The reason is the NIPALS algorithm.