I was reading the vignette of the R package chemometrics. In the second paragraph of page 12 (right below the first equation), the author writes:
> the OD (Orthogonal Distance) is calculated in the original space as the orthogonal distance of an object to the PCA subspace or, in other words, the distance between the object and its orthogonal projection on the PCA subspace.
Can someone elaborate on how exactly the orthogonal distance is calculated? Some R code as an illustrative example would be greatly appreciated, if possible!
Best Answer
Given an $n$ by $p$ matrix $\pmb X$, the SVD of the centered and scaled data matrix is:
$$\text{svd}\big((\pmb X-\bar{x})/\sqrt{n-1}\big)=\pmb{UDV}'$$
where $\bar{x}$ is the vector of column means, subtracted from every row of $\pmb X$. (I will denote by $\pmb V_k$ the matrix formed by the first $k$ columns of $\pmb V$, and by $\pmb D_k$ the diagonal matrix formed by the first $k$ rows and columns of $\pmb D$.)
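To make the notation concrete, here is a small R sketch on simulated data (the names `xbar`, `xc`, `s`, `V_k`, `D_k` are illustrative, not from the answer). It computes the SVD of the centered, scaled data and checks that the squared singular values equal the eigenvalues of the sample covariance matrix, i.e. that $\pmb D^2$ holds the variances of the principal components.

```r
set.seed(1)
n <- 100; p <- 20; k <- 5
X <- matrix(rnorm(n * p), ncol = p)    # simulated data matrix (assumption)

xbar <- colMeans(X)                    # the centre \bar{x}
xc <- sweep(X, 2, xbar, FUN = "-")     # X - \bar{x}, row by row
s <- svd(xc / sqrt(n - 1))             # U D V'

V_k <- s$v[, 1:k, drop = FALSE]        # first k columns of V
D_k <- diag(s$d[1:k], nrow = k)        # first k rows/columns of D

# Squared singular values are the eigenvalues of the sample covariance matrix
ev <- eigen(cov(X), symmetric = TRUE)$values
all.equal(s$d^2, ev)                   # TRUE
```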
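Writing $\pmb x_i$ for the $i$-th row of $\pmb X$ (as a column vector), the SVD splits each centered observation $\pmb x_i-\bar{x}$ into two mutually orthogonal components, one inside and one outside the space spanned by the first $k$ right singular vectors (the columns of $\pmb V_k$). Each component gives a distance:
- The score distance (SD), measured within the space spanned by the first $k$ right singular vectors and scaled by the variances of the first $k$ principal components (the squared singular values $\pmb D_k^2$):
$$\text{SD}_i=\sqrt{(\pmb x_i-\bar{x})'\pmb{V_kD_k^{-2}V_k'}(\pmb x_i-\bar{x})}$$
- The orthogonal distance (OD), the length of the component of $\pmb x_i-\bar{x}$ that is orthogonal to the first $k$ right singular vectors:
$$\text{OD}_i=\sqrt{(\pmb x_i-\bar{x})'(\pmb I_p-\pmb{V_kV_k'})(\pmb x_i-\bar{x})}$$
(where $\pmb I_p$ is the $p\times p$ identity matrix), which, because $\pmb I_p-\pmb{V_kV_k'}$ is symmetric and idempotent, is equivalent to:
$$\text{OD}_i=\|(\pmb x_i-\bar{x})-\pmb{V_kV_k'}(\pmb x_i-\bar{x})\|$$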
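A minimal R sketch of the two per-observation distances, on simulated data with illustrative names (`sd_i`, `od_quad`, `od_norm`, `pythag`, `sd_mahal` are not from the answer). It uses the scaling $\pmb V_k\pmb D_k^{-2}\pmb V_k'$ for the score distance, since $\pmb D_k^2$ holds the principal component variances, and checks three things: the quadratic-form and norm versions of the OD agree, the SD equals the Mahalanobis distance of the scores, and the squared length of each centered observation splits exactly into the squared lengths of its two orthogonal components.

```r
set.seed(2)
n <- 50; p <- 8; k <- 3
X <- matrix(rnorm(n * p), ncol = p)         # simulated data (assumption)
xbar <- colMeans(X)
s <- svd(sweep(X, 2, xbar) / sqrt(n - 1))
V_k <- s$v[, 1:k, drop = FALSE]
D_k <- diag(s$d[1:k], nrow = k)

P <- V_k %*% t(V_k)                         # projector onto the PCA subspace
Q <- diag(p) - P                            # projector onto its orthogonal complement
M <- V_k %*% solve(D_k^2) %*% t(V_k)        # V_k D_k^{-2} V_k'

sd_i <- od_quad <- od_norm <- pythag <- numeric(n)
for (i in seq_len(n)) {
  z <- X[i, ] - xbar                        # x_i - xbar as a plain vector
  sd_i[i]    <- sqrt(drop(t(z) %*% M %*% z))
  od_quad[i] <- sqrt(drop(t(z) %*% Q %*% z))
  od_norm[i] <- sqrt(sum((z - P %*% z)^2))
  # |z|^2 = |Pz|^2 + |Qz|^2 because Pz and Qz are orthogonal
  pythag[i]  <- sum(z^2) - sum((P %*% z)^2) - sum((Q %*% z)^2)
}

# SD as the Mahalanobis distance of the scores (their covariance is D_k^2)
T_scores <- sweep(X, 2, xbar) %*% V_k
sd_mahal <- sqrt(mahalanobis(T_scores, center = rep(0, k), cov = cov(T_scores)))

all.equal(od_quad, od_norm)   # TRUE
all.equal(sd_i, sd_mahal)     # TRUE
max(abs(pythag))              # only rounding error
```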
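In R:

```r
n <- 100
p <- 20
k <- 5
x <- matrix(rnorm(n * p), ncol = p)  # your data matrix

# the orthogonal distances:
data_centered <- sweep(x, 2, colMeans(x), FUN = "-")
# first k right singular vectors of the centered, scaled data
loadings <- svd(data_centered / sqrt(nrow(data_centered) - 1), nu = 0)$v[, 1:k]
# residual after projecting onto the PCA subspace
orthDist <- data_centered - data_centered %*% loadings %*% t(loadings)
orthDist <- sqrt(rowSums(orthDist * orthDist))
```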
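As a cross-check, here is a sketch on simulated data (the names `residual`, `rot_k`, `projected`, `od_prcomp` are illustrative) that computes the OD exactly as the quoted definition reads, as the distance between each object and its orthogonal projection onto the PCA subspace, using base R's `prcomp` instead of an explicit `svd`, and compares it with the answer's `orthDist`. Sign flips of individual loadings between the two routes do not matter, because the projection $\pmb V_k\pmb V_k'$ is unchanged by them.

```r
set.seed(3)
n <- 100; p <- 20; k <- 5
x <- matrix(rnorm(n * p), ncol = p)          # simulated data (assumption)

# Distances as computed in the answer (SVD of the centered, scaled data)
data_centered <- sweep(x, 2, colMeans(x), FUN = "-")
loadings <- svd(data_centered / sqrt(n - 1), nu = 0)$v[, 1:k]
residual <- data_centered - data_centered %*% loadings %*% t(loadings)
orthDist <- sqrt(rowSums(residual^2))

# Same quantity read literally from the quote: distance between each object
# and its orthogonal projection onto the k-dimensional PCA subspace
pca <- prcomp(x, center = TRUE, scale. = FALSE)
rot_k <- pca$rotation[, 1:k]
projected <- sweep(pca$x[, 1:k] %*% t(rot_k), 2, pca$center, FUN = "+")
od_prcomp <- sqrt(rowSums((x - projected)^2))

all.equal(orthDist, od_prcomp, check.attributes = FALSE)   # TRUE
```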
You will find more complete code for computing the OD (and the SD, the score distances within the space spanned by the loading matrix) in the rrcov package:

```r
library(rrcov)
rrcov:::.distances
```
The function is not documented, but the `od` slot of the object it returns holds the orthogonal distances.