How is the “orthogonal distance” computed?

I was reading the vignette of the R package chemometrics (link). In the second paragraph of page 12 (right below the first equation), the author writes:

the OD (Orthogonal Distance) is calculated in the original space as the orthogonal distance of an object to the PCA subspace or, in other words, the distance between the object and its orthogonal projection on the PCA subspace.
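To make the quoted definition concrete, here is a toy sketch in R. The "PCA subspace" is simply assumed to be the x-y plane in three dimensions and the object is assumed to be already centered; neither choice comes from the vignette, they only illustrate "distance between the object and its orthogonal projection".

```r
# Toy illustration (hypothetical data): the subspace is assumed to be the x-y plane in R^3
V <- cbind(c(1, 0, 0), c(0, 1, 0))  # orthonormal basis of the subspace (3 x 2)
x <- c(1, 2, 3)                     # one object, assumed already centered

proj <- V %*% crossprod(V, x)       # orthogonal projection of x onto the subspace: (1, 2, 0)
od <- sqrt(sum((x - proj)^2))       # distance between the object and its projection
print(od)                           # [1] 3
```

Only the third coordinate lies outside the plane, so the orthogonal distance is 3.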

Can someone elaborate on how exactly the orthogonal distance is calculated? (Some R code as an illustrative example would be greatly appreciated, if possible!)

Given an $n$ by $p$ data matrix $\pmb X$ with column means $\bar{x}$, the SVD of the centered and scaled matrix is:

$$\text{svd}\big((\pmb X-\bar{x})/\sqrt{n-1}\big)=\pmb{UDV}'$$

(I will denote by $\pmb V_k$ the matrix formed by the first $k$ columns of $\pmb V$, and by $\pmb D_k$ the diagonal matrix formed by the first $k$ rows and columns of $\pmb D$.)
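A minimal R sketch of this setup, on simulated data (the sizes `n`, `p`, `k` and the seed are arbitrary assumptions). It builds $\pmb V_k$ and $\pmb D_k$ from `svd()` and checks that the squared singular values equal the eigenvalues of the sample covariance matrix, i.e. the variances of the principal components; this is why $\sqrt{n-1}$ appears in the scaling.

```r
set.seed(1)                                   # hypothetical toy data, for reproducibility
n <- 100; p <- 20; k <- 5
X <- matrix(rnorm(n * p), ncol = p)

xbar <- colMeans(X)                           # the vector of column means
Xc <- sweep(X, 2, xbar, FUN = "-")            # X - xbar, each row centered
s <- svd(Xc / sqrt(n - 1))                    # U D V' of the centered, scaled matrix

V_k <- s$v[, 1:k, drop = FALSE]               # first k columns of V (p x k)
D_k <- diag(s$d[1:k], nrow = k)               # first k rows/columns of D (k x k)

# Squared singular values = eigenvalues of cov(X) = variances of the components
ev <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values
print(all.equal(s$d^2, ev))                   # TRUE
```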

The SVD divides the total variance of $\pmb X$ into two mutually orthogonal components:

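  • The projection of $(\pmb X-\bar{x})$ onto the space spanned by the first $k$ right singular vectors of $(\pmb X-\bar{x})$. Measured in units of the component standard deviations (the diagonal of $\pmb D_k$), its length for each observation is the score distance (SD):

$$\sqrt{(\pmb X-\bar{x})'\pmb{V}_k\pmb{D}_k^{-2}\pmb{V}_k'(\pmb X-\bar{x})}$$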

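  • The projection of $(\pmb X-\bar{x})$ onto the space orthogonal to the first $k$ right singular vectors of $(\pmb X-\bar{x})$. Its length for each observation is the orthogonal distance (OD):

$$\sqrt{(\pmb X-\bar{x})'(\pmb{I}_p-\pmb{V}_k\pmb{V}_k')(\pmb X-\bar{x})}$$

(where $\pmb I_p$ is the $p \times p$ identity matrix), which, writing each centered observation as a row of $(\pmb X-\bar{x})$, is equivalent to the norm of the residual:

$$\lVert(\pmb X-\bar{x})-(\pmb X-\bar{x})\pmb{V}_k\pmb{V}_k'\rVert.$$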
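A sketch in R that evaluates both distances for every observation on simulated data (data and sizes are assumptions). Because $\pmb D$ holds singular values, i.e. component standard deviations, the score distance divides each squared score by $d_j^2$; the code checks this against `mahalanobis()` on the scores. It also checks that the two OD expressions agree and that the squared centered norm splits into the squared projection length plus OD² (the orthogonal decomposition).

```r
set.seed(1)                                   # hypothetical toy data
n <- 100; p <- 20; k <- 5
X <- matrix(rnorm(n * p), ncol = p)
Xc <- sweep(X, 2, colMeans(X), FUN = "-")     # rows are the centered observations
s <- svd(Xc / sqrt(n - 1))
V_k <- s$v[, 1:k, drop = FALSE]
D_k <- diag(s$d[1:k], nrow = k)

# Row-wise quadratic form x_i' M x_i for a symmetric p x p matrix M
quad_form <- function(A, M) rowSums((A %*% M) * A)

# Score distance: weight by the inverse component variances D_k^{-2}
SD <- sqrt(quad_form(Xc, V_k %*% solve(D_k %*% D_k) %*% t(V_k)))

# Orthogonal distance, first form: (I_p - V_k V_k') projects onto the orthogonal complement
OD1 <- sqrt(quad_form(Xc, diag(p) - V_k %*% t(V_k)))

# Orthogonal distance, second form: Euclidean norm of the residual rows
R <- Xc - Xc %*% V_k %*% t(V_k)
OD2 <- sqrt(rowSums(R^2))

print(all.equal(OD1, OD2))                    # TRUE
scores <- Xc %*% V_k                          # coordinates in the PCA subspace
print(all.equal(SD, sqrt(mahalanobis(scores, rep(0, k), D_k %*% D_k))))  # TRUE
print(all.equal(rowSums(Xc^2), rowSums(scores^2) + OD2^2))               # TRUE
```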

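In R:

n <- 100
p <- 20
k <- 5
x <- matrix(rnorm(n * p), ncol = p)  # your data matrix

# the orthogonal distances:
data_centered <- sweep(x, 2, colMeans(x), FUN = "-")  # subtract the column means
loadings <- svd(data_centered / sqrt(nrow(data_centered) - 1), nu = 0)$v[, 1:k, drop = FALSE]  # V_k
orthDist <- data_centered - data_centered %*% loadings %*% t(loadings)  # residuals off the PCA subspace
orthDist <- sqrt(rowSums(orthDist * orthDist))  # row-wise Euclidean norms = OD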
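As a cross-check, the same distances can be obtained from base R's `prcomp()`, whose `rotation` matrix plays the role of $\pmb V$. The helper name `orth_dist_prcomp` is hypothetical, and the seed and sizes are arbitrary. The sign of each loading vector may differ between `svd()` and `prcomp()`, but that does not matter because only $\pmb V_k\pmb V_k'$ enters the distance.

```r
# Hypothetical helper: orthogonal distances of the rows of x to the
# k-dimensional classical PCA subspace, computed via prcomp()
orth_dist_prcomp <- function(x, k) {
  pc <- prcomp(x, center = TRUE, scale. = FALSE)
  V_k <- pc$rotation[, 1:k, drop = FALSE]         # first k loadings
  fitted <- pc$x[, 1:k, drop = FALSE] %*% t(V_k)  # projection, in centered coordinates
  resid <- sweep(x, 2, pc$center) - fitted        # centered data minus its projection
  sqrt(rowSums(resid^2))
}

set.seed(1)
n <- 100; p <- 20; k <- 5
x <- matrix(rnorm(n * p), ncol = p)

# Orthogonal distances via svd(), as in the answer
data_centered <- sweep(x, 2, colMeans(x), FUN = "-")
loadings <- svd(data_centered / sqrt(n - 1), nu = 0)$v[, 1:k, drop = FALSE]
orthDist <- sqrt(rowSums((data_centered - data_centered %*% loadings %*% t(loadings))^2))

print(all.equal(orthDist, orth_dist_prcomp(x, k)))  # TRUE
print(max(orth_dist_prcomp(x, p)) < 1e-8)           # TRUE: with k = p nothing is left over
```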

You will find more complete code to compute the OD (and the SD, the statistical distances within the space spanned by the loading matrix) in the rrcov package:

library(rrcov)
rrcov:::.distances

The function is not documented, but the `od` slot it fills in holds the orthogonal distances.
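A sketch of how one might compare the hand-computed distances with rrcov's classical PCA. It assumes rrcov is installed (the comparison is skipped otherwise) and that `PcaClassic()` accepts `k` for the number of components and stores the orthogonal distances in the `od` slot of the returned object; check `?PcaClassic` and `?"Pca-class"` for your installed version.

```r
set.seed(1)                                    # hypothetical toy data
n <- 100; p <- 20; k <- 5
x <- matrix(rnorm(n * p), ncol = p)

# Orthogonal distances computed by hand, as in the answer
data_centered <- sweep(x, 2, colMeans(x), FUN = "-")
loadings <- svd(data_centered / sqrt(n - 1), nu = 0)$v[, 1:k, drop = FALSE]
orthDist <- sqrt(rowSums((data_centered - data_centered %*% loadings %*% t(loadings))^2))

if (requireNamespace("rrcov", quietly = TRUE)) {
  pc <- rrcov::PcaClassic(x, k = k)            # classical (non-robust) PCA with k components
  print(all.equal(orthDist, as.numeric(pc@od)))  # expected TRUE if the assumptions hold
} else {
  message("rrcov not installed; skipping the comparison")
}
```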
