Solved – For a random matrix, shouldn’t an SVD explain nothing at all? What am I doing wrong?

If I construct a 2-D matrix composed entirely of random data, I would expect the PCA and SVD components to essentially explain nothing.

Instead, the first SVD column appears to explain 75% of the variance. How can this possibly be? What am I doing wrong?

Here is the plot:

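[Figure: four panels showing the matrix image, the singular values, the percent of variance explained, and the cumulative percent of variance explained, each by SVD column]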

Here is the R code:

rm(list=ls())
set.seed(1)
m <- matrix(runif(10000, min=0, max=25), nrow=100, ncol=100)
svd1 <- svd(m)

# Four panels side by side: the matrix itself, then three SVD diagnostics
par(mfrow=c(1,4))
image(t(m)[,nrow(m):1])
plot(svd1$d, cex.lab=2, xlab="SVD column", ylab="Singular value", pch=19)

# Share of the total sum of squares carried by each singular value
percentVarianceExplained <- svd1$d^2/sum(svd1$d^2) * 100
plot(percentVarianceExplained, ylim=c(0,100), cex.lab=2, xlab="SVD column", ylab="Percent of variance explained", pch=19)

cumulativeVarianceExplained <- cumsum(svd1$d^2/sum(svd1$d^2)) * 100
plot(cumulativeVarianceExplained, ylim=c(0,100), cex.lab=2, xlab="SVD column", ylab="Cumulative percent of variance explained", pch=19)


Thank you, @Aaron. The fix, as you noted, was to center the matrix so that each column has a mean of 0.

m <- scale(m, scale=FALSE) 

Here is the corrected image, showing that for a matrix of random data the first SVD column explains only a small share of the variance, as expected.

[Figure: the same four panels after centering the matrix]
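For anyone who wants to check this without the plots, here is a minimal sketch that compares the share of the first singular value before and after centering. The helper `first_share` is a name made up for this illustration, not part of base R, and the exact percentages will depend on the seed.

```r
set.seed(1)
m <- matrix(runif(10000, min = 0, max = 25), nrow = 100, ncol = 100)

# Hypothetical helper: fraction of the total sum of squares
# carried by the first singular value.
first_share <- function(x) {
  d <- svd(x)$d
  d[1]^2 / sum(d^2)
}

share_raw      <- first_share(m)                        # columns not centered
share_centered <- first_share(scale(m, scale = FALSE))  # column means removed

print(round(100 * c(raw = share_raw, centered = share_centered), 1))
```

The uncentered share should land near the 75% seen in the question, while the centered share should be only a few percent.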

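The first PC is capturing the fact that the variables are not centered around zero: every entry shares a common mean of about 12.5, and that constant offset dominates the decomposition. Centering the columns first, or drawing the random variables from a distribution that is already centered at zero, gives the result you expect. For example, either of these: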

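# Option 1: center each column of the uniform(0, 25) matrix
m <- matrix(runif(10000, min=0, max=25), nrow=100, ncol=100)
m <- scale(m, scale=FALSE)

# Option 2: draw from a distribution that is already centered at zero
m <- matrix(runif(10000, min=-25, max=25), nrow=100, ncol=100)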
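The 75% figure itself can be roughly accounted for. For entries with mean mu and variance sigma^2, the expected squared entry is mu^2 + sigma^2, and the constant mean part contributes mu^2 of that. The sketch below assumes the first singular vector picks up essentially just that constant part, which is an approximation rather than an exact result. The variable names are illustrative only.

```r
# Entries are Uniform(a, b) with a = 0, b = 25, as in the question.
a <- 0
b <- 25
mu     <- (a + b) / 2      # mean of the uniform distribution: 12.5
sigma2 <- (b - a)^2 / 12   # variance of the uniform distribution: about 52.08

# E[x^2] = mu^2 + sigma2; the constant mean accounts for mu^2 of it.
expected_share <- mu^2 / (mu^2 + sigma2)   # 156.25 / 208.33 = 0.75

# Compare with the share of the first singular value on simulated data.
set.seed(1)
m <- matrix(runif(10000, min = a, max = b), nrow = 100, ncol = 100)
d <- svd(m)$d
observed_share <- d[1]^2 / sum(d^2)

print(c(expected = expected_share, observed = observed_share))
```

With `min=-25, max=25` the mean is 0, so the same ratio drops to 0 and no single component stands out for that reason.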
