Solved – Truncated SVD: how to go from [Uk, Sk, Vk’] to low-dimension matrix?

I have a large word-frequency matrix (~6m unique words X ~4k documents) and I'm trying to use truncated singular value decomposition (SVD) to project it onto a matrix with fewer dimensions. I know how to get to $[U_{k}, S_{k}, V_{k}']$, but I don't know what to do then.

Each tutorial I find gives me a different answer. Here, for instance, I'm told to retain only the first $k$ rows of $S_{k}$ and $V_{k}'$ and then multiply $S_{k}V_{k}'$. In the scikit-learn implementation, however, they do it by multiplying $U_{k}S_{k}'$ (formula and code). (The book chapter they mention as source doesn't say anything about that.) In this previous answer they suggest $X'U_{k}$ instead (though they are talking about PCA, so maybe that doesn't apply here). Other answers haven't helped (here, for instance, they say we need to apply truncated SVD to the reduced matrix – which doesn't sound right, we use truncated SVD precisely to get to the reduced matrix, no?).

So, bottom line: how do I go from $[U_{k}, S_{k}, V_{k}']$ to a reduced data matrix? Which of the above formulas is the correct one (and why)?

Just to be clear: I'm not looking to reduce rank only (that I can do by multiplying $U_{k}S_{k}V_{k}'$ but it gives me the exact same dimensions I had before, which doesn't help me); I want to reduce the actual dimensions of the data matrix.

Let's say you have 4×2 X matrix, run SVD and get U,S,V:

  • U is 4×4
  • S is 4×2, where only the top 2×2 is not empty
  • V is 2×2

If you want to truncate to a single largest singular value:

  • remove all columns from U leaving only the first one, now U1 is 4×1
  • remove all columns from S leaving only the first one, now S1 is 1×1
  • remove all columns from V leaving only the first one, now V1 is 2×1

You can restore the X matrix by U1*S1*V1'

The dimensionality is greatly reduced because now you can use 4×1 U matrix instead of original 4×2 X matrix.

UPDATE: you also have neat equality: X*V=U*S, so you could use both X*V1 or U1*S1 as your reduced dimension matrices, or simply U1, which is similar to the score in PCA

Similar Posts:

Rate this post

Leave a Comment