Solved – Matrix factorization using stochastic gradient descent

I am trying to approximate a square matrix $A \in \mathbb{R}^{n \times n}$ using the following matrix factorization:

$$A \approx \hat{A} = Z \Lambda^{-1} Z^T$$

where $\Lambda = \text{diag}(Z^T \mathbb{1})$ and $Z$ is sparse. I would like to use stochastic gradient descent (SGD) to solve

$$\arg \underset{\hat{A}}{\min} \| A - \hat{A} \|$$

So, basically, I would compute some entries of $A$ (because $A$ is a huge matrix) and use these to fit the factorization. But I am not sure how to go about it.
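For concreteness, this is how I picture the construction of $\hat{A}$ in NumPy (a toy dense sketch; in my actual problem $Z$ is sparse and $A$ is far too large to materialize):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.uniform(0.1, 1.0, size=(6, 2))  # dense toy stand-in for the sparse Z

lam = Z.sum(axis=0)                     # Lambda = diag(Z^T 1)
A_hat = Z @ np.diag(1.0 / lam) @ Z.T    # A_hat = Z Lambda^{-1} Z^T

print(A_hat.shape)                      # (6, 6): square, like A
```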

The basic algorithm goes like this:

  1. Start with randomly initialized values of $Z$ and $\Lambda^{-1}$
  2. Go through each value $A_{ij}$
  3. Compute the gradient of $\| A_{ij} - \hat{A}_{ij} \|$
  4. Update the parameters $Z$ and $\Lambda^{-1}$ in the direction of the negative gradient, using a step size $\eta$. It's a hyper-parameter and needs to be tuned using cross-validation: $Z_i \leftarrow Z_i - \eta \, \nabla_{Z_i} \| A_{ij} - Z_i \Lambda^{-1} Z_j^T \|$ (the gradient is worked out below)
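To make step 4 concrete: writing $\hat{A}_{ij} = \sum_k Z_{ik} Z_{jk} / \lambda_k$ with $\lambda_k = (Z^T \mathbb{1})_k$, and treating $\Lambda$ as fixed during each update (a simplification, since $\lambda_k$ itself depends on $Z$), the error $e_{ij} = A_{ij} - \hat{A}_{ij}$ gives the squared-error gradients

$$\frac{\partial e_{ij}^2}{\partial Z_{ik}} = -2\, e_{ij}\, \frac{Z_{jk}}{\lambda_k}, \qquad \frac{\partial e_{ij}^2}{\partial Z_{jk}} = -2\, e_{ij}\, \frac{Z_{ik}}{\lambda_k}$$

so each sampled entry $A_{ij}$ only touches rows $Z_i$ and $Z_j$, which is what keeps the cost of an SGD step low.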

In step 3 you may want to add a regularization term to prevent overfitting. Take a look at the pseudocode on Wikipedia.
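Here is a minimal sketch of how the whole loop could look in NumPy. It assumes squared per-entry error, an L2 penalty as the regularizer, $\Lambda$ held fixed within each update as above, and a `sample_entry()` oracle standing in for computing entries of the huge $A$ on demand; all of these names and choices are mine, not part of the question:

```python
import numpy as np

def sgd_factorize(sample_entry, n, k, steps=100_000, eta=0.01, reg=1e-4, seed=0):
    """Fit A ~ Z diag(Z^T 1)^{-1} Z^T by entry-wise SGD (untuned sketch)."""
    rng = np.random.default_rng(seed)
    # Small positive init keeps lambda = Z^T 1 away from zero early on.
    Z = rng.uniform(0.1, 1.0, size=(n, k))

    for _ in range(steps):
        i, j, a_ij = sample_entry()        # one observed entry A_ij
        lam = Z.sum(axis=0)                # lambda_k = (Z^T 1)_k
        err = a_ij - Z[i] @ (Z[j] / lam)   # A_ij - Z_i Lambda^{-1} Z_j^T

        # Step against the gradient on rows i and j (Lambda held fixed),
        # with the L2 penalty playing the role of the regularizer above.
        grad_i = -2 * err * Z[j] / lam + 2 * reg * Z[i]
        grad_j = -2 * err * Z[i] / lam + 2 * reg * Z[j]
        Z[i] -= eta * grad_i
        if j != i:
            Z[j] -= eta * grad_j

    return Z

# Toy check: recover a small synthetic A (in practice A is never materialized).
n, k = 50, 5
rng = np.random.default_rng(1)
Z_true = rng.uniform(0.1, 1.0, size=(n, k))
A = Z_true @ np.diag(1.0 / Z_true.sum(axis=0)) @ Z_true.T

def sample_entry():
    i, j = rng.integers(n), rng.integers(n)
    return i, j, A[i, j]

Z = sgd_factorize(sample_entry, n, k)
A_hat = Z @ np.diag(1.0 / Z.sum(axis=0)) @ Z.T
print("relative error:", np.linalg.norm(A - A_hat) / np.linalg.norm(A))
```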
