# Solved – Matrix factorization using stochastic gradient descent

I am trying to approximate a square matrix $A \in \mathbb{R}^{n \times n}$ using the following matrix factorization:

$$A \approx \hat{A} = Z \Lambda^{-1} Z^T$$

where $\Lambda = \operatorname{diag}(Z^T \mathbb{1})$ and $Z$ is sparse. I would like to use stochastic gradient descent (SGD) to solve

$$\underset{\hat{A}}{\arg\min} \, \| A - \hat{A} \|$$

So, basically, I would compute some entries of $A$ (because $A$ is a huge matrix) and use them to fit the factorization. But I am not sure how to set this up.


#### Best Answer

The basic algorithm goes like this:

1. Start with randomly initialized values of $Z$ and $\Lambda^{-1}$.
2. Go through each observed entry $A_{ij}$.
3. Compute the gradient of $\| A_{ij} - \hat{A}_{ij} \|$.
4. Update the parameters $Z$ (which also determines $Z^T$) and $\Lambda^{-1}$ in the direction opposite the gradient, using a step size $\eta$: $Z_i = Z_i - \eta \, \nabla \| A_{ij} - \hat{A}_{ij} \|$. The step size is a hyper-parameter and needs to be tuned using cross-validation.

In step 3 you may want to add a regularization term to prevent overfitting. Take a look at the pseudocode on Wikipedia.
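A minimal NumPy sketch of these steps, under some assumptions not in the question: the rank `k`, step size, number of iterations, and the random symmetric test matrix are all illustrative, and $\Lambda$ is held fixed within each update (in the exact objective $\Lambda = \operatorname{diag}(Z^T \mathbb{1})$ depends on $Z$, so this is a simplification):

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 20, 5                      # matrix size and rank; illustrative choices
A = rng.normal(size=(n, n))
A = (A + A.T) / 2                 # symmetric target, matching the Z Lambda^{-1} Z^T form

# positive initialization so Lambda = diag(Z^T 1) is invertible
Z = np.abs(rng.normal(scale=0.1, size=(n, k)))
eta = 0.005                       # step size; would be tuned by cross-validation

def lambda_inv(Z):
    """Lambda^{-1}, with Lambda = diag(Z^T 1)."""
    return np.diag(1.0 / Z.sum(axis=0))

for step in range(1000):
    i, j = rng.integers(n, size=2)      # sample one entry A_ij
    Li = lambda_inv(Z)                  # held fixed within the step (simplification)
    err = A[i, j] - Z[i] @ Li @ Z[j]    # residual on the sampled entry
    # gradients of (A_ij - Z_i Lambda^{-1} Z_j^T)^2 w.r.t. rows Z_i and Z_j
    grad_i = -2.0 * err * (Li @ Z[j])
    grad_j = -2.0 * err * (Li @ Z[i])
    Z[i] -= eta * grad_i
    Z[j] -= eta * grad_j
    Z = np.clip(Z, 1e-3, None)          # heuristic: keep Lambda invertible

A_hat = Z @ lambda_inv(Z) @ Z.T
```

In practice you would iterate over the entries of $A$ you can afford to compute rather than sampling uniformly, and the clipping step is just one crude way to keep the column sums of $Z$ away from zero.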
