I am trying to approximate a square matrix $A \in \mathbb{R}^{n \times n}$ using the following matrix factorization

$$A \approx \hat{A} = Z \Lambda^{-1} Z^T$$

where $\Lambda = \text{diag}(Z^T \mathbb{1})$ and $Z$ is sparse. I would like to use **stochastic gradient descent** (SGD) to solve

$$\arg \underset{\hat{A}}{\min} \| A - \hat{A} \|$$

So, basically, I would compute some entries of $A$ (because $A$ is a huge matrix) and use these to fit the factorization. But I am not sure how to set this up.
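To make the setup concrete, here is a minimal NumPy sketch of the factorization described above (all sizes are illustrative, and $Z$ is kept dense for simplicity even though in practice it would be sparse):

```python
import numpy as np

# Build A_hat = Z @ Lambda^{-1} @ Z.T with Lambda = diag(Z^T 1).
rng = np.random.default_rng(0)
n, k = 6, 3
Z = rng.random((n, k))              # dense here; in practice Z is sparse

col_sums = Z.T @ np.ones(n)         # Lambda = diag(Z^T 1)
A_hat = Z @ np.diag(1.0 / col_sums) @ Z.T

# A single entry never requires forming A_hat explicitly:
# A_hat[i, j] = sum_k Z[i, k] * Z[j, k] / col_sums[k]
i, j = 1, 4
entry = np.sum(Z[i] * Z[j] / col_sums)
```

The last two lines matter for the SGD setting: since $\hat{A}_{ij}$ only involves rows $i$ and $j$ of $Z$, each sampled entry of $A$ can be compared against the model without ever materializing $\hat{A}$.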


#### Best Answer

The basic algorithm goes like this:

- Start with randomly initialized values of $Z$ and $\Lambda^{-1}$
- Loop over the sampled entries $A_{ij}$
- Compute the gradient of the entrywise error $(A_{ij} - \hat{A}_{ij})^2$, where $\hat{A}_{ij} = \sum_k Z_{ik} \Lambda_{kk}^{-1} Z_{jk}$
- Update the parameters $Z$ and $\Lambda^{-1}$ in the direction of the negative gradient using a step size $\eta$, which is a hyper-parameter and needs to be tuned using cross-validation: $Z_i \leftarrow Z_i - \eta \, \nabla_{Z_i} (A_{ij} - \hat{A}_{ij})^2$. Note that $Z^T$ is not a separate parameter, and strictly speaking $\Lambda$ is determined by $Z$ via $\Lambda = \text{diag}(Z^T \mathbb{1})$; a common simplification is to treat the diagonal of $\Lambda^{-1}$ as free during training.

In step 3 you may want to add a regularization term (e.g. an L2 penalty on $Z$) to prevent overfitting. Take a look at the pseudo-code on Wikipedia.
