I need to generate random non-square matrices with $R$ rows and $C$ columns, elements randomly distributed with zero mean, and constrained such that the length ($L_2$ norm) of each row is $1$ and the length of each column is $sqrt{frac{R}{C}}$. Equivalently, the sum of square values is $1$ for each row and $frac{R}{C}$ for each column.
So far I have found one way to achieve this: simply initialize the matrix elements randomly (e.g. from a uniform, normal, or laplace distribution with zero mean and arbitrary variance), then alternately normalize rows and columns to length $1$, ending with row normalization. This seems to converge to the desired result fairly quickly (e.g. for $R=40$ and $C=80$, variance of column length is typically ~ $~0.00001$ after $2$ iterations), but I'm not sure if I can depend on this fast convergence rate in general (for various matrix dimensions and initial element distributions).
My question is this: is there a way to achieve the desired result (row lengths $1$, column lengths $sqrt{frac{R}{C}}$) directly without iterating between row/column normalization? E.g. something like the algorithm for normalizing a random vector (initialize elements randomly, measure sum of square values, then scale each element by a common scalar). If not, is there a simple characterization for the convergence rate (e.g. num iterations until error $< epsilon$) of the iterative method described above?
Best Answer
As @cardinal said in a comment:
Actually, after a little thought, I think you algorithm is exactly the Sinkhorn-Knopp algorithm with a very minor modification. Let $X$ be your original matrix and let $Y$ be a matrix of the same size such that $Y_{ij}=X^2_{ij}$. Then, your algorithm is equivalent to applying Sinkhorn-Knopp to $Y$, where at the final step you recover your desired form by taking $hat{X}_{ij}=sgn(X_{ij})sqrt{Y_{ij}}$. Sinkhorn-Knopp is guaranteed to converge except in quite pathological circumstances. Reading up on it should be very helpful.
…it seems that the iterative algorithm I suggested in the original question is very similar to the Sinkhorn-Knopp algorithm. Interestingly, it also seems very similar to iterative proportional fitting (IPF), which, as described on the IPF wikipedia page, is related to Newton's method and expectation maximization (all have the same limit).
These iterative methods are often applied to problems which lack a closed form solution, so I will tentatively assume that the answer to the question is negative: there is no way to achieve the desired solution without row/column iteration.
Similar Posts:
- Solved – Generating random numbers based on ‘rule-of-thumb’ proportions
- Solved – How to use equalities as constrains with constrOptim in R
- Solved – How to express a correlation matrix in terms of a covariance matrix
- Solved – Relation between R2 and the covariate correlation matrix (multidimensional case)
- Solved – why the diagonal elements of the inverted correlation matrix is related to correlation with all other variables