Solved – PCA, ICA and Laplacian eigenmaps


I am very interested in the Laplacian Eigenmaps method. Currently, I am using it to do dimension reduction on my medical data sets.

However, I have run into a problem using the method.

For example, I have some data (spectra signals), and I can use PCA (or ICA) to get some PCs (or ICs). The problem is how to get similar dimension reduced components of the original data using LE?

According to the Laplacian eigenmaps method, we need to solve the generalised eigenvalue problem, which is

$L y = lambda D y$

Here $y$ is the eigenvector. If I plot the e.g. top 3 eigenvectors (the solution according to 3 eigenvalues), the results are not interpretable.

However, when I plot the top 3 PCs and top 3 ICs, the results always seem to clearly (visually) represent the original data $x$.

I assume the reason is because the matrix $L$ is defined by the weight matrix (Adjacency matrix $W$), and the data $x$ has been fitted with the heat kernel to create $W$, which is using an exponential function. My question is how to retrieve the reduced components of $x$ (not the eigenvector $y$ of matrix $L$)?


My dataset is restricted and not easy to demonstrate the problem. Here I created a toy problem to show what I meant and what I want to ask.

Please see the picture,

Firstly, I create some sine waves A, B, C showing in red curves (first column of the figure). A, B, and C have 1000 samples, in other words, saved in 1×1000 vectors.

Secondly, I mixed the sources A, B, C using randomly created linear combinations, e.g., $M = r_1*A + r_2*B + r_3*C$, in which $r_1, r_2, r_3$ are random values. The mixed signal $M$ is in very high dimensional space, e.g., $M in R^{1517times1000}$, 1517 is randomly chosen high dimensional space. I show only first three rows of signal M in green curves (second column of the figure).

Next, I run PCA, ICA and Laplacian eigenmaps to get the dimension reduction results. I chose to use 3 PCs, 3 ICs, and 3 LEs to do a fair comparison (blue curves showed as 3rd, 4th, and last column of the figure respectively).

From the results of PCA and ICA (3rd, 4th column of the figure), we can see that we can interpret the results as some dimension reduction, i.e., for ICA results, we can recover the mixed signal by $M = b_1*IC1 + b_2*IC2 + b_3*IC3$ (I am not sure if we can also get $M = a_1*PC1 + a_2*PC2 + a_3*PC3$ with PCA results but the result seems quite right for me).

However, please look at the results of LE, I can barely interpret the results (last column of the figure). It seems something 'wrong' with the reduced components. Also, I want to mention that eventually the plot of the last column is the eigenvector $y$ in formula $L y = lambda D y$

Have you people got more ideas?

Figure 1 using 12 nearest neighbours and sigma in the heating kernel is 0.5:
Columns from left to right: original signal, mixed signal, PCs, ICs, LEs

Figure 2 using 1000 nearest neighbours and sigma in the heating kernel is 0.5:
Columns from left to right: original signal, mixed signal, PCs, ICs, LEs

Sourcecode: Matlab code with required package

Best Answer

The answer to your question is given by the mapping at the bottom of Page 6 of the original Laplacian Eigenmaps paper :

$x_i rightarrow (f_1(i), dots, f_m(i))$

So for instance, the embedding of a point $x_5$ in, say, the top 2 "components" is given by $(f_1(5), f_2(5))$ where $f_1$ and $f_2$ are the eigenvectors corresponding to the two smallest non-zero eigenvalues from the generalized eigenvalue problem $L f = lambda D f$.

Note that unlike in PCA, it is not straightforward to obtain an out-of-sample embedding. In other words, you can obtain the embedding of a point that was already considered when computing $L$, but not (easily) if it is a new point. If you're interested in doing the latter, look up this paper.

Similar Posts:

Rate this post

Leave a Comment