What are “second-order dependencies” and “higher order dependencies” in the data?

I am reading A Tutorial on Principal Component Analysis by Shlens, 2014, and it mentions these two notions: "second-order dependencies" and "higher order dependencies". I could not find any clear explanation of them. What do they mean?

> The goal of the analysis is to decorrelate the data, or said in other terms, the goal is to remove second-order dependencies in the data. In the data sets of Figure 6, higher order dependencies exist between the variables. Therefore, removing second-order dependencies is insufficient at revealing all structure in the data.

PCA is based on variances and covariances, $\mathrm E[x_i x_j]$ (assuming mean-free variables). These are measures of second-order dependencies because the data enter them only through products of two variables, i.e. terms of order 2. After PCA, the principal components have zero covariance with each other, so second-order dependencies have been removed. However, higher-order dependencies may still exist, e.g. $\mathrm E[x_i x_j x_k] \neq 0$ for some $i$, $j$, and $k$ that are not all equal. (For mean-free variables, such a moment would be zero if the variables were independent, so a nonzero value signals a dependency.) By removing second-order dependencies through a linear transform, PCA in a way "reveals" the second-order dependencies in the form of that transform, but it does not "reveal" higher-order dependencies.
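To make this concrete, here is a minimal NumPy sketch. It assumes a made-up two-variable data set that is not from Shlens' tutorial: a uniform variable $t$ and its centered square $t^2 - 1/3$, rotated by 30 degrees so that the observed variables are correlated. The helper names `pca_scores`, `second_order`, and `third_order` are hypothetical, chosen for illustration. The sketch checks the covariance $\mathrm E[x_1 x_2]$ before PCA, the covariance between the principal components after PCA, and a third-order cross moment after PCA.

```python
import numpy as np

def pca_scores(X):
    """Center X and project it onto the eigenvectors of its covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    _, eigvecs = np.linalg.eigh(cov)  # columns are the principal axes
    return Xc @ eigvecs

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical toy data: the second source is a deterministic (quadratic)
# function of the first, so the two sources are strongly dependent.
t = rng.uniform(-1.0, 1.0, size=n)
S = np.column_stack([t, t**2 - 1.0 / 3.0])

# Rotate by 30 degrees so the observed variables are also correlated.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = S @ R.T

def second_order(a, b):
    """Cross moment E[a b], i.e. the covariance, since inputs are mean-free."""
    return np.mean(a * b)

def third_order(a, b):
    """Larger of |E[a^2 b]| and |E[a b^2]|.

    Absolute values and both orderings are used because PCA's eigenvector
    signs and the order of the components are arbitrary.
    """
    return max(abs(np.mean(a * a * b)), abs(np.mean(a * b * b)))

Xc = X - X.mean(axis=0)
Y = pca_scores(X)

print("before PCA: E[x1 x2]          =", second_order(Xc[:, 0], Xc[:, 1]))
print("after PCA:  E[y1 y2]          =", second_order(Y[:, 0], Y[:, 1]))
print("after PCA:  3rd-order moment  =", third_order(Y[:, 0], Y[:, 1]))
```

With this toy data the covariance before PCA should come out at roughly 0.1, the covariance between the components after PCA should be zero up to floating-point rounding, and the third-order cross moment should remain clearly nonzero (roughly 0.09, the value of $\mathrm E[t^2(t^2 - 1/3)] = 1/5 - 1/9$). So PCA removed the second-order dependency, but the nonlinear, higher-order dependency between $t$ and $t^2$ survives it.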
