Solved – Calculating partitioned covariance matrices using R

Let's say my data consists of 25 observations of four features, which are in groups of 2. So we'll call my variables $x_1, x_2, y_1, y_2$. We have a partitioned sample mean vector given by $begin{bmatrix}bar{x}\ bar{y}end{bmatrix}$, where $bar{x} = begin{bmatrix}bar{x_1}\ bar{x_2}end{bmatrix}$, $bar{y} = begin{bmatrix}bar{y_1}\ bar{y_2}end{bmatrix}$. We can partition the covariance matrix $S$ by
begin{equation*}
S = begin{bmatrix}S_{xx}&S_{yx}\S_{xy}&S_{yy}end{bmatrix}
end{equation*}

where $S_{xx}$ is the covariance matrix of just the $x$ variables and $S_{yy}$ is the covariance matrix of just the $y$ variables. I'm trying to write an R script that will calculate $S_{yx}$ using just the input data. Clearly $S_{yx}$ consists of
begin{equation*}
S_{yx} = begin{bmatrix}Cov(x_1, y_1)&Cov(x_1, y_2)\Cov(x_2, y_1)&Cov(x_2, y_2)end{bmatrix}
end{equation*}

So if $X$ is the $25 times 2$ matrix of $x$ data and $Y$ is the $25 times 2$ matrix of $Y$ data, then my idea for calculating is is like so:
begin{equation*}
S_{yx} = frac{1}{n-1} big(X^TY – nbar{x}bar{y}^Tbig)
end{equation*}

However, when I type into R:

(1/(n-1)) * ((t(X) %*% Y) - (n * xbar %*% t(ybar))) 

I get results which are wildly different from the corresponding results in the full covariance matrix. Is there a problem with my reasoning or my R code?

EDIT: ftp://ftp.wiley.com/public/sci_tech_med/multivariate_analysis_3e/ My input is the "T3_8_SONS.DAT" file in this directory. I don't know if there is a better way to include my input, sorry.

You can make your life a lot easier by using R's covariance function cov. So, all you need to do is read in the data to a dataframe, change the order of the columns, and then compute the covariance matrix. The partitions can then be plucked from the main covariance matrix. Here is some code to get $S$, $Sxx$, $Syx$.

> #Read in Data > mydf<-read.table("D:\T3_8_SONS.DAT") > names(mydf)<-c("y1", "y2", "x1", "x2") > #Change column order so S is created in the same order as in OP's question > mydf2<-data.frame(mydf$x1, mydf$x2, mydf$y1, mydf$y2) > names(mydf2)<-c("x1", "x2", "y1", "y2") > #Print to compare to Table 3.8 in book > head(mydf2)    x1  x2  y1  y2 1 179 145 191 155 2 201 152 195 149 3 185 149 181 148 4 188 149 183 153 5 171 142 176 144 6 192 152 208 157 > #Obtain full variance covariance matrix > S <- cov(mydf2) > #Obtain covariance partitioned matrices > Sxx <- S[1:2,1:2] > Syy <- S[3:4,3:4] > Syx <- Sxy <- S[1:2, 3:4] > S           x1       x2       y1       y2 x1 100.80667 56.54000 69.66167 51.31167 x2  56.54000 45.02333 46.11167 35.05333 y1  69.66167 46.11167 95.29333 52.86833 y2  51.31167 35.05333 52.86833 54.36000 > Sxx          x1       x2 x1 100.8067 56.54000 x2  56.5400 45.02333 > Syy          y1       y2 y1 95.29333 52.86833 y2 52.86833 54.36000 > Syx          y1       y2 x1 69.66167 51.31167 x2 46.11167 35.05333 

We could use your method, which is mathematically correct, to obtain $Syx$ too, but it's more work. Here are the alternate calculations:

> #Your method > n<-25 > xbar<-apply(mydf2, 2, mean)[1:2] > ybar<-apply(mydf2, 2, mean)[3:4] > Syx2<-(t(mydf2[1:2])%*%as.matrix(mydf2[3:4])-n*(xbar%*%t(ybar)))/(n-1) > Syx2          y1       y2 x1 69.66167 51.31167 x2 46.11167 35.05333 >  

Note that the $Syx$ here matches the computations I provided previously for $Syx$.

Similar Posts:

Rate this post

Leave a Comment