# Solved – Calculating partitioned covariance matrices using R

Let's say my data consists of 25 observations of four features, which are in groups of 2. So we'll call my variables $$x_1, x_2, y_1, y_2$$. We have a partitioned sample mean vector given by $$begin{bmatrix}bar{x}\ bar{y}end{bmatrix}$$, where $$bar{x} = begin{bmatrix}bar{x_1}\ bar{x_2}end{bmatrix}$$, $$bar{y} = begin{bmatrix}bar{y_1}\ bar{y_2}end{bmatrix}$$. We can partition the covariance matrix $$S$$ by
$$begin{equation*} S = begin{bmatrix}S_{xx}&S_{yx}\S_{xy}&S_{yy}end{bmatrix} end{equation*}$$
where $$S_{xx}$$ is the covariance matrix of just the $$x$$ variables and $$S_{yy}$$ is the covariance matrix of just the $$y$$ variables. I'm trying to write an R script that will calculate $$S_{yx}$$ using just the input data. Clearly $$S_{yx}$$ consists of
$$begin{equation*} S_{yx} = begin{bmatrix}Cov(x_1, y_1)&Cov(x_1, y_2)\Cov(x_2, y_1)&Cov(x_2, y_2)end{bmatrix} end{equation*}$$
So if $$X$$ is the $$25 times 2$$ matrix of $$x$$ data and $$Y$$ is the $$25 times 2$$ matrix of $$Y$$ data, then my idea for calculating is is like so:
$$begin{equation*} S_{yx} = frac{1}{n-1} big(X^TY – nbar{x}bar{y}^Tbig) end{equation*}$$
However, when I type into R:

``(1/(n-1)) * ((t(X) %*% Y) - (n * xbar %*% t(ybar))) ``

I get results which are wildly different from the corresponding results in the full covariance matrix. Is there a problem with my reasoning or my R code?

EDIT: ftp://ftp.wiley.com/public/sci_tech_med/multivariate_analysis_3e/ My input is the "T3_8_SONS.DAT" file in this directory. I don't know if there is a better way to include my input, sorry.

Contents

You can make your life a lot easier by using R's covariance function `cov`. So, all you need to do is read in the data to a dataframe, change the order of the columns, and then compute the covariance matrix. The partitions can then be plucked from the main covariance matrix. Here is some code to get $$S$$, $$Sxx$$, $$Syx$$.

``> #Read in Data > mydf<-read.table("D:\T3_8_SONS.DAT") > names(mydf)<-c("y1", "y2", "x1", "x2") > #Change column order so S is created in the same order as in OP's question > mydf2<-data.frame(mydf$$x1, mydf$$x2, mydf$$y1, mydf$$y2) > names(mydf2)<-c("x1", "x2", "y1", "y2") > #Print to compare to Table 3.8 in book > head(mydf2)    x1  x2  y1  y2 1 179 145 191 155 2 201 152 195 149 3 185 149 181 148 4 188 149 183 153 5 171 142 176 144 6 192 152 208 157 > #Obtain full variance covariance matrix > S <- cov(mydf2) > #Obtain covariance partitioned matrices > Sxx <- S[1:2,1:2] > Syy <- S[3:4,3:4] > Syx <- Sxy <- S[1:2, 3:4] > S           x1       x2       y1       y2 x1 100.80667 56.54000 69.66167 51.31167 x2  56.54000 45.02333 46.11167 35.05333 y1  69.66167 46.11167 95.29333 52.86833 y2  51.31167 35.05333 52.86833 54.36000 > Sxx          x1       x2 x1 100.8067 56.54000 x2  56.5400 45.02333 > Syy          y1       y2 y1 95.29333 52.86833 y2 52.86833 54.36000 > Syx          y1       y2 x1 69.66167 51.31167 x2 46.11167 35.05333 ``

We could use your method, which is mathematically correct, to obtain $$Syx$$ too, but it's more work. Here are the alternate calculations:

``> #Your method > n<-25 > xbar<-apply(mydf2, 2, mean)[1:2] > ybar<-apply(mydf2, 2, mean)[3:4] > Syx2<-(t(mydf2[1:2])%*%as.matrix(mydf2[3:4])-n*(xbar%*%t(ybar)))/(n-1) > Syx2          y1       y2 x1 69.66167 51.31167 x2 46.11167 35.05333 >  ``

Note that the $$Syx$$ here matches the computations I provided previously for $$Syx$$.

Rate this post