Solved – Correlation coefficient for data table

I would like to display correlation coefficients in a table (ideally with p-values).
However, my code produces exactly the same value for each period, so something is obviously wrong. Could you give me any advice?

    # First of all, I read my data table from a CSV file:
    imported <- read.table(file="/home/someone/data_for_R.csv", header=TRUE,
                           sep='\t', quote='"\'', dec=',', fill=FALSE,
                           comment.char="#", na.strings = "NA", nrows = -1,
                           skip = 0, check.names = TRUE, strip.white = FALSE,
                           blank.lines.skip = TRUE)

    # Typing: class(imported[["Period"]]) produces:
    # [1] "factor"

    # Typing: levels(imported[["Period"]]) produces:
    # [1] "Summer 2010" "Summer 2011" "Winter 2010" "Winter 2011" "Winter 2012"

    xx <- imported[c("Period","Data1.MEAN","Data2.MEAN")]
    result <- by(xx, xx$Period, function(x) {cor(xx$Data1.MEAN, xx$Data2.MEAN)})
    result.dataframe <- as.data.frame(as.matrix(result))
    result.dataframe$C <- rownames(result)

EDIT:

Code that reads the file from GitHub:

    library(RCurl)
    x <- getURL("https://raw.githubusercontent.com/kedziorm/testowe/master/data_for_R.csv")
    imported <- read.csv(text=x, header=TRUE, sep='\t', quote='"\'', dec=',',
                         fill=FALSE, comment.char="#", na.strings = "NA",
                         nrows = -1, skip = 0, check.names = TRUE,
                         strip.white = FALSE, blank.lines.skip = TRUE)
    xx <- imported[c("Period","Data1.MEAN","Data2.MEAN")]
    result <- by(xx, xx$Period, function(x) {cor(xx$Data1.MEAN, xx$Data2.MEAN)})
    result.dataframe <- as.data.frame(as.matrix(result))
    result.dataframe$C <- rownames(result)

EDIT:
This should finally work:

    # Tab-separated data with decimal commas, one row per line of the string
    x <- paste0(
      "Period\tDate\tData1.MEAN\tData1.MEDIAN\tData2.MEAN\tData2.MEDIAN\tData3.MEAN\tData3.MEDIAN\n",
      "Winter 2010\t26-03-2010\t0,3580917\t0,307479\t0,551191\t0,612853\t0,3476462\t0,3996462\n",
      "Winter 2010\t26-04-2010\t0,3016958\t0,2643808\t0,417791\t0,393714\t0,2811050286\t0,3061050286\n",
      "Summer 2010\t03-07-2010\t0,1916181\t0,1816603\t0,390925\t0,37385\t0,2183438286\t0,2923438286\n",
      "Summer 2010\t04-07-2010\t0,2548711\t0,1738567\t0,4349834\t0,4957131\t0,2467746286\t0,3437746286\n",
      "Winter 2011\t01-11-2010\t0,3393042\t0,2870481\t0,497295\t0,538132\t0,3210420857\t0,3690420857\n",
      "Summer 2011\t04-06-2011\t0,222748\t0,2218226\t0,363823\t0,275725\t0,2309696\t0,2809696\n",
      "Summer 2011\t05-06-2011\t0,241889\t0,1918457\t0,373566\t0,292997\t0,2306573429\t0,2966573429\n",
      "Winter 2012\t07-11-2011\t0,2264874\t0,2601413\t0,373048\t0,274139\t0,2456219143\t0,2756219143\n",
      "Winter 2012\t08-11-2011\t0,2414665\t0,2662565\t0,314382\t0,279857\t0,2348871429\t0,2598871429\n",
      "Winter 2012\t09-11-2011\t0,2817838\t0,2325952\t0,376063\t0,468148\t0,254412\t0,287412\n",
      "Winter 2012\t10-11-2011\t0,2476841\t0,2667485\t0,406902\t0,476582\t0,2632384571\t0,3632384571\n"
    )
    imported <- read.csv(text=x, header=TRUE, sep='\t', quote='"\'', dec=',',
                         fill=FALSE, comment.char="#", na.strings = "NA",
                         nrows = -1, skip = 0, check.names = TRUE,
                         strip.white = FALSE, blank.lines.skip = TRUE)
    xx <- imported[c("Period","Data1.MEAN","Data2.MEAN")]
    result <- by(xx, xx$Period, function(x) {cor(xx$Data1.MEAN, xx$Data2.MEAN)})
    result.dataframe <- as.data.frame(as.matrix(result))
    result.dataframe$C <- rownames(result)

Your problem is here:

  function(x) {cor(xx$Data1.MEAN, xx$Data2.MEAN)}) 

The argument that by passes to your function is x, not xx. The function can see xx only because it exists in the enclosing environment. So you receive the subsetted data but then ignore it in favor of the unsubsetted data frame, and the correlation is computed on the whole data set for every period.

Replace xx with x there and it looks right to me, though the last line might be superfluous depending on what you're trying to achieve.
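
As a rough sketch of that fix, extended to the p-values asked for in the question: the example below uses made-up toy values rather than the real file, and cor_with_p is a hypothetical helper name. It assumes Pearson correlation via cor.test from base R, and returns NA for any period with fewer than three complete pairs, since cor.test cannot be run on so few observations (in the posted data, Winter 2011 has only one row).

```r
# Toy data (hypothetical values, not the asker's file); each period has 4 rows.
xx <- data.frame(
  Period     = rep(c("Summer 2010", "Winter 2010"), each = 4),
  Data1.MEAN = c(0.19, 0.25, 0.22, 0.24, 0.36, 0.30, 0.34, 0.23),
  Data2.MEAN = c(0.39, 0.43, 0.36, 0.37, 0.55, 0.42, 0.50, 0.37)
)

# Correlation and p-value for one subset; NA when the test cannot be run.
cor_with_p <- function(x) {
  ok <- complete.cases(x$Data1.MEAN, x$Data2.MEAN)
  if (sum(ok) < 3) {
    return(c(r = NA_real_, p.value = NA_real_))
  }
  test <- cor.test(x$Data1.MEAN[ok], x$Data2.MEAN[ok])
  c(r = unname(test$estimate), p.value = test$p.value)
}

# The function uses its argument x (the per-period subset), not the global xx.
result <- by(xx, xx$Period, cor_with_p)

# Bind the per-period vectors into one row per period.
result.dataframe <- data.frame(
  Period = names(result),
  do.call(rbind, result),
  row.names = NULL
)
print(result.dataframe)
```

With only a handful of observations per period, the p-values will be of limited use, so treat them as indicative at best.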
