# Testing significance of cross-correlated series

I want to show that, overall, signal B is correlated with signal A. I was thinking of using cross-correlation (in R) to measure this.

Essentially I have two kinds of signals: signal A is a time series of single values describing a particular song; signal B is a time series of single values recorded for a user. There are many songs and many users per song, but the number of users differs from song to song.

For example:

```
Signal A (song data), for song 1
0.994 0.986 0.955 0.890 0.795 0.650 ...

Signal A (song data), for song 2
0.763 0.788 0.787 0.908 0.854 0.901 ...

Signal B (user data), for user 1 listening to song 1
75 74.4 73.7 73 72.3 72 ...

Signal B (user data), for user 1 listening to song 2
71 72.3 74.9 73 72.5 72.9

Signal B (user data), for user 2 listening to song 2
60.6 60.2 61 60.7 61 59.3 ...

Etc.
```

The series are obviously truncated for this illustration. Again, there are many songs, and not every user listened to every song.
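One way to hold data like this in R is a "long" table with one row per (song, user, time step), so each user series can be joined to the song series it was recorded against. The sketch below uses only the first six values shown above; the column names `song`, `user`, `t`, `a` and `b` are hypothetical, chosen for illustration.

```r
# Hypothetical long-format layout built from the first six values shown above
song_data <- data.frame(
  song = rep(c(1, 2), each = 6),
  t    = rep(1:6, times = 2),
  a    = c(0.994, 0.986, 0.955, 0.890, 0.795, 0.650,   # signal A, song 1
           0.763, 0.788, 0.787, 0.908, 0.854, 0.901)   # signal A, song 2
)
user_data <- data.frame(
  song = rep(c(1, 2, 2), each = 6),
  user = rep(c(1, 1, 2), each = 6),
  t    = rep(1:6, times = 3),
  b    = c(75, 74.4, 73.7, 73, 72.3, 72,               # user 1, song 1
           71, 72.3, 74.9, 73, 72.5, 72.9,             # user 1, song 2
           60.6, 60.2, 61, 60.7, 61, 59.3)             # user 2, song 2
)

# Attach the song signal (a) to every user observation (b) at the same time step
pairs <- merge(user_data, song_data, by = c("song", "t"))
pairs <- pairs[order(pairs$song, pairs$user, pairs$t), ]

# Number of distinct users per song (unequal, as described in the question)
users_per_song <- table(unique(pairs[, c("song", "user")])$song)
print(users_per_song)
```

With this layout, each (song, user) pair can be pulled out with `split(pairs, list(pairs$song, pairs$user), drop = TRUE)` and analysed separately.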

I am interested in whether I can draw conclusions about how well the song data (signal A) as a whole can predict the user responses (signal B).

Ideally, I would like to capture the cross-correlation in one number (one test statistic for each song), so that I can easily quantify whether there is an overall correlation between the two signals.
However, `ccf` in R gives me a value for each lag. For example:

```
> print(ccf(x,y))

Autocorrelations of series ‘X’, by lag

    -6     -5     -4     -3     -2     -1      0      1      2      3      4
-0.242 -0.090  0.057  0.197  0.466  0.699  0.896  0.436  0.221 -0.018 -0.116
```

(Are these values the cross-correlation coefficients?)
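For reference, R's help page for `ccf` states that the lag k value returned by `ccf(x, y)` estimates the correlation between `x[t+k]` and `y[t]`, and the numbers can be pulled out of the returned object. The sketch below uses simulated, hypothetical series in which `y` follows `x` with a delay of two steps. It extracts the coefficient at each lag, computes the approximate 95% band that `plot()` draws for an `acf` object (±`qnorm(0.975)/sqrt(n.used)`), then shifts the series by the strongest lag and runs an ordinary Pearson test with `cor.test`. The helper `align_at_lag` is hypothetical. Treat the resulting p-value as optimistic: it ignores autocorrelation in the series and the fact that the lag was picked by looking at the data.

```r
set.seed(1)
n <- 200
x <- rnorm(n)                                         # hypothetical stand-in for signal A
y <- c(rep(0, 2), head(x, -2)) + rnorm(n, sd = 0.5)   # y[t] ~ x[t - 2] + noise

r <- ccf(x, y, lag.max = 10, plot = FALSE)
lags  <- drop(r$lag)                        # lags k = -10, ..., 10
coefs <- drop(r$acf)                        # estimated cor(x[t + k], y[t]) at each lag
bound <- qnorm(0.975) / sqrt(r$n.used)      # approx. 95% band drawn by plot() for white noise

best_lag <- lags[which.max(abs(coefs))]

# Hypothetical helper: keep only the overlapping part after shifting by k
align_at_lag <- function(x, y, k) {
  n <- length(x)
  if (k >= 0) {
    list(x = x[(1 + k):n], y = y[1:(n - k)])
  } else {
    list(x = x[1:(n + k)], y = y[(1 - k):n])
  }
}
al <- align_at_lag(x, y, best_lag)
ct <- cor.test(al$x, al$y)                  # ordinary Pearson test on the aligned pair
print(c(lag = best_lag, r = unname(ct$estimate), p = ct$p.value))
```

The `ccf` value at a lag and the Pearson coefficient on the aligned overlap are close but not identical, because `ccf` centres each series on its full-length mean and divides by `n` rather than by the number of overlapping points.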
Also, my data are not stationary. Is there any way (another function?) to test whether signals A and B are correlated across users and songs?
One approach would be to average signal B (take the mean user response) for each song, but because the number of users differs between songs, working with means might be problematic.
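On the non-stationarity point: two unrelated series that both wander (for example random walks) can produce large cross-correlations, and the ±`qnorm(0.975)/sqrt(n)` band only applies when at least one series is close to white noise. A common remedy, swapped in here as an illustration rather than taken from the question, is to first-difference both series and then prewhiten them (Box–Jenkins style): fit an AR model to the differenced input and filter both series with the same AR coefficients before calling `ccf`. The sketch below assumes simulated, independent random walks; `order.max = 5` and the helper `whiten` are hypothetical choices.

```r
set.seed(2)
n <- 300
# Two independent random walks: non-stationary and unrelated by construction
x <- cumsum(rnorm(n))
y <- cumsum(rnorm(n))

raw <- ccf(x, y, lag.max = 10, plot = FALSE)   # can show large, misleading values

# Step 1: first-difference both series to remove the stochastic trend
dx <- diff(x)
dy <- diff(y)

# Step 2: prewhiten by fitting an AR model to dx (order chosen by AIC) ...
fit <- ar(dx, order.max = 5, aic = TRUE)
phi <- fit$ar

# ... and applying the same AR filter, e[t] = z[t] - sum_j phi[j] * z[t - j],
# to both series (hypothetical helper; demeans each series first)
whiten <- function(z) {
  z <- z - mean(z)
  if (length(phi) == 0) return(z)
  e <- stats::filter(z, c(1, -phi), method = "convolution", sides = 1)
  as.numeric(na.omit(e))                       # drop the leading NAs
}
wx <- whiten(dx)
wy <- whiten(dy)

pw <- ccf(wx, wy, lag.max = 10, plot = FALSE)
bound <- qnorm(0.975) / sqrt(pw$n.used)
print(round(cbind(lag = drop(pw$lag), ccf = drop(pw$acf)), 3))
```

After prewhitening, the band `bound` is a more reasonable yardstick for individual lags, although with 21 lags a few crossings are expected by chance.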

So, my main questions again are:

1. If I perform a cross-correlation for one user-data/song-data pair, how do I test for significance? Does R give me a correlation coefficient at each lag, or does it only tell me which lag is significant (without providing any test statistic)? If the latter, do I need to shift one series by the lag before running an ordinary Pearson correlation?

2. What test may I use when the data are not stationary?

3. The number of users differs between songs. For this reason, I can't simply average all users' data for each song (and correlate the mean user data with the song data). Is that correct? Is there a way to test the correlation between signals A and B for each song (across the existing users), or must I calculate the correlation for each user/song pair individually?
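Regarding the unequal number of users: one option that avoids averaging raw responses is to compute a correlation for every user/song pair and then pool those correlations within each song on Fisher's z scale (`atanh`), weighting each by `n - 3`. That yields one coefficient and one p-value per song, and songs with more users get a smaller standard error rather than being distorted. The sketch below uses simulated, hypothetical data (user counts 2, 4 and 3; series of length 100) and correlates first differences at lag 0. Its p-values assume the users are independent and the differenced series have no remaining serial correlation, so they are a rough guide only; a mixed-effects model is a common alternative.

```r
set.seed(3)
T_len   <- 100
n_users <- c("1" = 2, "2" = 4, "3" = 3)   # hypothetical, unequal users per song

rows <- list()
for (s in names(n_users)) {
  a <- cumsum(rnorm(T_len))                               # hypothetical signal A
  for (u in seq_len(n_users[[s]])) {
    b <- 60 + 0.8 * a + cumsum(rnorm(T_len, sd = 0.3))    # hypothetical signal B
    r <- cor(diff(a), diff(b))    # correlate differences to reduce trend effects
    rows[[length(rows) + 1]] <- data.frame(song = s, user = u, r = r, n = T_len - 1)
  }
}
per_pair <- do.call(rbind, rows)

# Pool within each song: Fisher z, weighted by n - 3 (inverse variance of z
# under independence); the song-level z is then converted back to r
per_pair$z <- atanh(per_pair$r)
per_pair$w <- per_pair$n - 3
per_song <- do.call(rbind, lapply(split(per_pair, per_pair$song), function(d) {
  z_bar <- sum(d$w * d$z) / sum(d$w)
  se    <- 1 / sqrt(sum(d$w))
  data.frame(song = d$song[1], users = nrow(d), r = tanh(z_bar),
             p = 2 * pnorm(-abs(z_bar / se)))
}))
print(per_song)
```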

I hope my intent is clear. Thanks for any insight.
