I have some satellite tag time-at-depth (TAD) frequency data that I would like some help with.
The data was transmitted via satellite as percent time spent in each of 7 depth bins (0m, 0-1m, 1-10m, 10-50m etc.), binned over 6-hour intervals. I categorized each row of data corresponding to a date and time into summer vs. winter, and day vs. night, and then summed and averaged the given % for each depth bin. My data looks like this (for one individual, HG03):
    HG03.dat
       Season  Time Depth    Sum       Avrg
    1    summ   day     0   17.2  0.1702970
    2    summ   day     1   23.9  0.2366337
    3    summ   day    10  868.5  8.5990099
    4    summ   day    50 2698.2 26.7148515
    5    summ   day   100  419.7  4.1554455
    6    summ   day   200  266.1  2.6346535
    7    summ   day   300 1668.6 16.5207921
    8    summ   day   500 4138.2 40.9722772
    9    summ night     0  283.6  5.7877551
    10   summ night     1  229.1  4.6755102
    11   summ night    10  479.3  9.7816327
    12   summ night    50  761.9 15.5489796
    13   summ night   100  235.8  4.8122449
    14   summ night   200   40.9  0.8346939
    15   summ night   300  763.1 15.5734694
    16   summ night   500 2106.1 42.9816327
    17   wint   day     0    0.0  0.0000000
    18   wint   day     1    0.0  0.0000000
    19   wint   day    10    0.0  0.0000000
    20   wint   day    50    0.0  0.0000000
    21   wint   day   100    7.9  1.1285714
    22   wint   day   200   92.1 13.1571429
    23   wint   day   300    0.0  0.0000000
    24   wint   day   500  600.0 85.7142857
    25   wint night     0   43.9  1.7560000
    26   wint night     1    0.3  0.0120000
    27   wint night    10    0.3  0.0120000
    28   wint night    50    0.8  0.0320000
    29   wint night   100   10.5  0.4200000
    30   wint night   200   51.6  2.0640000
    31   wint night   300  411.4 16.4560000
    32   wint night   500 1981.2 79.2480000
I wanted to test whether significant differences existed between depth in summer vs. winter, and day vs. night, controlling first for season and then for time of day. I carried out a Cochran-Mantel-Haenszel test, using Average Frequency (Avrg) as the dependent variable (2x2x8 contingency table).
    > ct<-xtabs(Avrg~Time+Depth+Season,data=HG03.dat)
    > mantelhaen.test(ct)

            Cochran-Mantel-Haenszel test

    data:  ct
    Cochran-Mantel-Haenszel M^2 = 28.4548, df = 7, p-value = 0.0001818

    > ct<-xtabs(Avrg~Season+Depth+Time,data=HG03.dat)
    > mantelhaen.test(ct)

            Cochran-Mantel-Haenszel test

    data:  ct
    Cochran-Mantel-Haenszel M^2 = 111.5986, df = 7, p-value < 2.2e-16
However, I'm not sure if these results are valid, since my raw data is already in frequencies, not in counts. When I used Sum as the dependent variable, I obtained different results.
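A small sketch of why the choice between Sum and Avrg changes the answer: `mantelhaen.test()` treats the table entries as counts, so its statistic grows with the table totals. In the table above, Sum/Avrg is about 101 for summer days but only 7 for winter days (apparently the number of 6-hour records in each cell), so the two tables are not even proportional to each other. The counts below are made up purely for illustration; they are not the HG03 data.

```r
# Hypothetical 2 x 3 x 2 table (rows x columns x strata); made-up counts.
counts <- array(
  c(10, 5, 20, 15, 30, 40,    # stratum 1, filled column by column
     8, 12, 18, 22, 25, 35),  # stratum 2
  dim = c(2, 3, 2)
)

# Same cell proportions, but every entry multiplied by 10.
counts_x10 <- counts * 10

stat_counts <- unname(mantelhaen.test(counts)$statistic)
stat_x10    <- unname(mantelhaen.test(counts_x10)$statistic)

# The proportions are identical, yet the statistic (and hence the
# p-value) depends on the scale of the entries.
print(c(original = stat_counts, times_ten = stat_x10))
```

Because the test reads the entries as sample sizes, percentages or averaged percentages fed in as "counts" give a p-value that reflects the arbitrary scale rather than the amount of data actually collected.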
I am at a loss on how to proceed. If anyone has any ideas, they would be greatly appreciated.
Best Answer
OK. This is my second attempt at an answer to your fascinating question. I scratched the first one.
If I understood the problem correctly, your raw data consist of sets of proportions: you have 7 proportions, representing the distribution of time spent at different depths over a 6-hour period. These vectors of proportions are possibly functions of season, day/night, and maybe other factors.
If so, you have a Dirichlet regression. This is a generalized linear model with Dirichlet "errors" – i.e. the response is a vector of proportions adding to 1. The parameters governing the Dirichlet distribution are functions of your covariates.
A package that does this has recently been added to CRAN. It's called DirichletReg.
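A minimal sketch of what such a fit might look like, assuming the DirichletReg interface as I understand it (`DR_data()` to prepare the response, `DirichReg()` to fit). The data frame `tad`, its columns, and the three depth bins are hypothetical stand-ins simulated for illustration; the real analysis would use one row per 6-hour record with all of the depth bins.

```r
# Simulated stand-in for the raw tag records: one row per 6-hour interval.
# All names and numbers here are hypothetical, not the HG03 data.
set.seed(1)
n <- 120
tad <- data.frame(
  season = factor(rep(c("summ", "wint"), each = n / 2)),
  time   = factor(rep(c("day", "night"), times = n / 2))
)

# Dirichlet parameters that depend on the covariates (3 bins for brevity).
alpha <- cbind(
  shallow = ifelse(tad$season == "summ", 4, 1),
  mid     = 2,
  deep    = ifelse(tad$time == "day", 6, 3)
)

# Dirichlet draws obtained by normalising independent gamma variates.
g <- matrix(rgamma(n * 3, shape = alpha), nrow = n,
            dimnames = list(NULL, colnames(alpha)))
props <- g / rowSums(g)  # each row is a vector of proportions summing to 1

# Fit only if the package is installed.
if (requireNamespace("DirichletReg", quietly = TRUE)) {
  library(DirichletReg)
  tad$Y <- DR_data(props)                         # response: rows of proportions
  fit <- DirichReg(Y ~ season + time, data = tad)
  print(summary(fit))
}
```

Transmitted percentages would first be divided by 100 so that each row sums to 1. Exact zeros, which do occur in the tag data, lie on the boundary of the Dirichlet's support, so check how `DR_data()` treats them before trusting a fit.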
It would be a shame to lose sight of the fact that your data are sequential in time. You could fit the Dirichlet regression, allowing for day/night and seasonal effects, then plot the residuals against time to see if there are additional time effects. If nothing shows up, then you can ignore the fact that you have a time series.
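The residual check could be sketched roughly as follows. This assumes that `residuals()` on a fitted `DirichReg` object accepts `type = "standardized"` and returns one column per depth bin, which is my reading of the package documentation; the simulated data and the names `d`, `idx`, and `shape` are hypothetical.

```r
# Hypothetical records in time order, with a day/night effect only.
set.seed(2)
n <- 80
d <- data.frame(
  idx  = seq_len(n),  # position in the time series
  time = factor(rep(c("day", "night"), length.out = n))
)
shape <- cbind(shallow = 2, mid = 3, deep = ifelse(d$time == "day", 6, 3))
g <- matrix(rgamma(n * 3, shape = shape), nrow = n,
            dimnames = list(NULL, colnames(shape)))

if (requireNamespace("DirichletReg", quietly = TRUE)) {
  library(DirichletReg)
  d$Y <- DR_data(g / rowSums(g))
  fit <- DirichReg(Y ~ time, data = d)

  # Residuals per depth bin, plotted in time order.
  res <- residuals(fit, type = "standardized")
  matplot(d$idx, res, type = "p", pch = 1,
          xlab = "6-hour interval (time order)",
          ylab = "standardized residual")
  abline(h = 0, lty = 2)
}
```

A drift, cycle, or run of same-signed residuals in such a plot would suggest time structure that the season and day/night terms have not captured.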
A Dirichlet model, in your case, would let you model the 7 proportions as a function of 7 parameters. It is not obvious how best to relate those 7 parameters to your covariates. Maier offers 2 different approaches in his package; the package documentation describes both.
There is not a lot of work done on the Dirichlet as a model of interest in its own right. It tends to be used in Bayesian statistics as a prior for the estimation of multinomial parameters. Anyway ….
This is bleeding edge stuff. I am sure that Marco Maier, the maintainer of the package, would be happy to hear from you if you have good data that suit his model. His contact information is on CRAN.
By the way, just to be clear, I am not talking about the data presented here. You need to go back to your raw data for the Dirichlet Regression. Ignore all that averaging that you did. Just model the stuff directly, as well as you can.