Solved – Is the Cochran-Mantel-Haenszel test the correct test for these frequency data

I have some satellite tag time-at-depth (TAD) frequency data that I would like some help with.

The data was transmitted via satellite as percent time spent in each of 7 depth bins (0m, 0-1m, 1-10m, 10-50m etc.), binned over 6-hour intervals. I categorized each row of data corresponding to a date and time into summer vs. winter, and day vs. night, and then summed and averaged the given % for each depth bin. My data looks like this (for one individual, HG03):

    HG03.dat
       Season  Time Depth    Sum       Avrg
    1    summ   day     0   17.2  0.1702970
    2    summ   day     1   23.9  0.2366337
    3    summ   day    10  868.5  8.5990099
    4    summ   day    50 2698.2 26.7148515
    5    summ   day   100  419.7  4.1554455
    6    summ   day   200  266.1  2.6346535
    7    summ   day   300 1668.6 16.5207921
    8    summ   day   500 4138.2 40.9722772
    9    summ night     0  283.6  5.7877551
    10   summ night     1  229.1  4.6755102
    11   summ night    10  479.3  9.7816327
    12   summ night    50  761.9 15.5489796
    13   summ night   100  235.8  4.8122449
    14   summ night   200   40.9  0.8346939
    15   summ night   300  763.1 15.5734694
    16   summ night   500 2106.1 42.9816327
    17   wint   day     0    0.0  0.0000000
    18   wint   day     1    0.0  0.0000000
    19   wint   day    10    0.0  0.0000000
    20   wint   day    50    0.0  0.0000000
    21   wint   day   100    7.9  1.1285714
    22   wint   day   200   92.1 13.1571429
    23   wint   day   300    0.0  0.0000000
    24   wint   day   500  600.0 85.7142857
    25   wint night     0   43.9  1.7560000
    26   wint night     1    0.3  0.0120000
    27   wint night    10    0.3  0.0120000
    28   wint night    50    0.8  0.0320000
    29   wint night   100   10.5  0.4200000
    30   wint night   200   51.6  2.0640000
    31   wint night   300  411.4 16.4560000
    32   wint night   500 1981.2 79.2480000

I wanted to test whether the depth distribution differed significantly between summer and winter, and between day and night, controlling for time of day in the first comparison and for season in the second. I carried out a Cochran-Mantel-Haenszel test, using Average Frequency (Avrg) as the dependent variable (2x2x8 contingency table).

    > ct<-xtabs(Avrg~Time+Depth+Season,data=HG03.dat)
    > mantelhaen.test(ct)

            Cochran-Mantel-Haenszel test

    data:  ct
    Cochran-Mantel-Haenszel M^2 = 28.4548, df = 7, p-value = 0.0001818

    > ct<-xtabs(Avrg~Season+Depth+Time,data=HG03.dat)
    > mantelhaen.test(ct)

            Cochran-Mantel-Haenszel test

    data:  ct
    Cochran-Mantel-Haenszel M^2 = 111.5986, df = 7, p-value < 2.2e-16

However, I'm not sure if these results are valid, since my raw data is already in frequencies, not in counts. When I used Sum as the dependent variable, I obtained different results.
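
The scale dependence is easy to reproduce. The sketch below uses made-up counts (not the tag data) and shows that multiplying every cell by 10 leaves the proportions unchanged but inflates the test statistic, which is why Sum and Avrg would be expected to give different answers:

```r
# Hypothetical counts: 2 rows x 3 columns x 2 strata (illustration only)
counts <- array(c(12, 5, 9, 8, 4, 14,
                  10, 6, 7, 9, 3, 12),
                dim = c(2, 3, 2))

small <- mantelhaen.test(counts)       # original scale
large <- mantelhaen.test(counts * 10)  # same proportions, 10x the "sample size"

# The statistic grows with the table total even though the pattern is identical
c(small = unname(small$statistic), large = unname(large$statistic))
```

The test treats each cell as a count of independent observations, so percentages or sums of percentages do not carry the sample-size information it relies on.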

I am at a loss on how to proceed. If anyone has any ideas, they would be greatly appreciated.

OK. This is my second attempt at an answer to your fascinating question. I scratched the first one.

If I understood the problem correctly, your raw data consist of sets of proportions – you have 7 proportions, representing the distribution of time spent at different depths over a 6 hour period. These vectors of proportions are possibly functions of season, day/night, and maybe other factors.

If so, you have a Dirichlet regression. This is a generalized linear model with Dirichlet "errors" – i.e. the response is a vector of proportions adding to 1. The parameters governing the Dirichlet distribution are functions of your covariates.

A package that does this has recently been added to CRAN. It's called DirichletReg.
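
For orientation, here is a minimal sketch of what a fit looks like, using the ArcticLake example data that ships with the package (sediment composition by water depth, not tag data). It assumes DirichletReg is installed; the column names are those of the shipped dataset as I understand it:

```r
library(DirichletReg)

# ArcticLake: proportions of sand, silt and clay, plus water depth
AL   <- ArcticLake
AL$Y <- DR_data(AL[, c("sand", "silt", "clay")])  # wrap the proportions as a response

# Dirichlet regression of the composition on depth
fit <- DirichReg(Y ~ depth, data = AL)
summary(fit)
```

For the tag data, the response would be the matrix of depth-bin proportions per 6-hour record, and the right-hand side would hold season and day/night.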

It would be a shame to lose sight of the fact that your data are sequential in time. You could fit the Dirichlet regression, allowing for day/night and seasonal effects, then plot the residuals against time to see if there are additional time effects. If nothing shows up, then you can ignore the fact that you have a time series.
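
A rough sketch of that residual check, on simulated data because the raw records are not shown here. Everything in it (the three depth bins, the record index `t`, the effect sizes) is made up for illustration, and it assumes the package's `residuals()` method accepts `type = "standardized"`:

```r
library(DirichletReg)

set.seed(42)
n <- 120
# Hypothetical time-ordered 6-hour records
dat <- data.frame(
  t      = seq_len(n),
  Season = factor(rep(c("summ", "wint"), each = n / 2)),
  Time   = factor(rep(c("day", "night"), times = n / 2))
)

# Simulate proportions over 3 made-up depth bins via normalised gamma draws
alpha <- cbind(2 + 3 * (dat$Time == "night"), 3, 2 + 4 * (dat$Season == "wint"))
g     <- matrix(rgamma(n * 3, shape = alpha), nrow = n)
props <- g / rowSums(g)
colnames(props) <- c("shallow", "mid", "deep")

dat$Y <- DR_data(props)
fit   <- DirichReg(Y ~ Season + Time, data = dat)

# Standardised residuals, one column per depth bin
res <- residuals(fit, type = "standardized")

# Residuals in time order: trends or long runs would suggest leftover time structure
matplot(dat$t, res, type = "l", lty = 1,
        xlab = "Record (time order)", ylab = "Standardised residual")
abline(h = 0, col = "grey")

# Autocorrelation of the residuals for the first bin
acf(res[, 1], main = "ACF of residuals, first depth bin")
```

If the autocorrelation plot shows clear structure at short lags, the independence assumption behind the fit is questionable and the time ordering should be modelled explicitly.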

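A Dirichlet model, in your case, would let you model the 7 proportions as a function of 7 parameters. It is not obvious how best to relate those 7 parameters to your covariates. Maier offers two approaches in his package: a "common" parametrization, in which each Dirichlet parameter gets its own regression, and an "alternative" parametrization, which models the mean proportions and a precision parameter separately. See the package documentation for details.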
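
As I read the DirichletReg documentation, the two approaches differ in how the covariates enter the Dirichlet parameters. Written out for C depth bins (the symbols x, z, β and γ are my own notation, not from the package or the post):

```latex
% Common parametrization: every concentration parameter has its own linear predictor
\alpha_c = \exp\!\left(x^\top \beta_c\right), \qquad c = 1, \dots, C

% Alternative parametrization: mean proportions via a multinomial logit
% (one reference bin b with \beta_b = 0), plus a separate precision \phi
\mu_c = \frac{\exp\!\left(x^\top \beta_c\right)}{\sum_{a=1}^{C} \exp\!\left(x^\top \beta_a\right)},
\qquad
\phi = \exp\!\left(z^\top \gamma\right),
\qquad
\alpha_c = \mu_c \, \phi
```

The second form is usually easier to interpret for a question like yours, since the coefficients describe shifts in the expected share of time per depth bin.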

There is not a lot of work done on the Dirichlet as a model of interest in its own right. It tends to be used in Bayesian statistics as a prior for the estimation of multinomial parameters.

This is bleeding edge stuff. I am sure that Marco Maier, the maintainer of the package, would be happy to hear from you if you have good data that is suitable to his model. His contact information is on the package's CRAN page.

By the way, just to be clear, I am not talking about the data presented here. You need to go back to your raw data for the Dirichlet Regression. Ignore all that averaging that you did. Just model the stuff directly, as well as you can.
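
One practical wrinkle when going back to the raw records: the Dirichlet density is not defined for exact zeros, and the table above suggests many 6-hour records will have empty depth bins. A minimal base-R sketch of one way to prepare the data, using made-up percentages and a commonly used compression toward the centre (the bin names and the helper `squeeze` are hypothetical; I believe `DR_data` has a `trafo` argument for a similar purpose, so check its documentation before rolling your own):

```r
# Hypothetical raw records: percent time in 4 made-up depth bins per 6-hour interval
raw_pct <- rbind(
  c( 0.0, 12.5, 37.5, 50.0),
  c( 5.0,  0.0, 20.0, 75.0),
  c(10.0, 30.0, 60.0,  0.0)
)
colnames(raw_pct) <- c("d0_10", "d10_100", "d100_300", "d300plus")

# Percent -> proportions; renormalising absorbs rounding in the transmitted values
props <- raw_pct / rowSums(raw_pct)

# Pull exact 0s and 1s into the open interval: y* = (y * (n - 1) + 1/d) / n,
# with n records and d bins; each row still sums to 1 afterwards
squeeze <- function(y) {
  n <- nrow(y)
  d <- ncol(y)
  (y * (n - 1) + 1 / d) / n
}
props_adj <- squeeze(props)

rowSums(props_adj)  # each row still sums to 1
```

With only a few records the adjustment is large, as here; with hundreds of 6-hour records it becomes a small nudge away from the boundary.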
