I am attempting to model sightings of a species at either the east or the west of an island, to see which environmental variables (such as SST, Chl-a, etc.) may be influencing the obvious migration pattern the species displays between east and west. Species presence is recorded as a percentage calculated from the monthly total sightings at either the east or west location. It has to be done this way because the data were collected via citizen science, so the number of animals is not reliable.
Below is my lm() model, for which I had to transform my data to meet all of the assumptions:
    transformed <- abs(dat$y - mean(dat$y))
    mod1 <- lm(transformed ~ x, data = dat)
    gvlma::gvlma(mod1)

    ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
    USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
    Level of Significance = 0.05

    Call:
     gvlma::gvlma(x = mod1)

                          Value p-value                Decision
    Global Stat        2.241814  0.6914 Assumptions acceptable.
    Skewness           0.009048  0.9242 Assumptions acceptable.
    Kurtosis           0.253950  0.6143 Assumptions acceptable.
    Link Function      1.120267  0.2899 Assumptions acceptable.
    Heteroscedasticity 0.858549  0.3541 Assumptions acceptable.
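A quick check with made-up numbers shows what the abs(y - mean(y)) transformation does: it replaces each month's percentage with its distance from the overall mean, so months equally far above and below the mean get the same transformed value. The values of y below are hypothetical and are not the data from the question.

```r
# Hypothetical monthly percentages of sightings at one location
# (illustrative values only, not the data from the question)
y <- c(20, 35, 50, 65, 80)

# Same transformation as in mod1: absolute deviation from the mean
transformed <- abs(y - mean(y))
print(transformed)  # mean(y) is 50, so: 30 15 0 15 30

# 20% and 80% are equally far from the mean, so they become identical
transformed[1] == transformed[5]  # TRUE
```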
I found that the residuals of mod1 were autocorrelated, so I added a lag-1 residual term using the slide() function from the DataCombine package:
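    library(DataCombine)

    # Residuals from a model with the same response/predictor roles as mod1 (y ~ x)
    mod2 <- lm(y ~ x, data = dat)
    data <- data.frame(dat, resid_mod1 = mod2$residuals)

    # Add the previous month's residual as a new column, lag1
    # (slideBy = -1 creates a lag, so the first row gets NA)
    data_1 <- slide(data, Var = "resid_mod1", NewVar = "lag1", slideBy = -1)
    data_2 <- na.omit(data_1)

    # Re-apply mod1's transformation and add lag1 as a predictor
    data_2$transformed <- abs(data_2$y - mean(data_2$y))
    mod3 <- lm(transformed ~ x + lag1, data = data_2)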
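The slide() call adds each row's previous residual as a new lag1 column. To see exactly what that column contains, the same one-step lag can be sketched in base R for a single series, assuming the rows are already in month order. The helper lag_one() and the residual values are hypothetical and are not part of DataCombine.

```r
# Hypothetical helper (not part of DataCombine): shift a time-ordered vector
# down by one position so each row holds the previous row's value.
lag_one <- function(v) c(NA, head(v, -1))

resid_mod1 <- c(0.5, -0.2, 0.1, 0.4)  # hypothetical residuals, in month order
df <- data.frame(resid_mod1 = resid_mod1, lag1 = lag_one(resid_mod1))
print(df)

# The first row has no previous month, so lag1 is NA there;
# na.omit() drops that row before the model is refitted.
df_complete <- na.omit(df)
nrow(df_complete)  # 3
```

Because the first observation is lost to the lag, mod3 is fitted on one fewer row than mod1.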
This fixed the autocorrelation issue, but now the model no longer satisfies all of the assumptions:
    gvlma::gvlma(mod3)

    ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
    USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
    Level of Significance = 0.05

    Call:
     gvlma::gvlma(x = mod3)

                          Value   p-value                   Decision
    Global Stat        22.23782 1.797e-04 Assumptions NOT satisfied!
    Skewness            2.58773 1.077e-01    Assumptions acceptable.
    Kurtosis            0.05036 8.224e-01    Assumptions acceptable.
    Link Function      19.04845 1.274e-05 Assumptions NOT satisfied!
    Heteroscedasticity  0.55128 4.578e-01    Assumptions acceptable.
I have been going round in circles with this problem for days! Could someone please tell me whether I can use either model or, if not, how I can fix one problem without causing another?
Best Answer
I have now managed to overcome this issue with the following lm() model, which satisfies all of the assumptions:
mod <- lm(abs(y-mean(y)) ~ x, data=mp2017.dat, na.action=na.exclude)
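The na.action = na.exclude argument matters for the residual checks that follow: compared with na.omit, the model is fitted on the same complete rows, but residuals() is padded with NA so it lines up with the original rows (and therefore with the original time order). The small data frame below is hypothetical. One caveat worth knowing: mean(y) inside the formula is evaluated on the full column before incomplete rows are removed, so if y itself contained an NA, every transformed value would become NA; the sketch therefore puts the missing value in x.

```r
# Hypothetical data: one missing predictor value (row 3)
d <- data.frame(x = c(1, 2, NA, 4, 5),
                y = c(10, 12, 14, 15, 19))

fit_omit    <- lm(abs(y - mean(y)) ~ x, data = d, na.action = na.omit)
fit_exclude <- lm(abs(y - mean(y)) ~ x, data = d, na.action = na.exclude)

# Both fits use the same 4 complete rows, but residuals() differs:
length(residuals(fit_omit))     # 4: the incomplete row is simply dropped
length(residuals(fit_exclude))  # 5: padded with NA at row 3
residuals(fit_exclude)[3]       # NA
```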
I checked for autocorrelation with the Durbin–Watson test:
lmtest::dwtest(mod)
The Durbin–Watson statistic was very close to 2. Field (2012), Discovering Statistics Using R, states that a value between 1.5 and 2.5 is acceptable.
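The statistic that lmtest::dwtest() reports (alongside a p-value, which this sketch does not compute) can be reproduced by hand from time-ordered residuals. Since d is approximately 2(1 - r), where r is the lag-1 autocorrelation of the residuals, a value near 2 indicates little first-order autocorrelation. The helper dw_stat() and the residual values are hypothetical; with a fitted model you would pass residuals(mod) instead.

```r
# Durbin-Watson statistic for time-ordered residuals e_1, ..., e_n:
#   d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2
dw_stat <- function(e) {
  # Drop NA padding (e.g. from na.exclude); the remaining values
  # are treated as consecutive observations.
  e <- e[!is.na(e)]
  sum(diff(e)^2) / sum(e^2)
}

e <- c(0.5, -0.3, 0.2, -0.1, 0.4, -0.6)  # hypothetical residuals
d <- dw_stat(e)
d                     # about 2.45
d >= 1.5 && d <= 2.5  # TRUE: inside the 1.5-2.5 range quoted from Field (2012)

# Strongly alternating residuals push d towards 4 (negative autocorrelation)
dw_stat(c(1, -1, 1, -1))  # 3
```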