# Solved – Linear Regression: How to satisfy all assumptions and rectify auto-correlation at the same time

I am trying to model sightings of a species at either the east or west of an island, to see which environmental variables (SST, Chl-a, etc.) may be influencing an obvious migration pattern that the species displays between east and west. Species presence is recorded as a percentage, calculated from the monthly total sightings at either the east or west location. It has to be done this way because the data were collected via citizen science, so the number of animals isn't reliable.

Below is the `lm()` model for which I had to transform my data to ensure I met all assumptions.

```
transformed <- abs(dat$y - mean(dat$y))
mod1 <- lm(transformed ~ x, data = dat)
gvlma::gvlma(mod1)

ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05

Call:
 gvlma::gvlma(x = mod1)

                      Value p-value                Decision
Global Stat        2.241814  0.6914 Assumptions acceptable.
Skewness           0.009048  0.9242 Assumptions acceptable.
Kurtosis           0.253950  0.6143 Assumptions acceptable.
Link Function      1.120267  0.2899 Assumptions acceptable.
Heteroscedasticity 0.858549  0.3541 Assumptions acceptable.
```

I found that `mod1` had an autocorrelation issue, so I added a lagged residual term (`lag1`) using the `slide()` function from the DataCombine package.

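```
mod2 <- lm(x ~ y, data = dat)

library(DataCombine)
# Attach the residuals of mod2 to the data
data <- data.frame(mp2017.dat, resid_mod1 = mod2$residuals)
# Lag the residuals by one row; slide() needs the data frame that holds resid_mod1
data_1 <- slide(data, Var = "resid_mod1", NewVar = "lag1", slideBy = -1)
# Drop the first row, which has no lagged value
data_2 <- na.omit(data_1)

transformed <- abs(data_2$x - mean(data_2$x))

mod3 <- lm(transformed ~ y + lag1, data = data_2)
```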
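For readers without DataCombine, the one-step lag of the residuals can be sketched in base R. This is only an illustration on simulated data: the data frame `toy` and its columns are hypothetical stand-ins, and it assumes the rows are already in time order with no missing months.

```r
# Simulated monthly series; `toy`, `x`, and `y` are made-up stand-ins.
set.seed(1)
toy <- data.frame(x = rnorm(24))
toy$y <- 0.5 * toy$x + rnorm(24)

# Residuals from a first-pass model.
fit <- lm(y ~ x, data = toy)
toy$resid1 <- unname(residuals(fit))

# Shift the residuals down one row: row t receives the residual from row t - 1.
toy$lag1 <- c(NA, head(toy$resid1, -1))

# The first row has no predecessor, so drop it before refitting.
toy_lagged <- na.omit(toy)
fit_lag <- lm(y ~ x + lag1, data = toy_lagged)
```

Because one row is lost to the lag, any transformation that depends on the sample (such as centring on the mean) is computed on a slightly different set of observations after this step.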

This rectified the autocorrelation issue but now the model does not satisfy all assumptions!

```
gvlma::gvlma(mod3)

ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05

Call:
 gvlma::gvlma(x = mod3)

                      Value   p-value                   Decision
Global Stat        22.23782 1.797e-04 Assumptions NOT satisfied!
Skewness            2.58773 1.077e-01    Assumptions acceptable.
Kurtosis            0.05036 8.224e-01    Assumptions acceptable.
Link Function      19.04845 1.274e-05 Assumptions NOT satisfied!
Heteroscedasticity  0.55128 4.578e-01    Assumptions acceptable.
```

I have been going round in circles for days with this problem! Please, could someone tell me if I can use either model or, if not, how can I rectify one problem without causing another?


I have now managed to overcome this issue using the following `lm()` model, which satisfies all assumptions.

```
mod <- lm(abs(y - mean(y)) ~ x,
          data = mp2017.dat, na.action = na.exclude)
```
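As a side note on the `na.action = na.exclude` argument, the sketch below shows how it differs from `na.omit` when residuals are extracted. The data frame `toy` is made up for illustration and is not the poster's data.

```r
# Made-up data with one missing response; `toy` is a hypothetical stand-in.
toy <- data.frame(
  x = 1:10,
  y = c(2.1, 3.9, NA, 8.2, 9.8, 12.1, 14.2, 15.9, 18.1, 20.0)
)

fit_omit    <- lm(y ~ x, data = toy, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = toy, na.action = na.exclude)

# Both fits use the same 9 complete rows, so the coefficients agree.
coef(fit_omit)
coef(fit_exclude)

length(residuals(fit_omit))     # 9: the incomplete row is dropped
length(residuals(fit_exclude))  # 10: padded with NA at row 3
```

The padding keeps `residuals()` and `fitted()` aligned with the rows of the original data frame, which is convenient when plotting residuals against time.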

I checked for autocorrelation with the Durbin–Watson test:

```
lmtest::dwtest(mod)
```

The Durbin–Watson statistic was very close to 2. Field (2012), *Discovering Statistics Using R*, states that a value between 1.5 and 2.5 is acceptable.
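To make the "close to 2" reading concrete, the Durbin–Watson statistic can be computed by hand from the residuals. The sketch below uses simulated data with independent errors; `toy` is a hypothetical stand-in, and the result should agree with the `DW` value reported by `lmtest::dwtest()` for the same model.

```r
# Simulated data with independent errors; `toy` is a made-up stand-in.
set.seed(42)
toy <- data.frame(x = seq_len(60))
toy$y <- 2 + 0.3 * toy$x + rnorm(60)

fit <- lm(y ~ x, data = toy)
e <- residuals(fit)

# Durbin-Watson statistic: sum of squared successive differences
# of the residuals divided by the sum of squared residuals.
dw <- sum(diff(e)^2) / sum(e^2)
dw
```

The statistic always lies between 0 and 4. Values near 2 suggest little first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.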
