Solved – Dealing with heteroscedasticity in mixed models

I collected crop yield data for many years across multiple locations, which are nested under provinces and some associated weather data. I am interested in making a model and then using new weather data to predict yield across these locations.

Here's a model I fitted

library(lme4)   Linear mixed model fit by maximum likelihood  ['lmerMod'] Formula:  log(yield) ~ x1 + I(x1^2) + x2 +  I(x2^2) + x3 + I(x3^2) +                x4 + I(x4^2) + x5 + I(x5^2) +                x6 + I(x6^2) + x7 + I(x7^2) +                x8 + I(x8^2) + x9 + x10 +                I(x10^2) + x11 + I(x11^2) + year +               (year | province/location) Data: dat   AIC      BIC   logLik deviance df.resid  -11495.4 -11261.8   5777.7 -11555.4    17735   Scaled residuals:    Min      1Q  Median      3Q     Max  -4.4276 -0.5241  0.1373  0.6644  4.7144   Random effects: Groups            Name        Variance  Std.Dev. Corr  location:province (Intercept) 0.0028622 0.05350                           year        0.0005911 0.02431  0.46  province          (Intercept) 0.0028255 0.05316                           year        0.0003357 0.01832  -0.52 Residual                      0.0296151 0.17209        Number of obs: 17765, groups:   location:province, 174; province, 14  Fixed effects:             Estimate  Std. Error t value (Intercept) 7.77016483  0.01642658 473.024 x1          0.01490593  0.00286908   5.195 I(x1^2)    -0.00034555  0.00070202  -0.492 x2         -0.00093958  0.00300568  -0.313 I(x2^2)    -0.00008501  0.00056272  -0.151 x3          0.03302874  0.00226069  14.610 I(x3^2)     0.00664297  0.00142503   4.662 x4         -0.04088568  0.00252176 -16.213 I(x4^2)    -0.01398973  0.00139918  -9.998 x5          0.01764166  0.00374300   4.713 I(x5^2)    -0.00298363  0.00135861  -2.196 x6          0.00964780  0.00405505   2.379 I(x6^2)    -0.00359899  0.00123643  -2.911 x7          0.01127387  0.00352026   3.203 I(x7^2)    -0.00175782  0.00052768  -3.331 x8         -0.05147980  0.00392094 -13.129 I(x8^2)     0.00166814  0.00077124   2.163 x9          0.00018234  0.00132609   0.138 x10         0.02124123  0.00385797   5.506 I(x10^2)    0.00174181  0.00087155   1.999 x11        -0.02645305  0.00417074  -6.343 I(x12^2)    0.00045353  0.00092663   0.489 year        0.13482956  0.00617583  21.832  plot(mod) 

Since the residuals are obs – fit, I understand that as the fitted value increases, the residuals tend to move from positive to negative which means at a higher value of observed y, the model underpredicts and vice versa.

I would like some guidance on what are the possible reasons for this systematic lines in the plot and how to best deal with them?
enter image description here

The shape of the residuals seems to suggest that you have a lower/upper bound in your yield outcome variable. I.e., are some of the yields exactly zero? In this case, you may consider fitting a two-part mixed effects model for semi-continuous data. This model is, for example, available in the GLMMadaptive package. For an example check here.

To check the fit of the model in this case, it would be better to use the simulated scaled residuals provided by the DHARMa package. For an example that uses DHARMa with models fitted by GLMMadaptive you can check here.

Similar Posts:

Rate this post

Leave a Comment