I collected crop yield data for many years across multiple locations, which are nested under provinces and some associated weather data. I am interested in making a model and then using new weather data to predict yield across these locations.
Here's a model I fitted
library(lme4) Linear mixed model fit by maximum likelihood ['lmerMod'] Formula: log(yield) ~ x1 + I(x1^2) + x2 + I(x2^2) + x3 + I(x3^2) + x4 + I(x4^2) + x5 + I(x5^2) + x6 + I(x6^2) + x7 + I(x7^2) + x8 + I(x8^2) + x9 + x10 + I(x10^2) + x11 + I(x11^2) + year + (year | province/location) Data: dat AIC BIC logLik deviance df.resid -11495.4 -11261.8 5777.7 -11555.4 17735 Scaled residuals: Min 1Q Median 3Q Max -4.4276 -0.5241 0.1373 0.6644 4.7144 Random effects: Groups Name Variance Std.Dev. Corr location:province (Intercept) 0.0028622 0.05350 year 0.0005911 0.02431 0.46 province (Intercept) 0.0028255 0.05316 year 0.0003357 0.01832 -0.52 Residual 0.0296151 0.17209 Number of obs: 17765, groups: location:province, 174; province, 14 Fixed effects: Estimate Std. Error t value (Intercept) 7.77016483 0.01642658 473.024 x1 0.01490593 0.00286908 5.195 I(x1^2) -0.00034555 0.00070202 -0.492 x2 -0.00093958 0.00300568 -0.313 I(x2^2) -0.00008501 0.00056272 -0.151 x3 0.03302874 0.00226069 14.610 I(x3^2) 0.00664297 0.00142503 4.662 x4 -0.04088568 0.00252176 -16.213 I(x4^2) -0.01398973 0.00139918 -9.998 x5 0.01764166 0.00374300 4.713 I(x5^2) -0.00298363 0.00135861 -2.196 x6 0.00964780 0.00405505 2.379 I(x6^2) -0.00359899 0.00123643 -2.911 x7 0.01127387 0.00352026 3.203 I(x7^2) -0.00175782 0.00052768 -3.331 x8 -0.05147980 0.00392094 -13.129 I(x8^2) 0.00166814 0.00077124 2.163 x9 0.00018234 0.00132609 0.138 x10 0.02124123 0.00385797 5.506 I(x10^2) 0.00174181 0.00087155 1.999 x11 -0.02645305 0.00417074 -6.343 I(x12^2) 0.00045353 0.00092663 0.489 year 0.13482956 0.00617583 21.832 plot(mod)
Since the residuals are obs – fit, I understand that as the fitted value increases, the residuals tend to move from positive to negative which means at a higher value of observed y
, the model underpredicts and vice versa.
I would like some guidance on what are the possible reasons for this systematic lines in the plot and how to best deal with them?
Best Answer
The shape of the residuals seems to suggest that you have a lower/upper bound in your yield
outcome variable. I.e., are some of the yields exactly zero? In this case, you may consider fitting a two-part mixed effects model for semi-continuous data. This model is, for example, available in the GLMMadaptive package. For an example check here.
To check the fit of the model in this case, it would be better to use the simulated scaled residuals provided by the DHARMa package. For an example that uses DHARMa with models fitted by GLMMadaptive you can check here.
Similar Posts:
- Solved – Scaling predictors in mixed models
- Solved – Interpreting interactions in a linear model vs quadratic model
- Solved – Regression coefficient has negative symbol but positive from the raw plot
- Solved – How exactly should repeated measures situation be treated in machine learning models
- Solved – fit a mixed model with subjects that only have 1 observation