I collected crop yield data for many years across multiple locations, which are nested under provinces and some associated weather data. I am interested in making a model and then using new weather data to predict yield across these locations.

Here's a model I fitted

`library(lme4) Linear mixed model fit by maximum likelihood ['lmerMod'] Formula: log(yield) ~ x1 + I(x1^2) + x2 + I(x2^2) + x3 + I(x3^2) + x4 + I(x4^2) + x5 + I(x5^2) + x6 + I(x6^2) + x7 + I(x7^2) + x8 + I(x8^2) + x9 + x10 + I(x10^2) + x11 + I(x11^2) + year + (year | province/location) Data: dat AIC BIC logLik deviance df.resid -11495.4 -11261.8 5777.7 -11555.4 17735 Scaled residuals: Min 1Q Median 3Q Max -4.4276 -0.5241 0.1373 0.6644 4.7144 Random effects: Groups Name Variance Std.Dev. Corr location:province (Intercept) 0.0028622 0.05350 year 0.0005911 0.02431 0.46 province (Intercept) 0.0028255 0.05316 year 0.0003357 0.01832 -0.52 Residual 0.0296151 0.17209 Number of obs: 17765, groups: location:province, 174; province, 14 Fixed effects: Estimate Std. Error t value (Intercept) 7.77016483 0.01642658 473.024 x1 0.01490593 0.00286908 5.195 I(x1^2) -0.00034555 0.00070202 -0.492 x2 -0.00093958 0.00300568 -0.313 I(x2^2) -0.00008501 0.00056272 -0.151 x3 0.03302874 0.00226069 14.610 I(x3^2) 0.00664297 0.00142503 4.662 x4 -0.04088568 0.00252176 -16.213 I(x4^2) -0.01398973 0.00139918 -9.998 x5 0.01764166 0.00374300 4.713 I(x5^2) -0.00298363 0.00135861 -2.196 x6 0.00964780 0.00405505 2.379 I(x6^2) -0.00359899 0.00123643 -2.911 x7 0.01127387 0.00352026 3.203 I(x7^2) -0.00175782 0.00052768 -3.331 x8 -0.05147980 0.00392094 -13.129 I(x8^2) 0.00166814 0.00077124 2.163 x9 0.00018234 0.00132609 0.138 x10 0.02124123 0.00385797 5.506 I(x10^2) 0.00174181 0.00087155 1.999 x11 -0.02645305 0.00417074 -6.343 I(x12^2) 0.00045353 0.00092663 0.489 year 0.13482956 0.00617583 21.832 plot(mod) `

Since the residuals are obs – fit, I understand that as the fitted value increases, the residuals tend to move from positive to negative which means at a higher value of observed `y`

, the model underpredicts and vice versa.

I would like some guidance on what are the possible reasons for this systematic lines in the plot and how to best deal with them?

**Contents**hide

#### Best Answer

The shape of the residuals seems to suggest that you have a lower/upper bound in your `yield`

outcome variable. I.e., are some of the yields exactly zero? In this case, you may consider fitting a two-part mixed effects model for semi-continuous data. This model is, for example, available in the **GLMMadaptive** package. For an example check here.

To check the fit of the model in this case, it would be better to use the simulated scaled residuals provided by the **DHARMa** package. For an example that uses **DHARMa** with models fitted by **GLMMadaptive** you can check here.

### Similar Posts:

- Solved – Scaling predictors in mixed models
- Solved – Interpreting interactions in a linear model vs quadratic model
- Solved – Regression coefficient has negative symbol but positive from the raw plot
- Solved – How exactly should repeated measures situation be treated in machine learning models
- Solved – fit a mixed model with subjects that only have 1 observation