I am trying to test for the effect of a treatment on a response variable. The response variable decays over time in what I believe is an exponential way. The measurement doesn't go below zero, so there are loads of zeros in the data.
The measurements looks like this
p <- ggplot(dat, aes(time, value, color = treatment)) + geom_point()
I've attempted fitting lm()
:
dat.lm <- lm(log(value + 1) ~ time + treatment, data = dat)
and the diagnostics are very ordinary, and predictions are poor.
newdat <- expand.grid(treatment = factor(1:2), time = 0:6) newdat$pred <- predict(dat.lm, newdat) p2 <- p + geom_line(data = newdat, aes(x = time, y = exp(pred) - 1))
I would like to be able to both test for the significance of the effect of the treatment, as well as estimate parameters for the decay function. I have also had a go with nls()
, and a cubic model at least fits better at time == 0, but still not great, and doesn't really make sense.
dat.cubic <- predict(lm(value~time+I(time^2)+I(time^3) + treatment, data = dat), newdat) newdat$cubic <- dat.cubic p2 + geom_line(data = newdat, aes(x = time, y = cubic), linetype = "dashed")
To repeat, my main question is how to test the effect of treatment with this data, and secondly, what is the best way to fit the model?
My data:
structure(list(time = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6), value = c(382.49, 377.13, 422.72, 377.52, 410.48, 435.78, 399.74, 540.5, 481.29, 455.53, 439.56, 368.63, 421.92, 490.87, 384.1, 478.92, 327.94, 403.7, 410.99, 332.97, 396.78, 420.85, 359.82, 474.43, 371.25, 130.92, 84.87, 199.36, 150.84, 112.1, 111.78, 183.84, 144.01, 163.16, 92.86, 237.2, 172.71, 161.1, 92.02, 204.17, 183.41, 140, 100.93, 62.52, 152.94, 116.63, 220.2, 145.3, 168.9, 155.89, 68.3, 44.44, 57.63, 0, 69.72, 48.96, 64.12, 44.45, 22.9, 40.06, 25.48, 12.94, 30.43, 53.14, 53.16, 31.64, 18.39, 55.73, 35.15, 44.96, 55.55, 0, 1.36, 30.73, 55.1, 15.71, 22.66, 0, 26.53, 8.44, 51.93, 0, 0, 0, 0, 10.42, 16.42, 0, 28.17, 0, 13.21, 48.45, 10.64, 40.52, 10.34, 15.97, 0, 29.81, 21.88, 36.6, 0, 0, 7.13, 2.84, 12.94, 6.09, 0, 0, 0.54, 11.6, 15.58, 5.39, 0, 7.67, 0, 0, 5.3, 0, 19.28, 4.12, 0, 9.96, 0, 10.8, 3.48, 1.98, 1.74, 2.49, 1.79, 0.43, 3.62, 2.37, 0.87, 2.02, 0.27, 1.63, 1.25, 3.43, 2.73, 1.76, 2.22, 1.6, 1.27, 2.31, 2.3, 0.64, 1.8, 2.17, 2.52, 0.5, 0.44, 1.02, 1.19, 0.93, 0, 1.48, 0, 1.31, 0, 0.78, 0.13, 0.9, 0, 1.79, 0.8, 0.21, 0.74, 1.81, 0, 0.98, 0, 1.59, 0.58, 0.89, 1.48, 379.16, 393.21, 412.4, 426.87, 375.65, 438.64, 399.23, 454.42, 450.48, 473.15, 473.4, 396.43, 430.82, 464.84, 370.04, 462.76, 422.85, 435.26, 428.96, 396.91, 460.68, 485.32, 404.56, 464.46, 475.78, 198.16, 130.74, 38.98, 114.62, 54.07, 89.64, 74.2, 60.19, 64.38, 70.59, 35.57, 17.4, 105.75, 67.31, 102.33, 123.3, 131.07, 94.44, 70.1, 62.25, 122.39, 22.49, 120.74, 63.28, 61.21, 0, 23.05, 32.91, 0, 49.65, 44.3, 3.58, 20.8, 31.15, 0, 29.53, 36.56, 55.63, 0, 57.8, 4.9, 0, 28.29, 17.23, 64.23, 4.94, 0, 31.43, 56.98, 6.46, 0, 0, 1.44, 0, 0, 0.23, 0, 5.83, 0, 0, 7.02, 3.23, 3.52, 2.65, 11.88, 0, 2.63, 0, 0, 0, 0, 11.29, 5.1, 4.66, 13.05, 3.18, 1.52, 0, 0, 5.07, 2.15, 0, 2.7, 0, 6.75, 0, 0, 0, 0, 2.49, 0, 0, 0, 0, 0.02, 3.27, 11.94, 3.89, 3.22, 0, 0.76, 0, 0.68, 1.05, 0, 1.04, 0, 0, 2.04, 0.86, 0.03, 0, 0.56, 0, 0.03, 0.5, 0, 0, 0.95, 0, 0, 1.2, 0, 0.23, 0, 0, 0, 0.65, 0.67, 0, 0.87, 0.95, 0, 0.89, 0.91, 0, 0, 1.74, 1.09, 0, 1.04, 0.21, 0.36, 0, 0, 1.8, 0.04, 0, 0, 0.71), treatment = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor")), .Names = c("time", "value", "treatment" ), row.names = c(NA, 350L), class = "data.frame")
Best Answer
Provided there is some justification for an exponential decay model, you could try the gnls
function from package nlme
. This allows you to compare treatments and model variance heterogeneity. Here is something to get you started:
library(ggplot2) p <- ggplot(DF, aes(time, value, color = treatment)) + geom_point()
Get starting values by fitting separate nls
models:
coef(nls(value ~ C * exp(-k*time), data = DF[DF$treatment == 1,], start = list(C=400, k=1))) # C k #415.729905 1.080539 coef(nls(value ~ C * exp(-k*time), data = DF[DF$treatment == 2,], start = list(C=400, k=1))) # C k #430.787442 1.606167
Now use gnls
:
library(nlme) fit <- gnls(value ~ C * exp(-k*time), data = DF, params = list(C ~ treatment, k ~ treatment), start = list(C = c(415, 15), k = c(1.1, 0.5)), weights = varExp(-0.8, form = ~ time), control = gnlsControl(nlsTol = 0.1))
I use an exponential variance structure (without dependence on treatment) here, but you could try some alternatives. Note that I had to strongly increase nlsTol
to achieve a successful fit. Use the result with caution (but it looks pretty good):
summary(fit) #Generalized nonlinear least squares fit # Model: value ~ C * exp(-k * time) # Data: DF # AIC BIC logLik # 2485.028 2508.176 -1236.514 # #Variance function: # Structure: Exponential of variance covariate # Formula: ~time # Parameter estimates: # expon #-0.8035687 # #Coefficients: # Value Std.Error t-value p-value #C.(Intercept) 413.5077 15.976589 25.88210 0.0000 #C.treatment2 9.7849 24.062902 0.40664 0.6845 #k.(Intercept) 1.0932 0.021132 51.73149 0.0000 #k.treatment2 0.4104 0.061657 6.65565 0.0000 # # Correlation: # C.(In) C.trt2 k.(In) #C.treatment2 -0.664 #k.(Intercept) 0.629 -0.418 #k.treatment2 -0.216 0.456 -0.343 # #Standardized residuals: # Min Q1 Med Q3 Max #-2.49675931 -0.55858325 -0.02101141 0.53929573 4.36616094 # #Residual standard error: 92.79678 #Degrees of freedom: 350 total; 346 residual plot(fit)
Now plot the result:
newdata <- expand.grid(time = seq(0, 6, length.out = 100), treatment = factor(1:2)) newdata$value <- predict(fit, newdata = newdata) p + geom_line(data = newdata)
In a next step one could try removing the dependence of C on treatment from the model. That might help to achieve better convergence …
Similar Posts:
- Solved – Why is the nonlinear least squares confidence band so wide
- Solved – get linear model when I tried to fit exponential model
- Solved – How to plot an exponential curve in R
- Solved – What are the decision values at the margins in an SVM model
- Solved – Problems with plotting exponential curve and data in the same plot when values are high