As I understand it from Chapter 2 of "Time Series Analysis" by Shumway, when fitting a seasonal model in R you may want to use dummy variables to tell lm() whether a given month's value should be used or not.
In the example below, the variable r holds a dataset with Year, Month and TotN_conc (monthly data for 14 years).
To fit a seasonal model I have done:
M2=as.numeric(ifelse(r$Month==2, 1, 0))
M3=as.numeric(ifelse(r$Month==3, 1, 0))
M4=as.numeric(ifelse(r$Month==4, 1, 0))
M5=as.numeric(ifelse(r$Month==5, 1, 0))
M6=as.numeric(ifelse(r$Month==6, 1, 0))
M7=as.numeric(ifelse(r$Month==7, 1, 0))
M8=as.numeric(ifelse(r$Month==8, 1, 0))
M9=as.numeric(ifelse(r$Month==9, 1, 0))
M10=as.numeric(ifelse(r$Month==10, 1, 0))
M11=as.numeric(ifelse(r$Month==11, 1, 0))
M12=as.numeric(ifelse(r$Month==12, 1, 0))
lm(TotN_conc~M2+M3+M4+M5+M6+M7+M8+M9+M10+M11+M12+seq(1,168,1), data = r)
But I am getting the exact same result if I do:
lm(TotN_conc~Month+seq(1,168,1), data=r)
Should I just use the second approach? Why don't I need the dummy variables?
(Screenshot showing the same result below.)
Best Answer
No, you don't need to build the dummy variables by hand; that would be very clumsy. R does this automatically when you include the variable as a factor:
lm(TotN_conc~ as.factor(Month), data=r)
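But maybe you are getting exactly the same result because Month is already a factor (you can check with class(r$Month)). If Month were numeric, lm() would fit a single slope for it instead of a separate effect for each month. If it is already a factor, your specification lm(TotN_conc~ Month, data=r) is enough.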
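To see why the class of Month matters, here is a minimal sketch on simulated data. The data frame below is a made-up stand-in for r (the values and the helper column t are assumptions, not the asker's data); only the column names follow the question.

```r
# Simulated stand-in for r: 14 years of monthly data (values are made up)
set.seed(1)
r <- data.frame(
  Year  = rep(2001:2014, each = 12),
  Month = rep(1:12, times = 14)
)
r$TotN_conc <- 2 + 0.5 * sin(2 * pi * r$Month / 12) + rnorm(nrow(r), sd = 0.2)
r$t <- seq_len(nrow(r))  # linear trend index 1..168 (hypothetical helper column)

class(r$Month)  # "integer" here, so lm() treats Month as one numeric predictor

fit_numeric <- lm(TotN_conc ~ Month + t, data = r)
fit_factor  <- lm(TotN_conc ~ as.factor(Month) + t, data = r)

length(coef(fit_numeric))  # 3: intercept, one Month slope, trend
length(coef(fit_factor))   # 13: intercept, 11 month effects, trend
```

If both of your fits report eleven month coefficients, Month is most likely stored as a factor (or as character, which lm() also converts to a factor).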
You can also check with model.matrix() that R automatically constructed exactly what you wanted to construct with your dummy variables:
model <- lm(TotN_conc~ as.factor(Month), data=r)
model.matrix(model)
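As a rough, self-contained check of that claim, the sketch below compares hand-built dummies with the factor version on simulated data. The data values and the names dummies, fit_manual, fit_factor, X_manual and X_factor are assumptions for illustration; the trend term from the question is left out to keep it short.

```r
# Simulated stand-in for r (values are made up)
set.seed(1)
r <- data.frame(Month = rep(1:12, times = 14))
r$TotN_conc <- 2 + 0.5 * sin(2 * pi * r$Month / 12) + rnorm(nrow(r), sd = 0.2)

# Hand-built dummies for months 2..12 (January is the baseline)
dummies <- sapply(2:12, function(m) as.numeric(r$Month == m))
colnames(dummies) <- paste0("M", 2:12)

fit_manual <- lm(TotN_conc ~ dummies, data = r)
fit_factor <- lm(TotN_conc ~ as.factor(Month), data = r)

# Compare design matrices and coefficients, ignoring column names
X_manual <- unname(model.matrix(fit_manual))
X_factor <- unname(model.matrix(fit_factor))
all.equal(X_manual, X_factor, check.attributes = FALSE)
all.equal(unname(coef(fit_manual)), unname(coef(fit_factor)))
```

Both comparisons should come back TRUE under R's default treatment contrasts, where the first factor level (here month 1) serves as the reference category.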