Solved – Do I need to create dummy variables to fit a seasonal model?

As I understood from the second chapter of "Time series analysis" by Shumway, when fitting a seasonal model in R you may want to use dummy variables to tell lm() whether a given month's effect applies to an observation.

In the example below, the r variable holds a dataset with Year, Month and TotN_conc (monthly data for 14 years).

To fit a seasonal model I have done:

M2 = as.numeric(ifelse(r$Month == 2, 1, 0))
M3 = as.numeric(ifelse(r$Month == 3, 1, 0))
M4 = as.numeric(ifelse(r$Month == 4, 1, 0))
M5 = as.numeric(ifelse(r$Month == 5, 1, 0))
M6 = as.numeric(ifelse(r$Month == 6, 1, 0))
M7 = as.numeric(ifelse(r$Month == 7, 1, 0))
M8 = as.numeric(ifelse(r$Month == 8, 1, 0))
M9 = as.numeric(ifelse(r$Month == 9, 1, 0))
M10 = as.numeric(ifelse(r$Month == 10, 1, 0))
M11 = as.numeric(ifelse(r$Month == 11, 1, 0))
M12 = as.numeric(ifelse(r$Month == 12, 1, 0))

lm(TotN_conc ~ M2 + M3 + M4 + M5 + M6 + M7 + M8 + M9 + M10 + M11 + M12 + seq(1, 168, 1),
   data = r)

But I am getting the exact same result if I do:

lm(TotN_conc~Month+seq(1,168,1), data=r)  

Should I just use the second approach? Why don't I need the dummy variables?

(Screenshot: both calls give the same result.)

No, you don't need to create them by hand; that would be very clumsy. R does this automatically when you enter the variable as a factor:

lm(TotN_conc~ as.factor(Month), data=r)  

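But you are probably getting exactly the same result because Month is already a factor. You can check with class(r$Month). In that case, a specification like lm(TotN_conc ~ Month, data = r) is enough. If Month were numeric, lm() would instead fit a single slope for it and the results would differ from the dummy-variable model.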
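To make the factor-versus-numeric distinction concrete, here is a minimal sketch on simulated data. The data frame r_demo and its values are invented for illustration and only mimic the shape of the asker's data (168 monthly observations); they are not the original dataset.

```r
# Hypothetical stand-in for the asker's data: 14 years of monthly values.
set.seed(1)
r_demo <- data.frame(
  Year  = rep(2001:2014, each = 12),
  Month = rep(1:12, times = 14)      # stored as integer, NOT as a factor
)
# Simulated response with a seasonal pattern plus noise (illustrative only).
r_demo$TotN_conc <- 2 + sin(2 * pi * r_demo$Month / 12) + rnorm(168, sd = 0.2)

class(r_demo$Month)  # tells you how lm() will treat Month

# Numeric Month: one slope, i.e. a straight line across months 1..12.
fit_numeric <- lm(TotN_conc ~ Month, data = r_demo)

# Factor Month: one indicator per month, with month 1 as the baseline.
fit_factor <- lm(TotN_conc ~ as.factor(Month), data = r_demo)

length(coef(fit_numeric))  # intercept + 1 slope
length(coef(fit_factor))   # intercept + 11 month indicators
```

With a numeric Month the model has two coefficients, while the factor version has twelve, which is why it only matches the hand-built dummies when Month is a factor (or is wrapped in as.factor()).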

You can also check with model.matrix() that R automatically constructed exactly what you wanted to construct with your dummy variables:

model <- lm(TotN_conc ~ as.factor(Month), data = r)
model.matrix(model)
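As a rough check of that equivalence, the sketch below builds the eleven indicators by hand and compares them with what lm() generates from a factor, including a linear trend term as in the question. The names r_demo, month_num and t are assumptions made up for this example, and the data are simulated.

```r
# Simulated data with the same shape as in the question (illustrative only).
set.seed(42)
month_num <- rep(1:12, times = 14)
r_demo <- data.frame(
  Month = factor(month_num),  # factor version, as used by lm()
  t     = 1:168               # linear trend, like seq(1, 168, 1)
)
r_demo$TotN_conc <- 1 + 0.01 * r_demo$t + rnorm(168)

# Hand-built indicators for months 2..12 (month 1 is the baseline).
for (m in 2:12) {
  r_demo[[paste0("M", m)]] <- as.numeric(month_num == m)
}

# Model with manual dummies: TotN_conc ~ M2 + ... + M12 + t
manual_formula <- reformulate(c(paste0("M", 2:12), "t"), response = "TotN_conc")
fit_manual <- lm(manual_formula, data = r_demo)

# Model that lets R expand the factor itself.
fit_factor <- lm(TotN_conc ~ Month + t, data = r_demo)

# Compare design matrices and coefficients, ignoring names and attributes.
same_design <- all.equal(as.vector(model.matrix(fit_manual)),
                         as.vector(model.matrix(fit_factor)))
same_coefs  <- all.equal(unname(coef(fit_manual)), unname(coef(fit_factor)))
same_design
same_coefs
```

Both comparisons should come back TRUE: the two design matrices differ only in column names (M2 versus Month2, and so on), so the fitted coefficients agree.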
