Regression using a circular variable (hour from 0 to 23) as a predictor

My question arises from reading the post "Use of circular predictors in linear regression".

Right now, I'm trying to construct a linear regression using the "Bike Sharing Dataset" from
https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset,
which I use to regress the bike rental count on various predictors.

One predictor I have a question about is "Hour", the hour of the day at which the rental occurred, which takes values from 0 to 23.
The original post suggests transforming circular data (such as time of day) with a sine function to preserve its circular character.

I tried to apply the same methodology to transform the Hour variable. However, transforming 0 to 23 using sin(π hour/180) gives both 00:00 and 12:00 a value of 0. But I think people will certainly behave differently when renting bikes at midnight (00:00) than at noon (12:00).
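To see the collision concretely, here is a minimal Python sketch of the sine-only encoding. It assumes the hour is first converted to degrees at 15° per hour, so that the sin(π hour/180) formula above applies; `sine_only` is a hypothetical helper name. The sketch shows that midnight and noon are not the only clash: any two hours h and 12 − h (mod 24) receive the same value.

```python
import math

def sine_only(hour):
    """Sine-only encoding from the question: hour -> degrees -> sin(pi * deg / 180)."""
    degrees = hour * 15  # assumption: 24 hours spread evenly over 360 degrees
    return math.sin(math.pi * degrees / 180)

# Hours h and 12 - h (mod 24) land on the same sine value.
for a, b in [(0, 12), (3, 9), (15, 21)]:
    print(f"{a:02d}:00 -> {sine_only(a):+.3f}   {b:02d}:00 -> {sine_only(b):+.3f}")
```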

In this case, is it better to just use 23 dummy variables to account for the hour,
or am I misunderstanding the concept of circular regression?
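For comparison, the dummy-variable alternative can be sketched as below. This is a minimal, dependency-free illustration: `hour_dummies` is a hypothetical helper, and using midnight as the reference hour (the one without its own column, so the dummies are not collinear with the intercept) is an arbitrary assumption.

```python
def hour_dummies(hour, reference=0):
    """One-hot encode an hour (0..23) as 23 indicator columns.

    The reference hour (midnight by default, an arbitrary choice) gets no
    column of its own, so the dummies are not collinear with the intercept;
    its effect is absorbed into the intercept.
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    return [1.0 if hour == h else 0.0 for h in range(24) if h != reference]

row_5am = hour_dummies(5)
print(len(row_5am), sum(row_5am))  # 23 columns, exactly one of them set
print(sum(hour_dummies(0)))        # reference hour: every column is zero
```

The trade-off is flexibility versus parameter count: 23 dummies let every hour have its own level, whereas a single sine/cosine pair uses two coefficients and traces one smooth wave with a single peak and a single trough per day.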

Circular regression usually refers to regression with a circular outcome.

In this case, we have linear regression with a circular predictor. Here we would add both the sine and the cosine of the angle to the regression, so that we predict the outcome as $\hat{y} = \beta_1 \cos(\pi \cdot \text{hour} / 12) + \beta_2 \sin(\pi \cdot \text{hour} / 12)$. Adding both the sine and the cosine naturally resolves the issue you mention: although the sine is 0 at both 00:00 and 12:00, the cosine is 1 at 00:00 and -1 at 12:00, so every hour maps to a distinct point on the circle. Note that, unlike you, I've assumed that hour is measured in hours rather than degrees.
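A minimal, standard-library-only Python sketch of this idea follows. It adds an intercept (the formula above omits one, but a regression would normally include it) and fits ordinary least squares via the normal equations. `hour_features` and `fit_ols` are hypothetical helper names, and the counts are synthetic, noise-free values generated from chosen coefficients, not Bike Sharing data.

```python
import math

def hour_features(hour):
    """Map an hour in [0, 24) to (cos, sin) of its angle on the 24-hour circle."""
    angle = math.pi * hour / 12  # 24 hours correspond to 2*pi radians
    return math.cos(angle), math.sin(angle)

def fit_ols(rows, y):
    """Ordinary least squares by solving the normal equations (X^T X) b = X^T y.

    Uses Gaussian elimination with partial pivoting, which is adequate for a
    handful of well-conditioned columns like these.
    """
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(col + 1, p):
            factor = a[k][col] / a[col][col]
            for j in range(col, p + 1):
                a[k][j] -= factor * a[col][j]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (a[i][p] - sum(a[i][j] * beta[j] for j in range(i + 1, p))) / a[i][i]
    return beta

# Hypothetical, noise-free counts generated as 100 + 30*cos - 50*sin.
hours = list(range(24))
rows = [[1.0, *hour_features(h)] for h in hours]  # intercept, cos, sin
counts = [100 + 30 * c - 50 * s for _, c, s in rows]

b0, b1, b2 = fit_ols(rows, counts)
print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}")

# Midnight and noon share sin = 0 but have cos = +1 and -1, so predictions differ.
for h in (0, 12):
    c, s = hour_features(h)
    print(f"{h:02d}:00 -> predicted {b0 + b1 * c + b2 * s:.1f}")
```

With real data you would compute the two columns from the hour of each rental and include them, alongside your other predictors, in whatever regression routine you already use; the hand-rolled solver here only keeps the sketch dependency-free.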

For a more elaborate answer on how to do this and what it means, please see the answer to this SO question.
