My question arises from reading the post "Use of circular predictors in linear regression".
Right now, I'm trying to build a linear regression on the "Bike Sharing dataset" from
https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset
regressing bike rental count on several variables.
The predictor I have a question about is "Hour", the hour at which the rental occurred, which takes values from 0 to 23.
The original post suggests transforming circular data (time of day) using the sine function to preserve its circular character.
I was trying to apply the same methodology to transform the Hour variable. However, transforming 0~23 using sin(π hour/180) maps both 00:00 and 12:00 to 0. But I think people certainly behave differently when renting a bike at midnight (00:00) and at noon (12:00).
In this case, is it better to just use 23 dummy variables to account for hour,
or am I misunderstanding the concept of circular regression?
Best Answer
Circular regression most often refers to regression with a circular outcome.
In this case, we have linear regression with a circular predictor. Here we would add both the sine and the cosine of the angle to the regression, so that we predict the outcome as $\hat{y} = \beta_1\cos(\pi \cdot \text{hour} / 12) + \beta_2\sin(\pi \cdot \text{hour} / 12)$. Adding both the sine and cosine naturally resolves the issue you mention: hours 0 and 12 share the same sine ($0$) but have cosines of $1$ and $-1$, so the pair tells them apart. Note that, unlike you, I've assumed that hour is represented in hours rather than degrees.
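As a rough illustration of the sine-plus-cosine encoding, here is a minimal Python sketch using NumPy. Everything specific in it is an assumption made for the example: the helper name `encode_hour` is hypothetical, the data are synthetic (not the Bike Sharing dataset), and an intercept is added to the regression, which the formula above does not include.

```python
import numpy as np

def encode_hour(hour):
    """Map hour-of-day (0..23) to (cos, sin) of the angle pi * hour / 12."""
    angle = np.pi * np.asarray(hour, dtype=float) / 12
    return np.column_stack([np.cos(angle), np.sin(angle)])

# Synthetic data for illustration only: rentals peak at 17:00.
rng = np.random.default_rng(0)
hours = rng.integers(0, 24, size=2000)
true_peak = 17.0
counts = (100
          + 60 * np.cos(np.pi * (hours - true_peak) / 12)
          + rng.normal(0, 5, size=hours.size))

# Design matrix: intercept (an assumption of this sketch), cos term, sin term.
X = np.column_stack([np.ones(hours.size), encode_hour(hours)])
coef, *_ = np.linalg.lstsq(X, counts, rcond=None)
beta0, beta1, beta2 = coef

# beta1*cos(t) + beta2*sin(t) equals A*cos(t - phi) with
# A = hypot(beta1, beta2) and phi = atan2(beta2, beta1),
# so the two coefficients together encode an amplitude and a peak time.
amplitude = np.hypot(beta1, beta2)
peak_hour = (np.arctan2(beta2, beta1) * 12 / np.pi) % 24

print(encode_hour([0, 12]))  # same sine, opposite cosine
print(f"amplitude={amplitude:.1f}, peak_hour={peak_hour:.2f}")
```

The two fitted coefficients jointly describe a single daily wave whose height and peak time are both free, which a sine term alone cannot do. This captures only one smooth cycle per day; whether that is flexible enough for the rental data, compared with hourly dummies, is something to check against the fit.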
For a more elaborate answer on how to do this and what it means, please see the answer to this SO question.