I am doing a regression analysis in R, in which I examine the contribution of each car attribute to its price.
Some variables can be coded either as a set of dummy variables or as a single continuous variable.
For example, I can add a dummy variable for each number of cylinders (2, 4, 6 or 8), or I can treat the number of cylinders as a continuous variable.
Is there a difference between the two possibilities?
Best Answer
Regressing price $y$ on a constant and the number of cylinders $x$ would make sense if the price were known to be affine in the number of cylinders, that is, if the price increase from 2 to 4 cylinders is the same as the increase from 4 to 6 and the same as the increase from 6 to 8. Then you could run the regression:
$$ y_i = a + b x_i + \epsilon_i $$
On the other hand, it may not be affine in reality. If price isn't affine in number of cylinders, the above model would be misspecified.
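To make the misspecification concrete, here is a minimal sketch in plain Python. The eight (cylinders, price) pairs are invented purely for illustration and do not come from any real dataset. It fits the affine model by the closed-form least-squares formulas and compares the fitted line with the observed mean price for each cylinder count.

```python
# Hypothetical toy data: (number of cylinders, price). Invented for illustration only.
data = [
    (2, 10.0), (2, 12.0),
    (4, 14.0), (4, 16.0),
    (6, 15.0), (6, 17.0),
    (8, 29.0), (8, 31.0),
]

x = [cyl for cyl, _ in data]
y = [price for _, price in data]
n = len(data)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form least-squares estimates for y = a + b*x + error.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Compare the fitted line with the observed mean price per cylinder count.
group_means = {}
for k in sorted(set(x)):
    prices = [price for cyl, price in data if cyl == k]
    group_means[k] = sum(prices) / len(prices)
    print(f"{k} cylinders: fitted {a + b * k:.2f}, observed mean {group_means[k]:.2f}")
```

On this invented data the fitted line rises by the same amount, $2b$, for every two extra cylinders, while the observed group means rise by 4, then 1, then 14. The affine model cannot reproduce unequal steps like these.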
What could one do? Let $z_2$ be a dummy variable that equals 1 for a two-cylinder car and 0 otherwise, let $z_4$ be the corresponding dummy variable for four cylinders, and so on. Since there are only four possibilities (2, 4, 6 or 8 cylinders), you likely have enough data to run the more complete regression:
$$ y_i = a + b_4 z_{4,i} + b_6 z_{6,i} + b_8 z_{8,i} + \epsilon_i $$
Here the coefficients $b_4$, $b_6$ and $b_8$ would be the price increase relative to a two-cylinder car. (The constant $a$ would pick up the mean price of a two-cylinder car.)
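The interpretation of the coefficients can be checked with a small sketch. The data below are invented for illustration, and `solve` and `ols` are hypothetical helper names written here in plain Python (in practice a statistics package would do this step). The code builds the design matrix with a constant and the three dummies, then solves the normal equations.

```python
# Hypothetical toy data: (number of cylinders, price). Invented for illustration only.
data = [
    (2, 10.0), (2, 12.0),
    (4, 14.0), (4, 16.0),
    (6, 15.0), (6, 17.0),
    (8, 29.0), (8, 31.0),
]

def solve(A, rhs):
    """Solve the square system A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [u - factor * v for u, v in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

y = [price for _, price in data]
# Columns: constant, z4, z6, z8. Two cylinders is the omitted baseline category.
X = [[1.0] + [1.0 if cyl == k else 0.0 for k in (4, 6, 8)] for cyl, _ in data]

a, b4, b6, b8 = ols(X, y)
print(f"a = {a:.2f}, b4 = {b4:.2f}, b6 = {b6:.2f}, b8 = {b8:.2f}")
```

With these invented prices the group means are 11, 15, 16 and 30, so the estimates come out as $a = 11$, $b_4 = 4$, $b_6 = 5$ and $b_8 = 19$ (up to floating-point rounding): the baseline mean, then each group's difference from it.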
Or if you run the regression without a constant, you could run:
$$ y_i = b_2 z_{2,i} + b_4 z_{4,i} + b_6 z_{6,i} + b_8 z_{8,i} + \epsilon_i $$
Here the coefficients ($b_2$, $b_4$, $b_6$, $b_8$) would be the mean price of each cylinder type. Observe how the average price is no longer assumed to be affine in the number of cylinders! You could have a small difference between $b_4$ and $b_6$ but a large difference between $b_6$ and $b_8$.
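One way to summarize the difference between the two codings: the affine model is the special case of the dummy regression with a constant in which the dummy coefficients are forced to grow in equal steps. Write the affine model's intercept and slope as $\alpha$ and $\beta$ (notation introduced here only to avoid a clash with $a$ and the $b_k$ of the dummy regression). The affine model implies a mean price of $\alpha + \beta k$ for a car with $k$ cylinders, so it corresponds to the restriction:

$$ a = \alpha + 2\beta, \qquad b_4 = 2\beta, \qquad b_6 = 4\beta, \qquad b_8 = 6\beta $$

The dummy coding leaves $b_4$, $b_6$ and $b_8$ free, at the cost of estimating four parameters instead of two.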