Why is there a difference in p-values for the following model
$$
y = a + b_1x_1 + b_2 x_2 + b_{12}x_1x_2 + \epsilon
$$
depending on the scale of the $x$'s?
```
> # response variable
> y <- rnorm(8, mean=17, sd=1.2)
>
> # model 1
> x1 <- c(1, -1, -1, -1, 1, 1, -1, 1)
> x2 <- c(-1, 1, 1, -1, -1, 1, -1, 1)
> fit1 <- lm(y ~ x1 + x2 + x1*x2)
>
> # model 2 (factors transformation)
> z1 <- x1 * 7.5 + 47.5
> z2 <- x2 * 11.5 + 67.5
> fit2 <- lm(y ~ z1 + z2 + z1*z2)
>
> # comparison
> summary(fit1)$coef
              Estimate Std. Error    t value     Pr(>|t|)
(Intercept) 17.6901204  0.3394185 52.1189021 8.111573e-07
x1           0.3434611  0.3394185  1.0119103 3.688180e-01
x2           0.0928959  0.3394185  0.2736913 7.978731e-01
x1:x2       -0.1541870  0.3394185 -0.4542681 6.731943e-01
> summary(fit2)$coef
                Estimate   Std. Error    t value  Pr(>|t|)
(Intercept)  9.237873473 12.957889026  0.7129150 0.5152809
z1           0.166462912  0.269459429  0.6177661 0.5701650
z2           0.092992493  0.189241898  0.4913948 0.6488902
z1:z2       -0.001787676  0.003935287 -0.4542681 0.6731943
```
Best Answer
Let's compare the two models.
The original one is clearly and well expressed in the question,
$$y = a + b_1x_1 + b_2 x_2 + b_{12}x_1x_2 + \epsilon.$$
Let's write the second model as
$$y = a^\prime + b^\prime_1 z_1 + b^\prime_2 z_2 + b^\prime_{12}z_1z_2 + \delta.$$
Because the particular values 7.5, 47.5, and so on are of little interest, let's just name them with Greek letters:
$$z_i = \alpha_i x_i + \gamma_i.$$
(Here $\alpha_1 = 7.5$, $\gamma_1 = 47.5$, $\alpha_2 = 11.5$, and $\gamma_2 = 67.5$.) We know them; they are constants; they will not need to be estimated.
Plugging these equations into the second model shows how it attempts to relate $y$ to the $x_i$:
$$\begin{aligned} y &= a^\prime + b^\prime_1 (\alpha_1 x_1 + \gamma_1) + b^\prime_2 (\alpha_2 x_2 + \gamma_2) + b^\prime_{12}(\alpha_1 x_1 + \gamma_1)(\alpha_2 x_2 + \gamma_2) + \delta \\ &= (a^\prime + b^\prime_1\gamma_1 + b^\prime_2\gamma_2 + b^\prime_{12}\gamma_1\gamma_2) + (b^\prime_1\alpha_1 + b^\prime_{12} \gamma_2\alpha_1)x_1 + (b^\prime_2\alpha_2 + b^\prime_{12}\gamma_1\alpha_2)x_2 \\ &\quad + (b^\prime_{12}\alpha_1\alpha_2)x_1x_2 + \delta. \end{aligned}$$
Recalling that the default tests conducted by software compare coefficients to zero, it's easy to compare the results, line by line, to the output for the first ($x$) model:
- The errors are modeled in the same way: $\epsilon = \delta$. Therefore the fits (predictions) will be the same, and so will the residuals. These are identical models, merely reparameterized; a numerical check follows this list.
- The test of the intercept compares $a^\prime + b^\prime_1\gamma_1 + b^\prime_2\gamma_2 + b^\prime_{12}\gamma_1\gamma_2$ to $0$.
- The tests of the main-effect coefficients compare $b^\prime_1\alpha_1 + b^\prime_{12} \gamma_2\alpha_1$ and $b^\prime_2\alpha_2 + b^\prime_{12}\gamma_1\alpha_2$ to $0$.
- The test of the interaction term compares $b^\prime_{12}\alpha_1\alpha_2$ to $0$.
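As a quick numerical check (a minimal sketch, reusing `fit1`, `fit2`, and the scaling constants from the question's session), the fitted values coincide, and the first model's coefficients are exactly these combinations of the second model's:

```r
# The two parameterizations produce identical predictions
all.equal(fitted(fit1), fitted(fit2))        # TRUE

# Rebuild the x-model coefficients from the z-model coefficients
b  <- coef(fit2)                             # (Intercept), z1, z2, z1:z2
a1 <- 7.5;  g1 <- 47.5                       # z1 = 7.5 * x1 + 47.5
a2 <- 11.5; g2 <- 67.5                       # z2 = 11.5 * x2 + 67.5
rbind(x_model = coef(fit1),
      rebuilt = unname(c(b[1] + b[2]*g1 + b[3]*g2 + b[4]*g1*g2,
                         b[2]*a1 + b[4]*g2*a1,
                         b[3]*a2 + b[4]*g1*a2,
                         b[4]*a1*a2)))       # the two rows agree
```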
Only in the last case is there a simple relationship to the tests in the second model (involving only the primed coefficients): rescaling $\hat b^\prime_{12}$ by $\alpha_1\alpha_2$ also rescales its standard error by the same amount, so the $t$ statistic (the ratio of the estimate to its standard error) does not change. Sure enough, the p-values for the interaction terms in the two outputs agree: 6.731943e-01 is the same number as 0.6731943 (to within the displayed precision).
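This cancellation can be verified directly from the two coefficient tables (a small check, again using the fits from the question):

```r
a1 <- 7.5; a2 <- 11.5
s1 <- summary(fit1)$coef
s2 <- summary(fit2)$coef
s2["z1:z2", "Estimate"]   * a1 * a2              # -0.001787676 * 86.25 = -0.1541870
s2["z1:z2", "Std. Error"] * a1 * a2              #  0.003935287 * 86.25 =  0.3394185
s1["x1:x2", "t value"] - s2["z1:z2", "t value"]  # effectively 0: the ratio is unchanged
```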
In the other cases (the intercept and the coefficients of the $x_i$), the coefficients of the first model are linear functions of the coefficients of the second model, so their estimates are the very same linear functions of the estimated coefficients.
We can use this to work out the exact relationships between the two outputs, provided we know the covariances of the estimates: the standard error of each of these linear combinations depends on the variances of the individual estimates and on their covariances. For instance,
$$\text{var}(\hat{b}_1) = \text{var}(\hat b^\prime_1\alpha_1 + \hat b^\prime_{12} \gamma_2\alpha_1) = \alpha_1^2\text{var}(\hat b^\prime_1) + (\gamma_2\alpha_1)^2\text{var}(\hat b^\prime_{12}) + 2\alpha_1^2\gamma_2\text{cov}(\hat b^\prime_1, \hat b^\prime_{12}). $$
(The hats over the letters denote data-based estimates, as usual.)
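In R, `vcov()` returns exactly this variance-covariance matrix, so the standard error of $\hat b_1$ reported for the first fit can be reconstructed from the second fit alone (a sketch, continuing the question's session):

```r
V  <- vcov(fit2)                 # variances and covariances of the z-model estimates
a1 <- 7.5; g2 <- 67.5
var_b1 <- a1^2 * V["z1", "z1"] +
  (g2 * a1)^2 * V["z1:z2", "z1:z2"] +
  2 * a1^2 * g2 * V["z1", "z1:z2"]
sqrt(var_b1)                     # 0.3394185, the Std. Error of x1 in summary(fit1)
```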
In theory, it matters not which model you choose. It is convenient, however, to use one where the automatic tests conducted by the software are relevant for your analytical purposes. Let that guide how you re-express the independent variables in a regression.
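For instance, if `z1` and `z2` are the variables actually measured, recoding them inside the model formula recovers the first model's output, so the default tests refer to the centered, $\pm 1$-coded effects (a minimal sketch under the question's setup):

```r
# Recode the measured variables back to the +/-1 design coding;
# summary(fit3) reproduces summary(fit1) exactly
fit3 <- lm(y ~ I((z1 - 47.5) / 7.5) * I((z2 - 67.5) / 11.5))
summary(fit3)$coef
```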