I created a simple linear model in R:

```r
m1 <- lm(CD ~ DBH1, data = a)
summary(m1)
```
To improve the model fit, I transformed the data to the log scale and fitted a linear model to the transformed data:

```r
m5 <- lm(LnCD ~ LnDBH, data = a)
summary(m5)
```
Can I simply compare the results in the two summaries, such as $R^2$ and the residual standard error, or do I need to transform back from the log scale first? In other words: is the residual standard error in the log model always lower because of the log scale?
Best Answer
The residual standard error (perhaps better described as the standard deviation of the residuals) on the log scale can easily be higher than on the original scale! Consider values well below 1 on whatever measurement scale is in use: their variability as measured on a logarithmic scale will usually be higher than on the original scale. To see this, which is independent of regression and a simple consequence of how logarithms are defined, compare the variability, measured as standard deviation, of 10, 100, 1000; of their reciprocals; and of the logarithms of both sets. The two sets of logarithms have identical standard deviations, while on the original scale the reciprocals vary far less. So the reduction you commonly see after logging is just a consequence of typically working with values above 1, and nothing else.
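A minimal R sketch of that numerical illustration, using exactly the values named above:

```r
x <- c(10, 100, 1000)  # values above 1
y <- 1/x               # their reciprocals: 0.1, 0.01, 0.001, all below 1

sd(x)       # about 547.4 : large on the original scale
sd(y)       # about 0.055 : tiny on the original scale
sd(log(x))  # about 2.303 : on the log scale both sets ...
sd(log(y))  # about 2.303 : ... have identical variability
```

For the values below 1, the log-scale standard deviation (about 2.303) is far higher than the original-scale one (about 0.055).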
The more general point is that variability on the log scale (here, specifically, residual standard errors from a regression) is just that: variability on a logarithmic scale. It is possible to back-transform results such as standard errors to the original scale, but the results don't have an interpretation that relates directly to the original regression, nor is doing so especially useful.
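As a hedged sketch of that point (simulated data standing in for the questioner's data frame a), `sigma()` extracts the residual standard error from each fit; exponentiating the log-scale value gives a multiplicative factor, not an error in the original units:

```r
set.seed(1)
# invented data with roughly power-law structure
x <- runif(100, 5, 50)
y <- 2 * x^0.8 * exp(rnorm(100, sd = 0.2))

m_raw <- lm(y ~ x)
m_log <- lm(log(y) ~ log(x))

sigma(m_raw)       # residual SE in the units of y
sigma(m_log)       # residual SE in log units: not comparable to the line above
exp(sigma(m_log))  # a multiplicative factor (about 1.2 here), not an additive error
```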
$R^2$ is in principle unit-free and dimensionless, so many people compare $R^2$ before and after logarithmic transformation to judge its success. That gives at best an informal guide; it is not a formal or rigorous test of anything, nor does it always answer the main question of whether each individual regression is a good idea (for example, $R^2$ is easily inflated by a single outlier).
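To illustrate that last caveat, here is a small simulation (invented data, nothing to do with the question's variables) in which a single extreme point manufactures a large $R^2$ from pure noise:

```r
set.seed(42)
x <- rnorm(30)
y <- rnorm(30)                  # no relationship at all
summary(lm(y ~ x))$r.squared    # near 0, as it should be

x2 <- c(x, 20)                  # add one extreme point ...
y2 <- c(y, 20)                  # ... at (20, 20)
summary(lm(y2 ~ x2))$r.squared  # now large, driven entirely by the outlier
```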
Much better guides to whether a logarithmic transformation has been successful are:
- Scientific or subject-matter understanding that a logarithmic transformation is natural, notably that the underlying mechanisms or processes are better thought of as multiplicative, not additive. Under this heading comes consistency with known limiting behaviour: in particular, logging both response and predictor is equivalent to fitting a power function, which passes through the origin whenever the power (the slope on the log scale) is positive. Such a function is often reasonable on scientific or other substantive grounds (see the sketch after this list).
- Graphical and other evidence that the transformation worked, as judged by better (meaning simpler) patterns on scatter plots of the data and on residual plots. Here there is precisely one predictor, so a plot of logged response versus logged predictor showing the data and the fitted function is by far the simplest good guide to success. Residual versus fitted plots can help too. Even scatter and symmetry of the residual variation are also desirable, though not absolutely essential.
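To make both points concrete, here is a sketch on simulated data (the names CD and DBH mirror the question, but the numbers are invented): the log-log fit back-transforms to a power curve on the original scale, and the plots just described are one-liners in base R.

```r
set.seed(7)
# invented power-law data standing in for the question's variables
DBH <- runif(80, 5, 60)
CD  <- 1.5 * DBH^0.7 * exp(rnorm(80, sd = 0.15))
a   <- data.frame(CD, DBH, LnCD = log(CD), LnDBH = log(DBH))

m5 <- lm(LnCD ~ LnDBH, data = a)
b0 <- coef(m5)[1]  # intercept: log of the multiplier
b1 <- coef(m5)[2]  # slope: the power

plot(CD ~ DBH, data = a)
curve(exp(b0) * x^b1, add = TRUE)  # fitted power curve on the original scale

plot(LnCD ~ LnDBH, data = a)       # log-log plot: should look close to linear
abline(m5)

plot(fitted(m5), resid(m5))        # residuals vs fitted: want even, patternless scatter
abline(h = 0, lty = 2)
```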
As already hinted, it is possible to meet situations in which the original-scale regression gives a high $R^2$ spuriously, say as a side-effect of an outlier, yet the regression on log scales, with its lower $R^2$, is arguably the better model.
Here, as elsewhere, excessive attention to individual figures of merit in model summaries can just get in the way. Look at the data; look at the residuals and fitted values; and use your scientific knowledge.
Notes:
A wild guess is that these are data on trees and that CD is something like crown diameter and DBH is diameter at breast height! True or not, there is almost certainly subject-matter literature you should be examining too. There is a substantial literature on tree geometry that is evident even to non-ecologists such as myself.
Use of R is immaterial here. In fact, it's best to pose questions independently of the syntax of your software.