Solved – `scale` in R with no normal distribution

In R, the scale function standardizes a variable: it subtracts the mean from each element and divides the result by the standard deviation.

So the scale function makes it possible to take into account different parameters measured on different scales.

# Manually scaling
(x - mean(x)) / sd(x)

# Default scaling
scale(x)
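
A quick check that the two forms above agree, as a minimal sketch. The example data, the seed, and the variable names `manual`, `auto` and `same` are assumptions made for illustration. Note that scale returns a one-column matrix carrying the centering and scaling values as attributes, so it is converted to a plain vector before comparing.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- rexp(100, rate = 0.5)        # hypothetical, clearly non-normal data

manual <- (x - mean(x)) / sd(x)   # manual standardization
auto <- scale(x)                  # one-column matrix with attributes

# as.vector() drops the matrix dimensions and the "scaled:*" attributes
same <- isTRUE(all.equal(manual, as.vector(auto)))
print(same)
```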

But does it make sense to scale a variable if it doesn't have a normal distribution?

Andrea

Scaling a variable is a linear transformation, so it does not change the shape of the variable's distribution; only its location and spread change. It therefore does not matter whether the variable is normally distributed.

You can confirm this by generating non-normally distributed data in R, such as: X = rnorm(10000, 10, 5)^2. Then scale the variable X: X.z = scale(X)

Comparing the two histograms, hist(X) vs. hist(X.z), you'll see that the distribution keeps essentially the same shape; only the values on the horizontal axis differ.
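
The visual comparison can also be backed by a number. The sketch below assumes a small hand-written helper, `skew` (not a base R function), that computes the moment-based sample skewness. Because skewness is built from centered values divided by a power of their spread, it should come out the same before and after scaling.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
X <- rnorm(10000, 10, 5)^2        # right-skewed, non-normal data
X.z <- scale(X)

# Hypothetical helper: moment-based sample skewness
skew <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

skew_raw <- skew(X)
skew_scaled <- skew(as.vector(X.z))
print(c(raw = skew_raw, scaled = skew_scaled))
```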

EDIT: As noted in the comments, scaling does influence the interpretation of the parameters in many statistical analyses (regression, PCA, etc.), so the decision to scale should be based on how you want to interpret your parameters.

However, scaling will not change the underlying distribution of the variable, nor will it influence (positively or negatively) the violations of model assumptions. For example, an assumption of linear regression is normality of the residuals; scaling a raw variable will not affect this normality.
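
Both points can be illustrated with a simple linear regression, as a minimal sketch on simulated data. The data-generating model, the seed, and the names `fit_raw`, `fit_scaled`, `res_equal` and `slope_ratio` are assumptions made for illustration. The residuals should match with or without scaling the predictor, while the slope changes by a factor of sd(x), which is the change in interpretation mentioned above.

```r
set.seed(7)                               # arbitrary seed, only for reproducibility
x <- rexp(200)                            # hypothetical skewed predictor
y <- 2 + 3 * x + rt(200, df = 3)          # heavy-tailed (non-normal) noise

fit_raw <- lm(y ~ x)                      # predictor on its original scale
xz <- as.vector(scale(x))                 # standardized predictor
fit_scaled <- lm(y ~ xz)

# Residuals are the same, so their (non-)normality is the same
res_equal <- isTRUE(all.equal(unname(resid(fit_raw)),
                              unname(resid(fit_scaled))))

# The scaled slope is the raw slope multiplied by sd(x), so this ratio is about 1
slope_ratio <- coef(fit_raw)[["x"]] * sd(x) / coef(fit_scaled)[["xz"]]
print(res_equal)
print(slope_ratio)
```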
