Solved – Why does the centered variable not have zero mean

It is well established that centering a variable, i.e. subtracting the mean of that variable from every value, produces a variable with zero mean. For example:

    > data = c(1,2,3,4,5,6,7,8,9)
    > mean = mean(data)
    > data.centered = data-mean
    > mean(data.centered)
    [1] 0

So far, so good. However, an attempt at centering a logged variable from my dataset produces a mean that is close to zero, but not exactly zero:

[1] -0.0000000000000004258896 

This is puzzling. I have two questions:

1) Why is the mean not exactly zero?

2) Is the fact that the mean is not exactly zero a problem for calculating regression interactions?

This is a result of numerical error. Computers have limited precision, and such errors are normal. It is easy to understand once you look at how a mean is computed in its simplest form:

    # naive mean: accumulate a running sum, then divide by the count
    sum = 0
    for (i in seq_along(data)) {
        sum = sum + data[i]
    }
    mean = sum / length(data)

Assume we are talking about floating-point numbers, which are stored in memory as an exponent and a mantissa with a fixed number of bits. As you sum numbers, the variable sum becomes large and its exponent grows. It can happen that the numbers data[i] you are adding are too small relative to sum to fit into the mantissa, so their low-order digits (or the whole value) are rounded away. This is a common source of numerical error.
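
As a minimal sketch of the effect described above, assuming R numerics are IEEE 754 doubles (the case on common platforms), the following shows a small addend being lost entirely and the result of an addition depending on its order. The variable names are illustrative only.

```r
# Assumption: doubles carry roughly 15 to 17 significant decimal digits.
big = 1e17
small = 1

# At this magnitude adjacent doubles are more than 1 apart,
# so adding 1 cannot change the stored value.
absorbed = (big + small) == big
print(absorbed)

# Rounding happens after every addition, so grouping matters.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)
print(left - right)
```

Explicit `+` is used here instead of `sum()` so that each intermediate result is rounded to a double, which is the situation the paragraph above describes.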

Another issue is that many decimal numbers cannot be represented exactly in binary floating point (for example 0.1).
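
A short sketch of this point, again assuming IEEE 754 doubles: printing 0.1 with extra digits reveals the stored approximation, and adding it ten times does not land exactly on 1. The loop is for illustration and is not meant to mirror how any particular R function is implemented.

```r
# Show more digits than R prints by default; the exact text may vary by platform.
print(sprintf("%.20f", 0.1))

# Add 0.1 ten times with explicit, individually rounded additions.
total = 0
for (i in 1:10) {
    total = total + 0.1
}

print(total == 1)                    # exact comparison
print(isTRUE(all.equal(total, 1)))   # comparison within a tolerance
```

The exact comparison fails while the tolerance-based one succeeds, which is why `all.equal()` is usually the safer way to compare computed doubles.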

For a better explanation of how numerical errors work, see this answer.

In practice, an error as small as the one you posted should not cause problems unless your data are themselves of a similarly small magnitude (as pointed out in the comment by @jbowman).
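
One way to judge whether the leftover mean matters is to compare it with the scale of the data, as sketched below. The data are simulated stand-ins (the seed, sample size, and range are arbitrary assumptions, not the asker's dataset).

```r
set.seed(1)                       # hypothetical data, for illustration only
x = log(runif(1000, 1, 100))      # stand-in for a logged variable
x.centered = x - mean(x)

m = mean(x.centered)
print(m)                          # tiny, and possibly not exactly 0

# Put the leftover mean in context: spread of the data and machine epsilon.
print(abs(m) / sd(x))
print(.Machine$double.eps)

# Test for "zero" with a tolerance rather than with ==.
print(isTRUE(all.equal(m, 0)))
```

If the ratio to the standard deviation is many orders of magnitude below 1, the centered variable can be treated as having zero mean for practical purposes.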
