Solved – Normalizing weighted regression data

I'm performing a weighted least squares regression on survey data.

The survey data is from the EU and each observation has a weight (e.g. 0.4 for one respondent, 1.5 for another).

This weight is described as:

"The European Weight, variable 6, produces a representative sample of
the European Community as a whole when used in analysis. This variable
adjusts the size of each national sample according to each nation's
contribution to the population of the European Community."

I would like to normalize my data. For a non-weighted dataset I would do this:

df_norm = (df - df.mean()) / (df.max() - df.min()) 

However, I'm not sure what impact that would have on my weights.
Should I put the weights in another dataframe, normalize the data and then add the weights back in? Is it safe to normalize the dataframe with the weights attached?

Thanks for any wisdom you have to share.

Two sets of weights are equivalent if they differ only by a positive scalar factor. For example, multiplying all the weights by 2 keeps the relative importance of the subjects intact.

With this in mind:

  • it is OK to multiply or divide the weights by a positive scalar, e.g. weights / (weights.max() - weights.min()). A more common choice normalizes the weights to sum to 1: weights / weights.sum()

  • it is not OK to add or subtract a scalar, e.g. weights - weights.mean(), because that changes the ratios between the weights (and can make some of them negative)
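
To make the two points above concrete, here is a minimal sketch using NumPy only. The data, the weights and the helper wls_fit are made up for illustration; they are not from the survey. The sketch fits a weighted least squares line three times: with the raw weights, with the weights rescaled to sum to 1, and with a constant subtracted from every weight.

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares: scale each row by sqrt(w), then solve ordinary least squares."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Made-up toy data: an intercept column plus one predictor.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 1.0, 4.0])
w = np.array([0.4, 1.5, 1.0, 0.7])

base = wls_fit(X, y, w)
scaled = wls_fit(X, y, w / w.sum())   # divide by a scalar: ratios between weights preserved
shifted = wls_fit(X, y, w - 0.3)      # subtract a scalar: ratios between weights change

print("raw weights:    ", base)
print("scaled weights: ", scaled)
print("shifted weights:", shifted)
print("scaling keeps the fit: ", np.allclose(base, scaled))
print("shifting keeps the fit:", np.allclose(base, shifted))
```

Rescaling leaves the coefficients unchanged, while the shift gives a different fit even though every weight stays positive.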

In your case it is thus advisable to separate the information (the responses) from the meta-information (the weights), and to normalize only the responses.
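
One possible way to do that separation in pandas is sketched below. The column names (age, income, weight) and the values are hypothetical placeholders, not the actual survey variables. The normalization formula is the one from the question, applied only to the response columns.

```python
import pandas as pd

# Hypothetical survey frame; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [23, 45, 31, 60],
    "income": [20.0, 55.0, 38.0, 47.0],
    "weight": [0.4, 1.5, 1.0, 0.7],
})

# Keep the weights apart from the responses.
weights = df["weight"]
responses = df.drop(columns="weight")

# Normalize only the responses, using the formula from the question.
responses_norm = (responses - responses.mean()) / (responses.max() - responses.min())

# Optional: rescale the weights to sum to 1 (a pure scalar division).
weights_norm = weights / weights.sum()

print(responses_norm)
print(weights_norm)
```

The normalized responses and the weights can then be handed to the weighted regression routine as separate arguments, so the weights are never shifted by the normalization step.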
