I'm performing a weighted least squares regression on survey data.

The survey data is from the EU and each observation has a weight (for example, 0.4 for one respondent and 1.5 for another).

This weight is described as:

"The European Weight, variable 6, produces a representative sample of the European Community as a whole when used in analysis. This variable adjusts the size of each national sample according to each nation's contribution to the population of the European Community."

I would like to normalize my data. For a non-weighted dataset I would do this:

`df_norm = (df - df.mean()) / (df.max() - df.min()) `

However, I'm not sure what impact that would have on my weights.

Should I put the weights in another dataframe, normalize the data and then add the weights back in? Is it safe to normalize the dataframe with the weights attached?

Thanks for any wisdom you have to share.


#### Best Answer

Two sets of weights are equivalent if they differ only by a positive scalar factor. For example, multiplying all the weights by 2 keeps the relative importance of the subjects intact.

With this in mind:

- It is OK to multiply or divide the weights by a scalar, e.g. `weights / (weights.max() - weights.min())`. A more common example normalizes the weights to sum to 1: `weights / weights.sum()`.
- It is not OK to add or subtract a scalar, e.g. `weights - weights.mean()`, because that changes the ratios between the weights.
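To make the distinction concrete, here is a minimal sketch that fits the same weighted least squares problem three times. The data, the helper `wls_coefficients`, and the shift of `0.3` are all made up for illustration; only the weight range (0.4 to 1.5) comes from the question.

```python
import numpy as np

def wls_coefficients(X, y, w):
    """Minimize sum_i w_i * (y_i - X_i @ beta)**2 (hypothetical helper)."""
    sqrt_w = np.sqrt(w)
    # Scaling each row by sqrt(w_i) turns WLS into an ordinary least squares problem.
    beta, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)            # synthetic response
weights = rng.uniform(0.4, 1.5, size=n)                 # synthetic survey weights

beta_original = wls_coefficients(X, y, weights)
# Dividing by a scalar: same relative importance, same fit.
beta_scaled = wls_coefficients(X, y, weights / weights.sum())
# Subtracting a scalar (kept small so weights stay positive): different fit.
beta_shifted = wls_coefficients(X, y, weights - 0.3)

print(np.allclose(beta_original, beta_scaled))
print(np.allclose(beta_original, beta_shifted))
```

The scaled weights should reproduce the original coefficients up to floating-point error, while the shifted weights should not, since subtracting a constant changes how much each respondent counts relative to the others.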

In your case it is thus advisable to separate the information (the responses) from the meta-information (the weights) before normalizing.
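One way this separation might look in pandas is sketched below. The frame contents and the column name `european_weight` are assumptions for illustration, not taken from the actual survey file.

```python
import pandas as pd

# Hypothetical survey data; "european_weight" is an assumed column name.
df = pd.DataFrame({
    "age": [23, 45, 31, 60],
    "income": [21000.0, 54000.0, 38000.0, 47000.0],
    "european_weight": [0.4, 1.5, 0.9, 1.2],
})

# Pull the weights out so the normalization never touches them.
weights = df.pop("european_weight")

# Normalize only the response columns, using the formula from the question.
df_norm = (df - df.mean()) / (df.max() - df.min())

# Optional: rescale the weights to sum to 1 (a pure scalar division).
weights_norm = weights / weights.sum()
```

Because `weights` keeps the same index as `df_norm`, it can be passed alongside the normalized data to whichever weighted regression routine you use, for example the `weights` argument of `statsmodels.api.WLS`.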
