# Solved – Standardizing feature vectors for regression

Suppose I have a data set with the following structure:

Each row of the data set corresponds to a town. The first column (feature variable) is the total population. Other feature variables give the number of people who own various items (one feature for cars, one for home appliances, etc.), while still others measure quantities such as average income.

Now, it is often necessary to 'transform' the feature vectors before running some sort of regression algorithm on the data, for example standardizing them.

Suppose the towns have very disparate populations (call this feature \$X_1\$, and let town \$i\$ have value \$X_1^i\$). Consider another feature, say \$X_2\$, measuring the count of some item in each town. My question is:

Should one, in general, first rescale \$X_2\$ by the total population of each town, that is \$X_2^i \mapsto \frac{X_2^i}{X_1^i}\$, and then standardize the column by \$X_2^i \mapsto \frac{X_2^i - \bar{X}_2}{\hat{\sigma}_{X_2}}\$, or even simply rescale the values using the range \$[\min(X_2), \max(X_2)]\$?
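To make the transformations in question concrete, here is a minimal sketch in Python with NumPy. The town data and the helper names (`per_capita`, `standardize`, `min_max_scale`) are made up for illustration. It assumes that standardizing means the usual z-score (subtract the sample mean, divide by the sample standard deviation) and that range scaling means mapping onto \$[0, 1]\$.

```python
import numpy as np

# Hypothetical toy data, one entry per town (values invented for illustration).
population = np.array([1_000.0, 5_000.0, 20_000.0, 100_000.0])  # X_1
car_owners = np.array([400.0, 2_500.0, 9_000.0, 30_000.0])      # X_2


def per_capita(counts, pop):
    """Divide each town's count by that town's population."""
    return counts / pop


def standardize(x):
    """Z-score: subtract the sample mean, divide by the sample standard deviation."""
    return (x - x.mean()) / x.std(ddof=1)


def min_max_scale(x):
    """Map values linearly onto [0, 1] using the column's min and max."""
    return (x - x.min()) / (x.max() - x.min())


rate = per_capita(car_owners, population)  # X_2^i / X_1^i
z_rate = standardize(rate)                 # standardized per-capita rate
mm_rate = min_max_scale(rate)              # range-scaled per-capita rate
```

Both `standardize` and `min_max_scale` would fail with a division by zero on a perfectly constant column, so a real pipeline would need to guard against that case.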

The reason I am asking is this: I can imagine a case where, despite the towns having very different populations, some item has roughly the same count in every town. In that case, if we simply standardized the column, the values in the feature column would be reduced to zero (or close to it) and, intuitively, there would be a tremendous loss of information.
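The scenario can be tried on toy numbers. The sketch below uses invented populations and an invented item count that is nearly constant across towns. It prints the mean-centered counts, the z-scored counts, and the per-capita shares side by side so they can be compared. It is only an illustration of the setup, not a claim about which transformation is preferable.

```python
import numpy as np

# Hypothetical towns with very different populations (made-up numbers).
population = np.array([1_000.0, 10_000.0, 100_000.0, 1_000_000.0])
# Hypothetical item whose count is roughly the same in every town.
item_count = np.array([98.0, 101.0, 100.0, 101.0])

centered = item_count - item_count.mean()         # mean-centering only
z_count = centered / item_count.std(ddof=1)       # z-score of the raw count
rate = item_count / population                    # per-capita share
z_rate = (rate - rate.mean()) / rate.std(ddof=1)  # z-score of the share

print("centered raw count:", centered)
print("z-scored raw count:", np.round(z_count, 3))
print("per-capita share:  ", rate)
print("z-scored share:    ", np.round(z_rate, 3))
```

Centering alone leaves values close to zero relative to the original counts, whereas dividing by the sample standard deviation rescales the column to unit sample variance by construction. The per-capita shares, in contrast, vary by orders of magnitude because the populations do.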

Assume that I know \$X_1\$ is collinear with \$X_2\$ and that I won't be using \$X_1\$ as a feature.
