# Solved – Dimensionally weighted distance between two points in n-dimensional space

I am not sure if this question makes sense or is even ideal, but here is my problem: Many proposed distance formulas only give a distance in terms of 1-dimensions. A common distance formula is the Euclidean distance (the most obvious one). Euclidean distance \$d_{ecdn}\$ of two points located at \$(x_1, x_2, x_3, … , x_n)\$ and \$(y_1, y_2, y_3, … , y_n)\$, respectively, is given by

\$d_{ecdn} = sqrt{displaystylesum_{i=1}^{n} (x_i-y_i)^2}\$.

Geometrically, in any number of dimensions (but imagine 3 dimensions as an example), if you were to draw a line between the two points, whose length expresses the Euclidean distance, the line itself exists in 1 dimension. Why is this a problem? I am doing a project involve data mining and information retrieval. Basically, each document in a corpus is mapped to a location in an \$n\$-dimensional space where \$n\$ is the total number of terms within the whole set of documents. They say that if document A is geometrically closer to document B than document C, then document A is more similar to B than C. Euclidean distance gives the wrong impression of similarity since each dimension matters.

I am wondering if anyone has any good algorithms that will calculate the "distance" between two points, but with a weight on each dimensions. An example would be the Pearson Product-Moment Correlation Coefficient. This formula is given by

Contents

`` Mahalanobis distance is thus unitless and scale-invariant,   and takes into account the correlations of the data set. ``