I have a training set with $N$ instances $\{I_1,\dots,I_N\}$, where each pair of instances is associated with a similarity score $S(I_x,I_y)\in [0,1]$ that indicates whether the two instances are similar.

I have developed $M$ similarity functions $\{S_1,\dots,S_M\}$, each based on a different feature vector $f_m$ that I extract from the two instances of the pair:

$S_m(f_m(I_x), f_m(I_y))\in [0,1]$.

Note that these similarity functions are probably correlated in some way.

Given these functions and my training set, I want to learn a unified similarity prediction function $P$ such that $P=\arg\min_P \sum_{x,y} \left(P(I_x,I_y)-S(I_x,I_y)\right)^2$.

What is the best way to achieve such a $P$?


#### Best Answer

Welcome to the field of metric learning. Searching for that term will turn up a lot of material on your problem. Here is a quick idea of how you can approach it.

One way is to find a coefficient $\alpha_m$ for each of your similarity functions and combine them into a global similarity $P(I_x, I_y) = \frac{1}{M} \sum_{m=1}^{M} \alpha_m S_m(I_x, I_y)$. Because $P$ is linear in the coefficients $\alpha_m$, minimizing the squared error to the targets $S(I_x, I_y)$ is a linear least-squares problem.
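A minimal sketch of this least-squares fit with NumPy, assuming you have already evaluated each $S_m$ on a set of training pairs and stacked the results into a matrix with one row per pair and one column per function. The helper names `fit_alpha` and `predict`, and the synthetic data, are hypothetical and only illustrate the idea.

```python
import numpy as np


def fit_alpha(sims, targets):
    """Least-squares fit of alpha in P = (1/M) * sum_m alpha_m * S_m.

    sims:    (num_pairs, M) array; column m holds S_m(I_x, I_y) for each pair.
    targets: (num_pairs,) array of target similarities S(I_x, I_y).
    """
    m = sims.shape[1]
    # Absorb the 1/M factor into the design matrix, then solve min ||A @ alpha - targets||^2
    alpha, *_ = np.linalg.lstsq(sims / m, targets, rcond=None)
    return alpha


def predict(sims, alpha):
    """Combined similarity P(I_x, I_y) for each row of `sims`."""
    return sims @ alpha / sims.shape[1]


# Synthetic example: 500 pairs, M = 3 similarity functions (hypothetical data)
rng = np.random.default_rng(0)
sims = rng.uniform(0.0, 1.0, size=(500, 3))
true_alpha = np.array([0.6, 1.5, 0.9])
targets = predict(sims, true_alpha)  # noiseless targets, so the fit should be exact
alpha = fit_alpha(sims, targets)
print(alpha)
```

With noiseless synthetic targets the recovered coefficients match `true_alpha` up to floating-point error; on real data the fit will only approximate the targets. If the $S_m$ are strongly correlated, the columns of the design matrix are nearly collinear, and a small ridge penalty may help stabilize the coefficients.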

One key issue with metric learning is that the number of targets scales quadratically with the number of samples: $N$ instances give $N(N-1)/2$ pairs. This can be a hindrance for least-squares procedures that build the full design matrix in memory, and you might have to resort to a stochastic gradient-based optimization technique instead.
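As a sketch of the stochastic alternative, the code below samples random mini-batches of pairs and takes a gradient step on the mean squared error, so the $N(N-1)/2$ pairs are never materialized. The Gaussian similarity `gaussian_sim`, the helper names, the learning rate, and the synthetic features are all hypothetical choices made for illustration. Pairs are sampled with replacement and may include self-pairs, which a real implementation might filter out.

```python
import numpy as np


def gaussian_sim(a, b):
    """Hypothetical S_m: maps squared Euclidean distance into (0, 1]."""
    return np.exp(-np.sum((a - b) ** 2, axis=-1))


def pair_sims(feats, ix, iy):
    """(batch, M) array of S_m(f_m(I_x), f_m(I_y)) for the sampled index pairs."""
    return np.stack([gaussian_sim(f[ix], f[iy]) for f in feats], axis=1)


def sgd_alpha(feats, target_fn, n, steps=5000, batch=64, lr=0.5, seed=0):
    """Fit alpha in P = (1/M) * sum_m alpha_m * S_m by mini-batch SGD on sampled pairs."""
    rng = np.random.default_rng(seed)
    m = len(feats)
    alpha = np.ones(m)
    for _ in range(steps):
        # Sample a mini-batch of pairs instead of enumerating all N(N-1)/2 of them
        ix = rng.integers(0, n, size=batch)
        iy = rng.integers(0, n, size=batch)
        x = pair_sims(feats, ix, iy) / m
        resid = x @ alpha - target_fn(ix, iy)
        # Gradient of the mini-batch mean squared error with respect to alpha
        grad = 2.0 * x.T @ resid / batch
        alpha -= lr * grad
    return alpha


# Synthetic example: N = 10,000 instances (about 5e7 pairs), M = 3 feature vectors
rng = np.random.default_rng(1)
n, m = 10_000, 3
feats = [rng.normal(scale=0.5, size=(n, 2)) for _ in range(m)]
true_alpha = np.array([0.6, 1.5, 0.9])


def target_fn(ix, iy):
    # Hypothetical ground truth: the same combination with known coefficients
    return pair_sims(feats, ix, iy) @ true_alpha / m


alpha = sgd_alpha(feats, target_fn, n)
print(alpha)
```

The step size here suits these synthetic features; on real data it usually needs tuning, or an adaptive optimizer can be used instead.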