I have 50 products. For each product, I identify 3 related products (1 top related, 1 partially related and 1 least related) using similarity measures. I want to compare the ranked list generated by my model (predicted) with the ranked list specified by the domain experts (ground truth).
For example,
Product 1
- [3,2,1] -> ranking assigned by the user (from most related to least related)
- [3,1,2] -> ranking predicted by the model
From my reading, I found that I may use rank-correlation approaches such as Kendall's tau or Spearman's rank correlation to compare the ranked lists. However, I am not sure whether these approaches are suitable, as my number of samples is low (4). Please correct me if I am wrong.
Another approach is to use Jaccard similarity (set intersection) to quantify the similarity between two ranked lists. Then, I may plot a histogram from setbased_list (see below).

    for index, row in evaluate.iterrows():
        d = row['Id']
        y_pred = [3, 2, 1, 0]
        y_true = [row['A'], row['B'], row['C'], row['D']]
        sim = jaccard_similarity_score(y_true, y_pred)
        setbased_list.append(sim)

- Is my approach to the problem above correct?
- What are other approaches that I may use if I want to take into consideration the positions of elements in the list (weight-based)?
Best Answer
Set-based Jaccard similarity will always return 1 here, because both lists always contain the same elements. Order does not matter, since the measure is the ratio of the size of the intersection to the size of the union of the two sets.
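To see this concretely, here is a minimal sketch in plain Python. `jaccard_index` is a hypothetical helper written for illustration (it is not the scikit-learn function used in the question), and the lists are the example rankings from the question.

```python
def jaccard_index(a, b):
    """Set-based Jaccard index: |A ∩ B| / |A ∪ B| (order is ignored)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention assumed here: two empty sets count as identical
    return len(set_a & set_b) / len(union)


y_true = [3, 2, 1]  # expert ranking from the question
y_pred = [3, 1, 2]  # model ranking from the question

print(jaccard_index(y_true, y_pred))        # same items, different order
print(jaccard_index(y_true, y_true[::-1]))  # same items, fully reversed
```

Both calls print 1.0, so a histogram of these scores would be a single bar whenever the model and the experts rank the same set of products.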
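Kendall's tau is based on the number of item pairs that the two rankings order differently (discordant pairs). Every discordant pair counts the same, so it does not directly measure how far each item sits from its true position, i.e. the "closeness" of the rankings.
Consider [1,2,3,4] vs [2,1,3,4] and [1,2,3,4] vs [4,2,3,1]. A plain position-by-position comparison treats both predictions as equally wrong, because two positions differ in each case, yet the first is clearly closer to the truth than the second. Kendall's tau does separate them (1 discordant pair versus 5), but only by counting pairs, not by measuring displacement.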
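To check how the two example pairs are actually scored, the following sketch counts discordant pairs (the Kendall tau distance) and, for contrast, the number of positions that simply do not match. The function names are hypothetical and the code assumes both lists are permutations of the same items.

```python
from itertools import combinations


def kendall_tau_distance(truth, pred):
    """Number of item pairs that the two rankings place in opposite order."""
    pos_truth = {item: i for i, item in enumerate(truth)}
    pos_pred = {item: i for i, item in enumerate(pred)}
    discordant = 0
    for a, b in combinations(truth, 2):
        # A pair is discordant when the sign of the position gap flips.
        if (pos_truth[a] - pos_truth[b]) * (pos_pred[a] - pos_pred[b]) < 0:
            discordant += 1
    return discordant


def position_mismatches(truth, pred):
    """Number of positions at which the two lists hold different items."""
    return sum(t != p for t, p in zip(truth, pred))


truth = [1, 2, 3, 4]
near = [2, 1, 3, 4]  # one adjacent swap
far = [4, 2, 3, 1]   # first and last items exchanged

print(kendall_tau_distance(truth, near), kendall_tau_distance(truth, far))
print(position_mismatches(truth, near), position_mismatches(truth, far))
```

The first line prints `1 5` and the second prints `2 2`: the position-wise count cannot tell the two predictions apart, while the pair count can.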
Spearman's footrule, the sum of the absolute differences between each item's positions in the two rankings, is able to capture this relationship; see this explanation here for more details.
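As a rough sketch, the footrule can be computed in a few lines of plain Python. The function names are hypothetical, both lists are assumed to be permutations of the same items, and the `1 / (i + 1)` weighting in the last function is only one possible assumption for emphasising errors near the top of the list; it is not part of the standard footrule.

```python
def spearman_footrule(truth, pred):
    """Sum over items of |position in truth - position in pred|."""
    pos_pred = {item: i for i, item in enumerate(pred)}
    return sum(abs(i - pos_pred[item]) for i, item in enumerate(truth))


def normalized_footrule(truth, pred):
    """Footrule scaled to [0, 1]; the maximum for n items is floor(n^2 / 2)."""
    max_dist = (len(truth) ** 2) // 2
    return spearman_footrule(truth, pred) / max_dist if max_dist else 0.0


def weighted_footrule(truth, pred):
    """Assumed variant: displacement of the item at true rank i weighs 1/(i+1)."""
    pos_pred = {item: i for i, item in enumerate(pred)}
    return sum(abs(i - pos_pred[item]) / (i + 1) for i, item in enumerate(truth))


truth = [1, 2, 3, 4]
print(spearman_footrule(truth, [2, 1, 3, 4]))    # adjacent swap
print(spearman_footrule(truth, [4, 2, 3, 1]))    # first and last exchanged
print(normalized_footrule([3, 2, 1], [3, 1, 2])) # example from the question
```

This prints 2, 6 and 0.5. Because the normalized value lies between 0 (identical rankings) and 1 (maximally displaced), it can be collected per product and plotted as a histogram in the same way as the set-based score in the question.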