The company whose data I am working with is not willing or ready to do A/B testing to evaluate my system's accuracy. They are asking me to come up with some other way to get preliminary results, and if those turn out positive they are willing to do the A/B test. So, how do I do it?
Let me give you an example from my real data. There are 21 genres of movies, and a certain user has watched 1 movie of genre 13 and 17 movies of genre 14, as shown below.

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 17, 0, 0, 0, 0, 0, 0, 0
My system recommends genre 11 next. So how do I know how good my recommendation accuracy is without first recommending the user some movies of genre 11 and checking whether, and how many of them, he/she opens?
An alternative to what user3494047 suggests is to hide a few data points at random for every user, generate recommendations with your algorithm, and then uncover the hidden data and see how many of the hidden items match the recommendations.
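A minimal sketch of that hide-and-check (hold-out) evaluation, assuming your system exposes some `recommend(train_data, user, k)` call; the popularity recommender and the toy interaction data here are stand-ins, not your actual system:

```python
import random

# Toy interaction data (hypothetical): user -> set of watched movie ids.
interactions = {
    "u1": {1, 2, 3, 4, 5, 6},
    "u2": {2, 3, 7, 8, 9},
    "u3": {1, 4, 7, 10, 11, 12},
}

def popularity_recommender(train, user, k):
    """Stand-in recommender: most popular items the user hasn't seen.
    Replace this with a call to your own system."""
    counts = {}
    for items in train.values():
        for it in items:
            counts[it] = counts.get(it, 0) + 1
    seen = train[user]
    ranked = sorted(counts, key=counts.get, reverse=True)
    return [it for it in ranked if it not in seen][:k]

def holdout_hit_rate(interactions, recommend, k=3, holdout=2, seed=0):
    """Hide `holdout` items per user, recommend from the rest,
    then count how many hidden items the top-k list recovers."""
    rng = random.Random(seed)
    train = {u: set(items) for u, items in interactions.items()}
    hidden = {}
    for u, items in train.items():
        held = set(rng.sample(sorted(items), min(holdout, len(items) - 1)))
        hidden[u] = held
        train[u] = items - held
    hits = total = 0
    for u, held in hidden.items():
        recs = set(recommend(train, u, k))
        hits += len(recs & held)
        total += len(held)
    return hits / total

print(holdout_hit_rate(interactions, popularity_recommender))
```

The resulting hit rate (recovered hidden items over all hidden items) gives you an offline accuracy number you can show before any A/B test.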
Since you have the actual count of movies of each genre watched by each user, you can use that count as a proxy for the strength of their preference for that genre.
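One way to use those counts as graded preferences is a rank-weighted score such as DCG: a sketch using the question's 21-genre count vector, where `dcg_at_k` is an illustrative helper, not part of any particular library:

```python
import numpy as np

# Genre watch counts for the example user in the question
# (21 genres; genre 13 watched once, genre 14 watched 17 times).
counts = np.zeros(21)
counts[12] = 1.0   # genre 13 (1-indexed)
counts[13] = 17.0  # genre 14

# Normalized counts serve as a graded preference proxy.
prefs = counts / counts.sum()

def dcg_at_k(ranked_genres, relevance, k=5):
    """Discounted cumulative gain: credit a ranked genre list with the
    user's count-based relevance, discounted by rank position."""
    gains = relevance[ranked_genres[:k]]
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))
    return float((gains * discounts).sum())

# Hypothetical ranking produced by a recommender; here it happens to
# put the most-watched genre first.
ranking = np.argsort(-counts)
print(dcg_at_k(ranking, prefs))
```

A ranking that places the heavily watched genres near the top scores higher than one that buries them, so the metric rewards matching the strength, not just the presence, of a preference.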