I've developed a model for predicting the probability that each horse will win a race.
The model outputs the predicted probability that each horse will win; these probabilities sum to 1.
If I were to look at bookmaker prices for a horse race they would look something like this:
3.9, 5.4, 3.95, 6.7, 9, 14
If we convert each price to an implied probability (the reciprocal of the decimal odds) and sum them, we get something like this:
1/3.9 + 1/5.4 + 1/3.95 + 1/6.7 + 1/9 + 1/14 ≈ 1.027
The implied probabilities sum to more than 1 because of the bookmaker's overround.
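To make the numbers concrete, here is a minimal Python sketch (function names are my own) that converts decimal odds to implied probabilities, measures the overround, and applies the simple "linear" deflation discussed below as a baseline:

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to implied win probabilities (reciprocals)."""
    return [1.0 / o for o in decimal_odds]

def overround(decimal_odds):
    """Amount by which the implied probabilities exceed 1."""
    return sum(implied_probabilities(decimal_odds)) - 1.0

def normalise_linear(decimal_odds):
    """Baseline linear deflation: scale every probability by the same factor."""
    probs = implied_probabilities(decimal_odds)
    total = sum(probs)
    return [p / total for p in probs]

odds = [3.9, 5.4, 3.95, 6.7, 9, 14]
print(round(sum(implied_probabilities(odds)), 3))  # 1.027
print(round(overround(odds), 3))                   # 0.027
print(round(sum(normalise_linear(odds)), 3))       # 1.0
```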
This is problematic for me because I need to compare the bookmaker's odds against my model's probabilities to determine whether a bet offers any value.
To compare them accurately I will need to either:
- Remove the overround from the bookmaker prices (deflate the probabilities)
- Inflate the output of my model to include the bookmaker's overround
Can anyone suggest how to do this accurately from a theoretical point of view?
I believe that doing this linearly (applying the same scaling factor to each horse) is incorrect. Instead, each horse should absorb a share of the overround based on its odds of winning: the horses that are more likely to win should take a bigger proportion of the overround. Does this sound more correct?
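One common non-linear alternative, for comparison, is the "power method": raise each implied probability to an exponent k and choose k so the adjusted probabilities sum to exactly 1. For an overround book this gives k > 1, which shrinks longshot probabilities proportionally more than favourites' (the relative reduction is p^(k-1), smallest for small p), so whether it matches your intuition is something you'd want to check against data. A sketch, with names of my own choosing:

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to implied win probabilities (reciprocals)."""
    return [1.0 / o for o in decimal_odds]

def normalise_power(decimal_odds, tol=1e-10):
    """Power method: find k such that sum(p_i ** k) == 1, return the p_i ** k.

    For an overround book (implied probabilities summing to > 1) the
    solution has k > 1, which deflates longshots proportionally more
    than favourites.
    """
    probs = implied_probabilities(decimal_odds)
    lo, hi = 1.0, 10.0  # bisection bracket for the exponent k
    while hi - lo > tol:
        k = (lo + hi) / 2
        if sum(p ** k for p in probs) > 1.0:
            lo = k  # sum still too large: need a bigger exponent
        else:
            hi = k
    return [p ** k for p in probs]

odds = [3.9, 5.4, 3.95, 6.7, 9, 14]
adjusted = normalise_power(odds)
print(round(sum(adjusted), 6))  # 1.0
```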
Thanks for your help.
You are correct to reason that the bookmaker has probably not applied the overround equally to all runners. It's more likely to be a function of that runner's contribution to the book and the amount of money they expect to take on it.
If you have a sufficient amount of data on bookmaker prices and race winners, and you assume a specific relationship between a runner's odds and its share of the overround, then you can find the parameters of that relationship that maximize the likelihood of the observed results. In other words, you find the adjustment to the raw bookmaker probabilities that produces the most accurate predictions of the winners.
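As a sketch of that fitting procedure, assuming a one-parameter power family (raise each implied probability to an exponent k, then renormalise) and a small, entirely hypothetical dataset of races, you could pick the k that maximizes the log-likelihood of the observed winners:

```python
import math

def implied_probabilities(decimal_odds):
    """Convert decimal odds to implied win probabilities (reciprocals)."""
    return [1.0 / o for o in decimal_odds]

def power_adjust(decimal_odds, k):
    """Apply exponent k to implied probabilities, renormalise to sum to 1."""
    powered = [p ** k for p in implied_probabilities(decimal_odds)]
    total = sum(powered)
    return [p / total for p in powered]

def log_likelihood(races, k):
    """Sum of log probabilities assigned to the actual winners."""
    return sum(math.log(power_adjust(odds, k)[winner]) for odds, winner in races)

def fit_k(races, grid=None):
    """Grid search for the exponent that best explains the winners."""
    if grid is None:
        grid = [0.5 + 0.01 * i for i in range(151)]  # candidates 0.50 .. 2.00
    return max(grid, key=lambda k: log_likelihood(races, k))

# Hypothetical data: (decimal odds for each runner, index of the winner).
races = [
    ([3.9, 5.4, 3.95, 6.7, 9, 14], 0),
    ([2.5, 3.6, 8.0, 12.0], 1),
    ([1.8, 4.5, 6.0, 15.0, 21.0], 0),
]
best_k = fit_k(races)
```

In practice you would fit on many races, and possibly use a richer family than a single exponent; Shin's model is a well-known alternative that derives the odds-dependent overround share from an insider-trading argument.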