I've developed a model for predicting the probability that each horse will win a race.

The output of the model is the predicted probability that each horse will win; these probabilities sum to 1.

If I were to look at bookmaker prices for a horse race they would look something like this:

`3.9, 5.4, 3.95, 6.7, 9, 14`

If we took the sum of the implied probabilities (the reciprocals of the decimal odds), we'd get something like this:

`1/3.9 + 1/5.4 + 1/3.95 + 1/6.7 + 1/9 + 1/14 = 1.027`
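The calculation above can be reproduced directly from the example odds:

```python
# Implied win probabilities from the example decimal odds above.
odds = [3.9, 5.4, 3.95, 6.7, 9, 14]
implied = [1 / o for o in odds]
book_sum = sum(implied)
print(round(book_sum, 3))  # the excess over 1 is the overround
```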

The implied probabilities sum to more than 1 because of the bookmaker's overround.

This is problematic for me because I need to compare the odds offered at the bookmaker and the probability from my model to determine if there is any value placing a bet.

In order to compare them accurately I will need to either:

- Remove the overround from the bookmaker prices (deflate the probabilities)
- Inflate the output of my model to include the bookmaker's overround

Can anyone suggest how to do this accurately from a theoretical point of view?

I believe that doing this linearly – applying the same factor to each horse – is incorrect.
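For reference, the linear approach described above is simply dividing every implied probability by the book sum, so each runner is scaled by the same factor:

```python
# Linear (multiplicative) normalisation: divide each implied
# probability by the book sum so the adjusted values sum to 1.
# Every runner is deflated by the same factor.
odds = [3.9, 5.4, 3.95, 6.7, 9, 14]
implied = [1 / o for o in odds]
book_sum = sum(implied)
fair = [q / book_sum for q in implied]
```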

I believe that each horse should absorb a share of the overround based on its odds of winning: the horses that are more likely to win should take a bigger proportion of the overround. Does this sound more correct?
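One common non-linear alternative (not from the question itself, but often called the "power method") is to raise each implied probability to an exponent `k` and solve for the `k` that makes the results sum to 1. Because `q ** k` shrinks small probabilities proportionally more when `k > 1`, long shots absorb a larger relative share of the overround than favourites. A minimal sketch using bisection:

```python
# Power method: find k such that sum(q ** k) == 1.
# Since the book sum exceeds 1 and each q < 1, the root lies above 1.
odds = [3.9, 5.4, 3.95, 6.7, 9, 14]
implied = [1 / o for o in odds]

lo, hi = 1.0, 2.0
for _ in range(60):  # bisection: sum(q ** k) is decreasing in k
    mid = (lo + hi) / 2
    if sum(q ** mid for q in implied) > 1.0:
        lo = mid
    else:
        hi = mid
k = (lo + hi) / 2
fair = [q ** k for q in implied]
```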

Thanks for your help.


#### Best Answer

You are correct to reason that the bookmaker has probably not applied the overround equally to all runners. It's more likely to be a function of that runner's contribution to the book and the amount of money they expect to take on it.

If you have enough data on bookmaker prices and race winners, and you assume a specific relationship between a runner's odds and its share of the overround, then you can find the parameters of that relationship which maximize the likelihood – in other words, the adjustment to the raw bookmaker probabilities that produces the most accurate predictions of the winners.
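The fitting idea in the answer can be sketched as follows. This assumes one particular functional form (the power-method adjustment `p_i = q_i**k / sum(q_j**k)`); the race data below is entirely invented for illustration, and a real fit would use a proper optimiser over a large dataset:

```python
import math

# Hypothetical data: each race is (implied probabilities, winner index).
races = [
    ([0.26, 0.19, 0.25, 0.15, 0.11, 0.07], 0),
    ([0.40, 0.30, 0.20, 0.13], 1),
    ([0.55, 0.30, 0.18], 0),
]

def neg_log_likelihood(k):
    """Negative log-likelihood of the winners under the power adjustment."""
    total = 0.0
    for implied, winner in races:
        norm = sum(q ** k for q in implied)
        total -= math.log(implied[winner] ** k / norm)
    return total

# Crude grid search over k; use a real optimiser in practice.
best_k = min((k / 100 for k in range(50, 301)), key=neg_log_likelihood)
```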