I'm confused about how scikit-learn's roc_auc_score works.

As I understand it, an ROC AUC score for a classifier is obtained as follows:

- Sample from the parameter space
- Fit the model
- Make predictions with the model to get $Y_{predicted}$
- Calculate $P(F_P)$ and $P(T_P)$ via $Y_{true}$ and $Y_{predicted}$

The above steps are performed repeatedly until you have enough $(P(F_P), P(T_P))$ points to get a good estimate of the area under the curve.

The sklearn.metrics.roc_auc_score method takes $Y_{true}$ and $Y_{predicted}$ and gives the area under the curve based only on these. How is this possible? It seems you'd need multiple sets of $Y_{true}$ and $Y_{predicted}$ coming from different forms of the model (i.e. trained with different parameters) in order to get multiple $(P(F_P), P(T_P))$ points to estimate the area under the curve.

I'm clearly not understanding something here. What is it?

#### Best Answer

The ROC curve is simply a measure of the performance of any ONE given model, evaluated at different classification threshold values.

For example, a logistic regression's output will lie between 0 and 1. Generally we would classify a record as 1 if its assigned probability is greater than 0.5.

In the ROC, however, imagine that each point on the curve represents the performance of the same binary classifier, with the threshold varying from 0 to 1.

If I set my threshold to 0, then essentially all predicted values are 1. This makes both the False Positive Rate and the True Positive Rate equal to 1. At the other extreme, setting a threshold of 1 gives you a TPR and FPR of 0.
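To make the threshold sweep concrete, here is a minimal plain-Python sketch. The labels and scores are made-up toy data, and `rates_at` is a hypothetical helper written for this illustration, not a scikit-learn function. It assumes a record is predicted as 1 when its score is at or above the threshold.

```python
# Hypothetical toy data: true labels and the scores from ONE fitted model.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]


def rates_at(threshold, y_true, y_score):
    """Return (FPR, TPR) when predicting 1 for every score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return fp / n_neg, tp / n_pos


# Sweep the threshold from 1 down to 0; each value gives one ROC point.
thresholds = [1.0, 0.8, 0.4, 0.35, 0.1, 0.0]
roc_points = [rates_at(t, y_true, y_score) for t in thresholds]

for t, (fpr, tpr) in zip(thresholds, roc_points):
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The first point is (0, 0) and the last is (1, 1), matching the two extremes described above; the points in between come from the same scores, with no refitting.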

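Finding the point along the ROC with the highest TPR for the lowest FPR would give you a good candidate for the optimal threshold, although the best choice ultimately depends on how you weigh false positives against false negatives.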
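One common way to turn "highest TPR for the lowest FPR" into a single number is Youden's J statistic, J = TPR - FPR. That criterion is an assumption swapped in here for illustration; the answer does not name a specific rule. The sketch below uses made-up toy data and a hypothetical helper `youden_j`, and treats a score at or above the threshold as a positive prediction.

```python
# Hypothetical toy data: labels and scores from one fitted model.
y_true = [0, 0, 0, 0, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.6, 0.5, 0.7, 0.9]

n_pos = sum(y_true)
n_neg = len(y_true) - n_pos


def youden_j(threshold):
    """Return TPR - FPR when predicting 1 for every score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    return tp / n_pos - fp / n_neg


# Only the observed scores need to be tried as candidate thresholds,
# because the predictions cannot change anywhere in between them.
candidates = sorted(set(y_score), reverse=True)
best_threshold = max(candidates, key=youden_j)
best_j = youden_j(best_threshold)
print(best_threshold, best_j)
```

If false positives and false negatives carry different costs, a weighted criterion would be more appropriate than this unweighted difference.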

So the multiple sets of $Y_{predicted}$ that you refer to are in fact the result of applying multiple threshold values to the output of the exact same model and computing TPR and FPR for each one.
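In scikit-learn terms, this is why `roc_auc_score` needs only one set of labels and one set of scores. A minimal sketch, assuming the toy arrays below stand in for your data and for the continuous output of a single fitted model (for example from `predict_proba` or `decision_function`):

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical toy data: true labels and continuous scores from ONE model.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve sweeps the threshold over the observed scores and returns
# one (FPR, TPR) point per threshold it tries.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Integrating those points with the trapezoidal rule gives the area.
area_from_curve = auc(fpr, tpr)

# roc_auc_score computes the same quantity directly from labels and scores.
area_direct = roc_auc_score(y_true, y_score)

print(area_from_curve, area_direct)
```

Note that if you pass hard 0/1 predictions instead of scores, only one non-trivial threshold is left to sweep, so the curve collapses to a single intermediate point between (0, 0) and (1, 1).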