I know the main concepts of data/text mining, but I have used them mainly in binary classification problems (just two classes). I am now dealing with a problem with 8 classes and am struggling to calculate evaluation metrics such as precision and recall.
Can I convert a multi-class confusion matrix to a binary confusion matrix with TP, FP, TN, and FN values and then calculate the aforementioned metrics? If not, is there an alternative evaluation metric for a classifier when I have a confusion matrix like this:
Best Answer
Welcome to the website. This is a variation of a commonly asked question. You can definitely convert a multi-class confusion matrix to a binary confusion matrix, one class at a time (that class versus all the others).
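To make the collapse concrete, here is a sketch of the usual one-vs-rest definitions. It assumes that entry $C_{ij}$ of the confusion matrix counts items whose actual class is $i$ and whose predicted class is $j$. The question does not say which way round the matrix is laid out, so if yours is transposed, precision and recall swap. $N$ is the total number of items and $K$ the number of classes; these symbols are introduced here for illustration.

```latex
\begin{aligned}
TP_k &= C_{kk}, &
FN_k &= \sum_{j \ne k} C_{kj}, \\
FP_k &= \sum_{i \ne k} C_{ik}, &
TN_k &= N - TP_k - FN_k - FP_k, \\
\text{precision}_k &= \frac{TP_k}{TP_k + FP_k}, &
\text{recall}_k &= \frac{TP_k}{TP_k + FN_k}, \qquad k = 1, \dots, K.
\end{aligned}
```

Each class therefore gets its own 2x2 table, and the binary metrics you already know apply to each table separately.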
Below is some R code showing how to collapse a confusion matrix into a binary one. It also calculates Cohen's kappa to get the overall 'rater' agreement between the classifier and the actual class (of cmg).
    library(psych)  # provides cohen.kappa()

    # 8 x 8 confusion matrix; matrix() fills the values column by column
    cmg <- matrix(c(1639, 116, 49, 35, 138, 0, 0, 236,
                    150, 274, 27, 21, 28, 0, 0, 73,
                    22, 24, 58, 9, 94, 0, 0, 30,
                    33, 27, 31, 21, 146, 0, 0, 49,
                    14, 9, 5, 1, 49, 0, 0, 22,
                    1, 0, 1, 1, 7, 0, 0, 6,
                    11, 0, 0, 1, 14, 0, 0, 21,
                    201, 11, 8, 5, 49, 0, 0, 253),
                  ncol = 8,
                  dimnames = rep(list(c("T1", "T2", "T3", "T4",
                                        "T5", "T6", "T7", "T8")), 2))

    # Overall agreement (proportion of cases on the diagonal)
    overall_agg <- sum(diag(cmg)) / sum(cmg)

    # Overall Cohen's kappa for cmg
    unweighted_kappa <- cohen.kappa(cmg, n.obs = sum(cmg))

    # Initialise containers
    spec_agr_guideline      <- list()
    collapsed_mat_guideline <- list()
    unweighted_kappa_psych  <- list()

    # Loop through all treatments
    for (i in seq_len(nrow(cmg))) {
      # Specific agreement for treatment i
      spec_agr_guideline[[i]] <- 2 * cmg[i, i] / (sum(cmg[i, ]) + sum(cmg[, i]))

      # Collapsed (binary) confusion matrix: treatment i versus all others
      collapsed_mat_guideline[[i]] <- matrix(
        c(cmg[i, i],
          sum(cmg[i, ]) - cmg[i, i],
          sum(cmg[, i]) - cmg[i, i],
          sum(cmg) - sum(cmg[i, ]) - sum(cmg[, i]) + cmg[i, i]),
        ncol = 2)

      # Unweighted Cohen's kappa per collapsed (binary) confusion matrix
      unweighted_kappa_psych[[i]] <- cohen.kappa(
        collapsed_mat_guideline[[i]],
        n.obs = sum(collapsed_mat_guideline[[i]]))
    }
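Since the question asks specifically about precision and recall, here is a minimal sketch of how those could be read off the same one-vs-rest collapse in base R. The function name `one_vs_rest_metrics` and the 3-class matrix `toy` are hypothetical and only for illustration. The sketch assumes rows hold the actual class and columns the predicted class; if your matrix is the other way round, precision and recall swap.

```r
# Per-class (one-vs-rest) metrics from a square confusion matrix.
# ASSUMPTION: rows = actual class, columns = predicted class.
one_vs_rest_metrics <- function(cm) {
  tp <- diag(cm)                # correctly predicted as class k
  fn <- rowSums(cm) - tp        # actually k, predicted as another class
  fp <- colSums(cm) - tp        # predicted k, actually another class
  tn <- sum(cm) - tp - fn - fp  # everything not involving class k

  data.frame(class     = rownames(cm),
             TP = tp, FP = fp, FN = fn, TN = tn,
             precision = tp / (tp + fp),
             recall    = tp / (tp + fn),
             F1        = 2 * tp / (2 * tp + fp + fn),
             row.names = NULL)
}

# Hypothetical 3-class confusion matrix (not the data from the question)
toy <- matrix(c(5, 1, 0,
                2, 6, 1,
                0, 1, 4),
              nrow = 3, byrow = TRUE,
              dimnames = rep(list(c("A", "B", "C")), 2))

per_class <- one_vs_rest_metrics(toy)
print(per_class)

# Macro averages: unweighted means over classes (NaN classes are skipped)
macro_precision <- mean(per_class$precision, na.rm = TRUE)
macro_recall    <- mean(per_class$recall, na.rm = TRUE)
macro_f1        <- mean(per_class$F1, na.rm = TRUE)

# Overall accuracy: proportion of cases on the diagonal
accuracy <- sum(diag(toy)) / sum(toy)
```

Calling `one_vs_rest_metrics(cmg)` on the matrix above should work the same way. Note that the F1 column is algebraically the same quantity as the specific agreement computed in the loop, 2*TP / (row sum + column sum). A class whose row or column is all zeros gives a 0/0 division, which R reports as NaN, hence the `na.rm = TRUE` in the macro averages.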
Furthermore, there are other ways to assess the performance of a multi-class classifier. Some relevant answers from CrossValidated.com are: link1, link2, link3.