I was trying to understand the final section of the paper Revisiting Baselines for Visual Question Answering. The authors state that their model performs better with a binary loss in comparison to a softmax loss.
What is a binary loss (in this case)? Is the softmax loss a synonym for binary cross-entropy? Should I use a binary loss or a softmax loss for classification?
There is a nice explanation here:

> Binary Cross-Entropy Loss is also called Sigmoid Cross-Entropy loss. It is a Sigmoid activation plus a Cross-Entropy loss. Unlike the Softmax loss, it is independent for each vector component (class): the loss computed for one component is not affected by the values of the other components.
>
> The term binary refers to the number of classes being 2: each component is treated as its own two-class (yes/no) problem.

In short: the softmax loss (softmax activation plus cross-entropy) is the usual choice when exactly one class is correct per example, because softmax forces the outputs to compete. A binary/sigmoid loss is the natural choice when classes are not mutually exclusive (multi-label), or when each candidate answer is scored independently as correct or incorrect, which seems to be the setting in the VQA paper you mention.
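A minimal numpy sketch of the distinction (function names are my own, not from the paper): the sigmoid/binary loss produces one independent term per class, while the softmax loss couples all classes into a single distribution.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())          # shift logits for numerical stability
    return e / e.sum()

def binary_cross_entropy(logits, targets):
    """Sigmoid + cross-entropy, one independent term per class.

    Each component is its own two-class (yes/no) problem, so the loss
    for class i depends only on logits[i] and targets[i]."""
    p = sigmoid(logits)
    return -(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))

def softmax_cross_entropy(logits, target_index):
    """Softmax couples all components into one distribution summing to 1,
    so raising one logit lowers every other probability; exactly one
    class can be 'the' correct answer."""
    p = softmax(logits)
    return -np.log(p[target_index])

logits = np.array([2.0, -1.0, 0.5])

# Multi-label target: classes 0 and 2 are both correct. This is fine for
# the sigmoid loss, but cannot be expressed as a single softmax target.
multi_label = np.array([1.0, 0.0, 1.0])
bce_per_class = binary_cross_entropy(logits, multi_label)

# The softmax loss must commit to exactly one correct class.
ce = softmax_cross_entropy(logits, target_index=0)
```

Note how changing the logit of one class leaves the per-class sigmoid losses of the other classes untouched, whereas under softmax every probability (and hence the loss) shifts.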