I was trying to understand the final section of the paper *Revisiting Baselines for Visual Question Answering*. The authors state that their model performs better with a **binary** loss than with a **softmax** loss.

What is a **binary** loss (in this case)? Is the **softmax loss** a synonym for **binary cross-entropy**? Should I use a binary loss or a softmax loss for classification?


#### Best Answer

There is a nice explanation of the distinction:

Binary Cross-Entropy Loss is also called Sigmoid Cross-Entropy loss. It is a sigmoid activation plus a cross-entropy loss. Unlike softmax loss, it is independent for each vector component (class): the loss computed for one component is not affected by the values of the other components.

The term *binary* refers to the fact that each component becomes its own two-class (yes/no) decision. In the VQA setting this means each candidate answer is scored independently with a sigmoid, rather than all answers competing through a single softmax.
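A minimal sketch of the difference (not code from the paper): a softmax cross-entropy couples all logits into one normalized distribution, while a sigmoid ("binary") cross-entropy treats each logit as an independent yes/no decision.

```python
import numpy as np

def softmax_cross_entropy(logits, target_index):
    # Softmax couples all components: raising one logit lowers the
    # probability assigned to every other class.
    shifted = logits - logits.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target_index]  # one scalar loss for the whole vector

def sigmoid_cross_entropy(logits, targets):
    # Each component is an independent binary decision: the loss term for
    # one class does not depend on the other logits at all.
    probs = 1.0 / (1.0 + np.exp(-logits))
    return -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))

logits = np.array([2.0, -1.0, 0.5])
targets = np.array([1.0, 0.0, 0.0])  # first class is "correct"

print(softmax_cross_entropy(logits, 0))        # single scalar over all classes
print(sigmoid_cross_entropy(logits, targets))  # one loss term per class
```

Note that the sigmoid version returns a per-class vector, which is why it extends naturally to multi-label problems where several answers can be simultaneously correct.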
