I have a multi-label problem that I'm tackling with a neural network. To get the multi-label scores, I apply a tanh to the last layer (as suggested in the literature) and then select the labels whose score reaches a threshold (which, again, is often suggested to be 0.5). For example (pseudocode of what happens in the network):

    import numpy as np

    threshold = 0.5
    last_hidden = np.array([1, 0.995, 0.39, -0.283, -1.033])
    multi_label_scores = np.tanh(last_hidden)
    # approx. [0.7615942, 0.7594864, 0.3713602, -0.27567932, -0.77510875]
    # emit 1 for scores at or above the threshold, 0 otherwise
    labels = [1 if s >= threshold else 0 for s in multi_label_scores]
    # [1, 1, 0, 0, 0]

My question is: apart from setting the threshold to 0.5, or maybe finding a better value during parameter tuning, is there a way to learn such a threshold, for example using specific max-margin loss functions (or similar)?

--- EDIT ---

The suggested threshold would actually be 0 for a tanh, not 0.5 (which would be used for a sigmoid). Anyway, it's just a translation; the problem is still the same.
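
To make the correspondence between the two thresholds concrete, here is a short worked identity. It assumes $\sigma$ denotes the standard logistic sigmoid, $\sigma(x) = 1/(1 + e^{-x})$, which is a symbol introduced here for illustration:

$$
\tanh(x) = 2\,\sigma(2x) - 1
\quad\Longrightarrow\quad
\tanh(x) \ge t \iff \sigma(2x) \ge \frac{t + 1}{2}.
$$

With $t = 0$ this gives $\sigma(2x) \ge 0.5$, i.e. $x \ge 0$ in both cases, so a threshold of $0$ on tanh and $0.5$ on the sigmoid select the same pre-activations.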


#### Best Answer

The threshold for your output neuron is also a *hyper-parameter* and can be tuned just like the others. The $0.5$ suggestion is probably meant for the sigmoid function, because it is symmetric about the point $(0, 0.5)$ and takes the value $0.5$ at $0$. Similarly, for tanh (check its symmetry), the suggested value is probably $0$, not $0.5$. But this is like saying that the suggested neural network size is 2 layers: it is a default, not a rule, and you should tune your threshold as well. Several statistics, such as the ROC curve and precision/recall curves, are obtained to measure performance while this threshold is varied, and they are used to understand the behavior of the system. By the way, a more commonly suggested option for the *sigmoid*, for instance, is to use your class priors.
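
As a rough illustration of tuning the threshold as a hyper-parameter, the sketch below sweeps candidate thresholds for each label on a held-out validation set and keeps the one with the best F1. Everything here is an assumption made for the example: the validation logits and targets are made up, F1 is just one possible selection criterion, and the helper names (`f1_at_threshold`, `tune_threshold`, `predict`) are hypothetical.

```python
import math


def f1_at_threshold(scores, targets, threshold):
    """F1 for one label when predicting 1 wherever score >= threshold."""
    tp = sum(1 for s, t in zip(scores, targets) if s >= threshold and t == 1)
    fp = sum(1 for s, t in zip(scores, targets) if s >= threshold and t == 0)
    fn = sum(1 for s, t in zip(scores, targets) if s < threshold and t == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(scores, targets):
    """Return the observed score that maximizes F1 (lowest one wins ties)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: f1_at_threshold(scores, targets, c))


def predict(score_row, thresholds):
    """Apply one threshold per label to a row of scores."""
    return [1 if s >= t else 0 for s, t in zip(score_row, thresholds)]


# Made-up validation data: rows are examples, columns are labels.
val_logits = [
    [1.0, -0.3],
    [0.995, 0.8],
    [0.39, -1.2],
    [-0.283, 0.1],
    [-1.033, 0.6],
]
val_targets = [
    [1, 0],
    [1, 1],
    [1, 0],
    [0, 0],
    [0, 1],
]

# tanh outputs, as in the question
val_scores = [[math.tanh(z) for z in row] for row in val_logits]

# Tune one threshold per label (column)
n_labels = len(val_scores[0])
thresholds = []
for j in range(n_labels):
    col_scores = [row[j] for row in val_scores]
    col_targets = [row[j] for row in val_targets]
    thresholds.append(tune_threshold(col_scores, col_targets))

print(thresholds)
print(predict(val_scores[0], thresholds))
```

On this toy data the selected thresholds differ per label and are not $0$, which is the point: the default is only a starting value. With real data you would select the threshold on a validation split and report results on a separate test split, since the threshold is fitted to the data it is tuned on.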
