Throughout the Snorkel tutorial (https://github.com/HazyResearch/snorkel) and the team's related white paper, there are references to "gold labels", but the term is never defined.
What are 'gold labels' in the semi-supervised classification context?
Thank you.
Best Answer
From https://hazyresearch.github.io/snorkel/blog/snark.html:
We call this type of training data weak supervision because it’s noisier and less accurate than the expensive, manually-curated “gold” labels that machine learning models are usually trained on. However, Snorkel automatically de-noises this noisy training data, so that we can then use it to train state-of-the-art models.
As I understand it, the goal of Snorkel is to programmatically generate a large set of training labels for large-scale ML models, so that only a much smaller set of hand-labeled data is needed. The hand-labeled data have been labeled by subject-matter experts, so we are much more certain that those labels are correct (but obtaining a large set of such data may be prohibitively expensive, which is the impetus for Snorkel in the first place). So it appears they call these hand-labeled data "gold" labels because they represent reliable ground-truth values. This contrasts with the labels output by the algorithm, which are hopefully of high quality but are still noisy by construction.
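To make the distinction concrete, here is a toy sketch in plain Python (it does not use Snorkel's API). Everything in it is hypothetical: a tiny spam task, three heuristic "labeling functions" that either vote or abstain, and a simple majority vote that stands in for Snorkel's label model (Snorkel learns how much to trust each labeling function rather than counting votes). The hand-assigned `gold` list plays the role of the gold labels: a small trusted set used to measure how noisy the cheap weak labels are.

```python
# Hypothetical toy task: 1 = spam, 0 = not spam, -1 = abstain (no opinion).
SPAM, HAM, ABSTAIN = 1, 0, -1

texts = [
    "win a free prize now",
    "meeting moved to 3pm",
    "free tickets, click here",
    "can you review my draft",
    "click here to claim your reward",
    "free lunch in the kitchen today",
    "quarterly report attached",
]
# "Gold" labels: assigned by hand, treated as trusted ground truth.
gold = [SPAM, HAM, SPAM, HAM, SPAM, HAM, HAM]

# Hypothetical labeling functions: cheap heuristics that may be wrong or abstain.
def lf_free(text):
    return SPAM if "free" in text else ABSTAIN

def lf_click(text):
    return SPAM if "click here" in text else ABSTAIN

def lf_work(text):
    return HAM if text.startswith(("can you", "meeting")) else ABSTAIN

LFS = [lf_free, lf_click, lf_work]

def majority_vote(text):
    """Combine LF votes by simple majority (a stand-in, not Snorkel's label model)."""
    votes = [lf(text) for lf in LFS]
    spam, ham = votes.count(SPAM), votes.count(HAM)
    if spam > ham:
        return SPAM
    if ham > spam:
        return HAM
    return ABSTAIN  # no votes, or a tie

# Weak (noisy) labels produced without any hand labeling.
weak = [majority_vote(t) for t in texts]

# Use the gold labels only to evaluate the weak labels where they exist.
covered = [(w, g) for w, g in zip(weak, gold) if w != ABSTAIN]
coverage = len(covered) / len(texts)
accuracy = sum(w == g for w, g in covered) / len(covered)

print(weak)  # [1, 0, 1, 0, 1, 1, -1]
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")  # coverage=0.86 accuracy=0.83
```

The "free lunch" message shows the noise: a heuristic confidently mislabels it, and only the comparison against the gold label reveals the error.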
Similar Posts:
- Solved – What does the term “gold label” refer to in the context of semi-supervised classification
- Solved – Distant supervision: supervised, semi-supervised, or both
- Solved – Can we say that RNN for time series is an example of semi-supervised learning
- Solved – How do you apply sentiment analysis on topic modelling topics