Solved – What does the term “gold label” refer to in the context of semi-supervised classification?

Throughout the Snorkel tutorial at https://github.com/HazyResearch/snorkel and in the team's related white paper there are references to "gold labels", but the term is never defined.

What are 'gold labels' in the semi-supervised classification context?

Thank you.

From https://hazyresearch.github.io/snorkel/blog/snark.html:

We call this type of training data weak supervision because it’s noisier and less accurate than the expensive, manually-curated “gold” labels that machine learning models are usually trained on. However, Snorkel automatically de-noises this noisy training data, so that we can then use it to train state-of-the-art models.

As I understand it, the goal of Snorkel is to generate a large set of training labels for large-scale ML models without hand-labeling every example: users write labeling functions that assign noisy labels, and only a much smaller set of hand-labeled data is needed, mainly to develop and evaluate those functions. The hand-labeled data have been annotated by subject-matter experts, so we are much more certain that their labels are correct (but obtaining a large set of such data may be prohibitively expensive, hence the impetus for Snorkel in the first place). So it appears they are calling these hand-labeled data "gold" labels because they represent a reliable ground truth. This can be contrasted with the labels output by the algorithm, which are hopefully of high quality but are still noisy by construction.
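To make the contrast concrete, here is a minimal sketch in plain Python. It does not use the Snorkel API. The three heuristic functions, the tiny spam/not-spam dataset, and the majority vote are all assumptions made up for illustration; in particular, a simple majority vote stands in for Snorkel's de-noising step, which as far as I know is a learned model rather than a vote. The point is only to show the role gold labels usually play: a small, trusted set against which the noisy labels are scored.

```python
from collections import Counter

ABSTAIN = -1  # a heuristic may decline to vote
SPAM, NOT_SPAM = 1, 0

# Hypothetical noisy heuristics ("labeling functions"), invented for this sketch.
def lf_contains_free(text):
    return SPAM if "free" in text.lower() else ABSTAIN

def lf_contains_meeting(text):
    return NOT_SPAM if "meeting" in text.lower() else ABSTAIN

def lf_many_exclamations(text):
    return SPAM if text.count("!") >= 2 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_free, lf_contains_meeting, lf_many_exclamations]

def weak_label(text):
    """Combine heuristic votes by majority; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    top = Counter(votes).most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ABSTAIN
    return top[0][0]

# A small hand-labeled set: these expert-assigned labels are the "gold" labels.
gold_examples = [
    ("Free prize!! Click now!!", SPAM),
    ("Meeting moved to 3pm", NOT_SPAM),
    ("Free this afternoon?? Let me know!!", NOT_SPAM),
    ("Lunch on Friday?", NOT_SPAM),
]

def evaluate(examples):
    """Return (coverage, accuracy): the share of examples that received a weak
    label, and the share of those weak labels that match the gold label."""
    labeled = [(weak_label(text), gold) for text, gold in examples]
    covered = [(weak, gold) for weak, gold in labeled if weak != ABSTAIN]
    coverage = len(covered) / len(examples)
    correct = sum(1 for weak, gold in covered if weak == gold)
    accuracy = correct / len(covered) if covered else 0.0
    return coverage, accuracy

coverage, accuracy = evaluate(gold_examples)
print(f"coverage={coverage:.2f}, accuracy vs. gold={accuracy:.2f}")
```

The third example is labeled spam by the heuristics but not-spam by the human annotator; the gold label is taken as correct, and the disagreement is counted as noise in the weak labels.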
