Solved – What does the term “gold label” refer to in the context of semi-supervised classification?

Throughout the Snorkel tutorial and the team's related white paper there are references to "gold labels", but the term is never defined.

What are 'gold labels' in the semi-supervised classification context?

Thank you.


We call this type of training data weak supervision because it’s noisier and less accurate than the expensive, manually-curated “gold” labels that machine learning models are usually trained on. However, Snorkel automatically de-noises this noisy training data, so that we can then use it to train state-of-the-art models.

As I understand it, the goal of Snorkel is to generate a large set of synthetic training data for large-scale ML algorithms by learning from a much smaller set of hand-labeled training data. The hand-labeled data have been annotated by subject-matter experts, so we can be much more confident that their labels are correct (obtaining a large set of such data may be prohibitively expensive, which is the impetus for Snorkel in the first place). These hand-labeled data appear to be what the authors call "gold" labels: they serve as a reliable ground truth. The labels output by the algorithm, by contrast, are hopefully of high quality but are still noisy by construction.
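To make the contrast concrete, here is a minimal sketch in plain Python of how a small gold set can be used to check noisy labels. It does not use Snorkel's API. The labeling functions, the example texts, and the helper names (`weak_label`, `accuracy_against_gold`) are all hypothetical, and a simple majority vote stands in for Snorkel's de-noising step, which is more sophisticated than this.

```python
from collections import Counter

ABSTAIN = -1  # a labeling function may decline to vote

# Hypothetical labeling functions: cheap heuristics, each noisy on its own.
# Label 1 = spam, label 0 = not spam (an assumed toy task).
def lf_contains_free(text):
    return 1 if "free" in text.lower() else ABSTAIN

def lf_contains_meeting(text):
    return 0 if "meeting" in text.lower() else ABSTAIN

def lf_many_exclamations(text):
    return 1 if text.count("!") >= 2 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_free, lf_contains_meeting, lf_many_exclamations]

def weak_label(text):
    """Majority vote over non-abstaining functions; ties or no votes abstain."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

# A small "gold" set: labels assigned by hand and treated as ground truth.
GOLD = [
    ("Free prizes!! Click now!", 1),
    ("Meeting moved to 3pm", 0),
    ("Feel free to call me", 0),   # the "free" heuristic gets this one wrong
    ("You won a cruise", 1),       # no heuristic fires, so the weak label abstains
]

def accuracy_against_gold(gold):
    """Return (coverage, accuracy) of the weak labels measured on the gold set."""
    predictions = [weak_label(text) for text, _ in gold]
    covered = [(p, y) for p, (_, y) in zip(predictions, gold) if p != ABSTAIN]
    coverage = len(covered) / len(gold)
    accuracy = sum(p == y for p, y in covered) / len(covered) if covered else 0.0
    return coverage, accuracy

if __name__ == "__main__":
    coverage, accuracy = accuracy_against_gold(GOLD)
    print(f"coverage={coverage:.2f}, accuracy on covered examples={accuracy:.2f}")
```

The weak labels are cheap to produce for any amount of unlabeled text, but only the gold labels can tell us how far to trust them.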
