Solved – the difference between “gold standard” and “ground truth”

What's the difference between "gold standard" and "ground truth"?

The two Wikipedia articles (gold standard and ground truth) relate the two concepts to each other in terms of model precision and accuracy. That's one possibility. But I have also found the terms used interchangeably when talking about labeled data sets:

In some cases it can be impossible to get the actual label (also known as the ground truth or gold standard). (source pdf)

The more complete quote is

In some cases it can be impossible to get the actual label (also known as the ground truth or gold standard) and it is estimated from the subjective opinion of a small number of experts who can often disagree on the labels [14, 29].

That strikes me as inane on several levels. If, for example, we use multiple subjective opinions to form a consensus, we would call that adjudicated opinion a reference standard, and it would have statistical properties that can be investigated. For an example, see "The dirty coins and the three judges." Thus, although we cannot ascertain an absolute truth, we can explore how good our standard is and seek to improve it until it is good enough to serve as a measure of whatever we wish to analyze. Otherwise, we have only negative results. Whatever the context, our responsibility is to state the limits of accuracy and precision of our measurements.
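To make the point about a consensus having statistical properties concrete, here is a minimal sketch. It assumes every item was labeled by the same number of raters, forms the reference label by simple majority vote (ties are left unresolved for adjudication), and reports Fleiss' kappa as one common measure of how much the raters agree beyond chance. The function names, the majority-vote rule and the choice of kappa are assumptions of this example, not something the answer prescribes.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(labels: Sequence[Hashable]) -> Optional[Hashable]:
    """Hypothetical helper: most common label, or None when the top labels tie."""
    if not labels:
        raise ValueError("need at least one label")
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no clear consensus: flag the item for adjudication
    return ranked[0][0]


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa; assumes each item has the same number (>= 2) of ratings."""
    if not ratings:
        raise ValueError("need at least one rated item")
    n = len(ratings[0])  # raters per item
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    categories = sorted({label for item in ratings for label in item}, key=str)
    num_items = len(ratings)

    # counts[i][j]: how many raters put item i in category j
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])

    # Observed agreement per item, averaged over all items
    per_item = [(sum(x * x for x in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(per_item) / num_items

    # Agreement expected by chance from the overall category proportions
    totals = [sum(row[j] for row in counts) for j in range(len(categories))]
    p_e = sum((t / (num_items * n)) ** 2 for t in totals)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating uses one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Three hypothetical experts label four cases as diseased ("d") or healthy ("h")
    ratings = [["d", "d", "d"], ["d", "h", "d"], ["h", "h", "h"], ["h", "d", "h"]]
    print([majority_vote(item) for item in ratings])  # ['d', 'd', 'h', 'h']
    print(round(fleiss_kappa(ratings), 3))            # 0.333
```

A kappa well below 1 says the consensus label is itself uncertain, which is exactly the kind of limit on accuracy the answer asks us to state rather than hide behind the word "truth".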

"Gold standard" is a term common in medicine and allied fields. Many submitted papers that use the term are, for good reason, never published. Most often this is because of circular reasoning of the "make an assumption, then prove that the assumption was made" type, and the territory enclosed by that circle consists of fanciful results that cannot be reproduced without making the same plethora of ridiculous errors.

It is better to use other terms that reduce the opportunity for self-delusion. Unfortunately, the AMA's preferred term, referred to in the Wikipedia entry as Criterion Standard, is not a synonym for gold standard; rather, it refers to disease occurrence reporting. That is only rarely the circumstance in which authors fall into the bad habit of glibly making gilded comparisons.

A better term in most contexts is reference standard, which is much more to the point. For example, when we refer to a "standard kilogram" we are not saying that the standard is correct in any sense, only that we have used it as a "yardstick" because that is what we had available. The term is also better because using something as a reference standard does not imply that no better standard could be created. By contrast, in journal articles the words "gold standard" are frequently followed by "true" or "truth", and those words often appear an order of magnitude more frequently than any statement of the so-called "gold-standard" assumption itself.

Case in point: the platinum-iridium (i.e., better than gold) reference kilogram, cast in the 19th century, has drifted by about 50 micrograms relative to its official copies. On May 20, 2019 it was replaced by a definition of the kilogram in terms of the Planck constant, which is not expected to drift the way the physical reference kilogram did.
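For scale, a drift of about 50 micrograms is a small but measurable fraction of the kilogram the artifact was meant to define (simple unit arithmetic, using 1 µg = 10⁻⁹ kg):

$$
\frac{50\ \mu\text{g}}{1\ \text{kg}} = \frac{50 \times 10^{-9}\ \text{kg}}{1\ \text{kg}} = 5 \times 10^{-8} \approx 50\ \text{parts per billion}
$$

Tiny as that is, it was large enough to matter for a standard that by definition could not be wrong.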

Like criterion standard, ground truth is frequently jargonesque. It sometimes refers to comparing remote-sensing analytic results with observations made, at least figuratively, by someone standing on the ground, i.e., to collating more direct observations. Once again, the more generic term is reference standard, which I suggest has a much firmer scientific basis in the form of established practice and rules for its precision, accuracy, criticism, evaluation, improvement and deployment.
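In remote sensing, the comparison with ground observations is commonly summarized with a confusion matrix. The sketch below assumes each sample site has one mapped (classified) label and one ground-observed reference label, and reports overall accuracy plus producer's accuracy (per reference class) and user's accuracy (per mapped class), which are standard accuracy-assessment measures. The function name and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy_report(mapped: Sequence[Hashable], reference: Sequence[Hashable]) -> dict:
    """Hypothetical helper: agreement between mapped labels and reference labels."""
    if len(mapped) != len(reference) or not mapped:
        raise ValueError("need equal-length, non-empty label sequences")
    # Confusion counts keyed by (mapped label, reference label)
    pairs = Counter(zip(mapped, reference))
    correct = sum(c for (m, r), c in pairs.items() if m == r)
    mapped_totals = Counter(mapped)
    reference_totals = Counter(reference)
    classes = sorted(set(mapped) | set(reference), key=str)
    return {
        "overall": correct / len(mapped),
        # Producer's accuracy: share of each reference class that the map got right
        "producers": {c: pairs[(c, c)] / reference_totals[c]
                      for c in classes if reference_totals[c]},
        # User's accuracy: share of each mapped class that the reference confirms
        "users": {c: pairs[(c, c)] / mapped_totals[c]
                  for c in classes if mapped_totals[c]},
    }


if __name__ == "__main__":
    mapped = ["water", "forest", "forest", "urban", "water", "forest"]
    reference = ["water", "forest", "urban", "urban", "forest", "forest"]
    report = accuracy_report(mapped, reference)
    print(round(report["overall"], 3))  # 0.667
```

Note that these figures only measure agreement with the reference labels; if the field observations are themselves wrong, the scores inherit that error, which is why calling them "ground truth" overstates what they are.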
