Solved – When is adding jitter to a scatterplot appropriate for conveying information

"Jittering" means adding a bit of random noise to the points in a scatterplot so that the information in the data is easier to see, usually when there is a lot of overplotting. Overplotting can result from, e.g., very large sample sizes, or from one of the variables (say, on the x-axis) being discrete. An example is body height (on the y-axis) plotted against sex (male, female) on the x-axis. Several heights may be very similar or identical among individuals of a given sex, which may prevent the reader from seeing the true sample size, or simply make it harder to see any potential patterns. Here, here and here are some examples of what jittering is and how it may be used.

My two questions are:

1) In practice, and in your own respective analyses or publications, at which point do you decide to jitter and what would typically guide your decision?

2) Would you use jittering only for your own visual inspection of data, or would you also use it in scientific papers or other publications, to better convey the information and get the message across? Would this be an appropriate or acceptable thing to do? (In the latter case, I am of course assuming that you would clearly state in the plot caption that jitter was added and why.)

First, jittering is only one solution to the problem of overplotting. I cover a bunch of ideas in a paper I presented at a SAS conference (but you can apply the ideas with R or other software). In particular, you can make the dots smaller, you can make them transparent, or you can use a density plot.
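To make the density idea concrete: a density-style display boils down to counting how many points fall into each cell of a grid and showing the count instead of every individual point. Here is a rough Python sketch of that counting step, using only the standard library. The function name `bin_counts`, the `cell` parameter, and the data are made up for illustration; this is not taken from the paper mentioned above.

```python
from collections import Counter
import math


def bin_counts(xs, ys, cell=1.0):
    """Count points per square grid cell of side `cell` (a crude 2D density)."""
    counts = Counter()
    for x, y in zip(xs, ys):
        # Each point is assigned to the cell whose lower-left corner it falls in.
        key = (math.floor(x / cell), math.floor(y / cell))
        counts[key] += 1
    return counts


# Made-up data: sex coded as 0/1 on x, height in cm on y.
xs = [0, 0, 0, 1, 1, 1, 1]
ys = [170, 170, 171, 180, 180, 180, 165]

counts = bin_counts(xs, ys, cell=1.0)
for cell_key, n in sorted(counts.items()):
    print(cell_key, n)
```

The resulting counts could then drive marker size or color in whatever plotting software you use, so that a cell holding three points looks different from a cell holding one.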

Second, I use jittering when there is a moderate degree of overplotting, because that is the situation when jittering helps.
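As a rough illustration of that rule of thumb, the Python sketch below first measures how many points sit exactly on top of another point and then jitters the discrete variable. The names `overplot_fraction` and `jitter`, the noise width of 0.2, and the data are all assumptions for the example; what counts as "moderate" overplotting remains a judgment call.

```python
import random
from collections import Counter


def overplot_fraction(xs, ys):
    """Fraction of points that share exact coordinates with another point."""
    counts = Counter(zip(xs, ys))
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / len(xs)


def jitter(values, width=0.2, seed=0):
    """Add uniform noise in [-width, width] to each value (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]


# Made-up data: sex coded as 0/1 on x, height in cm on y.
xs = [0, 0, 0, 1, 1, 1, 1]
ys = [170, 170, 171, 180, 180, 180, 165]

before = overplot_fraction(xs, ys)
xs_jittered = jitter(xs, width=0.2)  # jitter only the discrete axis
after = overplot_fraction(xs_jittered, ys)
print(before, after)
```

Jittering only the discrete axis leaves the measured heights untouched, and keeping the width well below the spacing between categories keeps the two groups visually separate.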

Third, I have published papers that used jittering in scatterplots and in other kinds of plots. As that implies, jittering is not limited to scatterplots.
