# Solved – R: How to interpret the QQplot’s outlier numbers

How should I interpret the numbered labels on the outlying points when I plot the following in R (Q-Q plot)?

```r
set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
plot(lm(y ~ x), which=2)   # which = 2 gives the Q-Q plot
```

It shows the number 61 near the top. What is it?

I figured it might be the index of the outlying (`y`, `x`) pair. The label sits next to a point at roughly `y = 3` and `x = 3`. But when I look up that row, the values do not match:

```r
cbind(y,x)[61,]
#         y         x
# 2.4016178 0.4251004
```

How to read these numbers in R's QQplot?


The numbers in the plot are the indices of the standardized residuals, which are also the row indices of the original data. The axes of the Q-Q plot show theoretical quantiles and standardized residuals, not the original `x` and `y`, so the position of a labelled point does not match its raw data values. By default, `R` labels the three most extreme residuals, even if they don't deviate much from the Q-Q line. So the fact that points are labelled doesn't by itself mean that the fit is bad. This behaviour can be changed with the option `id.n`. Let me illustrate this with your example:
```r
set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
lm.mod <- lm(y ~ x)                  # linear regression model
plot(lm.mod, which=2)                # Q-Q plot
lm.resid <- residuals(lm(y ~ x))     # save the residuals

# sort the absolute values of the residuals and show the three largest
head(sort(abs(lm.resid), decreasing=TRUE), 3)
#         14         61         24
# 2.32415869 2.29316200 2.09837122
```
The three most extreme residuals are numbers 14, 61 and 24. These are the numbers in the plot, and they correspond to the indices of the original data. So the data points 14, 24 and 61 are the ones that produce the most extreme residuals. We can also mark them in a scatterplot (the blue points). Note that because you generated your `y` and `x` independently, the regression line is essentially flat and lies close to the mean of `y`:
```r
# The original data points corresponding to the 3 most extreme residuals
cbind(x,y)[c(14, 24, 61), ]
#               x         y
# [1,] -0.6506964 -2.214700
# [2,] -0.1795565 -1.989352
# [3,]  0.4251004  2.401618

# Make a scatterplot of the original data and mark the three points
# and add the residuals
par(bg="white", cex=1.6)
plot(y~x, pch=16, las=1)
abline(lm.mod, lwd=2)   # add regression line
pre <- predict(lm.mod)

# Add the residual lines
segments(x[c(14, 24, 61)], y[c(14, 24, 61)], x[c(14, 24, 61)],
         pre[c(14, 24, 61)], col="red", lwd=2)

# Add the points
points(x[c(14, 24, 61)], y[c(14, 24, 61)], pch=16, cex=1.1, col="steelblue", las=1)
```
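To connect this back to the Q-Q plot itself, here is a minimal sketch of how the labels could be reproduced from the standardized residuals and how `id.n` changes the labelling. The names `std.resid` and `top3` are my own, introduced only for illustration. It assumes the same seed and data as above.

```r
set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
lm.mod <- lm(y ~ x)

# Standardized residuals: the quantity on the vertical axis of the Q-Q plot
std.resid <- rstandard(lm.mod)

# Indices of the three largest absolute standardized residuals;
# these should be the labels drawn by plot(lm.mod, which = 2)
top3 <- order(abs(std.resid), decreasing = TRUE)[1:3]
top3

# Label the five most extreme points instead of the default three
plot(lm.mod, which = 2, id.n = 5)

# Suppress the labels entirely
plot(lm.mod, which = 2, id.n = 0)
```

Here the raw and standardized residuals pick out the same three points, but with high-leverage observations the ranking could differ, so `rstandard()` is probably the safer thing to inspect when matching labels to rows.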