How should I interpret the numbered labels on outlying points when plotting the following in R (QQ plot)?

```r
set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
plot(lm(y ~ x), which=2)  # which = 2 gives the plot
```

The plot shows the number 61 at the top. What is it?

I figured it might be the index of the outlying pair. It appears to be connected to a score of around `y = 3` and `x = 3`. But when I check:

```r
cbind(y, x)[61, ]
#         y         x
# 2.4016178 0.4251004
```

How should I read these numbers in R's QQ plot?


#### Best Answer

The numbers in the plot are the **indices of the standardized residuals, which are also the row indices of the original data.** By default, `R` labels the **three most extreme residuals,** even if they don't deviate much from the QQ line. So the fact that points are labelled doesn't mean that the fit is bad. This behaviour can be changed with the `id.n` option. Let me illustrate this with your example:

```r
set.seed(1)
y <- rnorm(100)
x <- rnorm(100)

lm.mod <- lm(y ~ x)                    # linear regression model
plot(lm.mod, which=2)                  # QQ plot

lm.resid <- residuals(lm.mod)          # save the residuals
sort(abs(lm.resid), decreasing=TRUE)   # sort the absolute values of the residuals

# Output (first three values):
#         14         61         24
# 2.32415869 2.29316200 2.09837122
```

The three most extreme residuals are numbers 14, 61 and 24. These are the numbers in the plot. These indices correspond to the indices of the original data, so data points 14, 24 and 61 are the ones that produce the most extreme residuals. We can also mark them in a scatterplot (the blue points). Note that because you generated your `y` and `x` independently, the regression line is nearly flat and lies close to the mean of `y`:

```r
# The original data points corresponding to the 3 most extreme residuals
cbind(x, y)[c(14, 24, 61), ]
#               x         y
# [1,] -0.6506964 -2.214700
# [2,] -0.1795565 -1.989352
# [3,]  0.4251004  2.401618

# Make a scatterplot of the original data and mark the three points
# and add the residuals
par(bg="white", cex=1.6)
plot(y~x, pch=16, las=1)
abline(lm.mod, lwd=2)      # add regression line
pre <- predict(lm.mod)

# Add the residual lines
segments(x[c(14, 24, 61)], y[c(14, 24, 61)],
         x[c(14, 24, 61)], pre[c(14, 24, 61)],
         col="red", lwd=2)

# Add the points
points(x[c(14, 24, 61)], y[c(14, 24, 61)],
       pch=16, cex=1.1, col="steelblue", las=1)
```
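As a small follow-up sketch, the labelled indices can also be recovered without reading them off the plot, and the number of labels can be changed through `id.n`. The QQ plot shows standardized residuals on its y-axis, so this sketch ranks `rstandard()` values rather than raw residuals. The variable names `std.resid` and `top3` are only illustrative and are not part of the original example.

```r
set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
lm.mod <- lm(y ~ x)

# Standardized residuals: the quantity on the y-axis of the QQ plot
std.resid <- rstandard(lm.mod)

# Row indices of the three largest absolute standardized residuals
# (illustrative name; these should match the labels drawn by plot())
top3 <- order(abs(std.resid), decreasing = TRUE)[1:3]
print(top3)

# Inspect the original observations behind those labels
print(cbind(x, y)[top3, ])

# Label five points instead of the default three
plot(lm.mod, which = 2, id.n = 5)

# Suppress the labels entirely
plot(lm.mod, which = 2, id.n = 0)
```

Ranking by standardized rather than raw residuals usually gives the same points, but the two can differ when an observation has high leverage.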
