I am currently creating a box plot. I am new to statistics, and to box plots in particular. The plot is shown below:
The y-axis shows the number of messages. I have trouble understanding what I see. The plot was created automatically by Matlab. As far as I know, a box plot should show four quartiles, but I see only three. This is probably because of the value of the median (the green line), but I do not know what it means when a quartile is missing. Can somebody explain this and tell me what can be read from the plot?
The median is probably identical to the first quartile, which is why they overlap. This tends to happen when you have a large proportion of identical, low values in the dataset. Here's an example that reproduces this pattern:
dat <- c(1,2,2,2,3,5,6)
median(dat)
## 2
quantile(dat, 0.25)
## 25%
##   2
boxplot(dat)
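To check the overlap numerically rather than by eye, here is a minimal sketch using base R's `boxplot.stats()`, which returns the five values that a default `boxplot()` draws. The variable name `s` and the labels are only for illustration, and Matlab may compute its quartiles slightly differently.

```r
dat <- c(1, 2, 2, 2, 3, 5, 6)

# Five values drawn by boxplot(): lower whisker, lower hinge (roughly Q1),
# median, upper hinge (roughly Q3), upper whisker
s <- boxplot.stats(dat)$stats
names(s) <- c("lower whisker", "lower hinge", "median",
              "upper hinge", "upper whisker")
print(s)

# TRUE when the median line sits exactly on the lower edge of the box
s[["median"]] == s[["lower hinge"]]
```

When that comparison is true, the box has no visible lower half, which is the pattern in the plot from the question.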
You can read a basic introduction about how to interpret boxplots here. Though as Nick Cox points out below, its discussion of what are called 'outliers' is flawed and should be ignored. Outliers should not be deleted unless there is a very strong reason to do so, such as a clear data recording error.
Note also that a boxplot is not a great way to display many datasets. I agree with Stephan Kolassa's recommendation of a beeswarm plot for small datasets and a violin plot/kernel density plot for larger ones.
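As a rough illustration of those alternatives, here is a sketch that uses only base R: a jittered strip chart stands in for a beeswarm plot, and a kernel density curve stands in for a violin plot. These stand-ins are an assumption chosen to avoid extra packages, not the exact plots recommended, and `dat` is the toy data from the example above rather than the real message counts.

```r
dat <- c(1, 2, 2, 2, 3, 5, 6)   # toy data, not the real message counts

op <- par(mfrow = c(1, 2))      # two panels side by side

# Small dataset: show every point; jitter separates tied values
stripchart(dat, method = "jitter", vertical = TRUE, pch = 16,
           ylab = "value", main = "Jittered strip chart")

# Larger dataset: a kernel density estimate shows the shape of the distribution
d <- density(dat)
plot(d, main = "Kernel density")
rug(dat)                        # mark the raw observations on the axis

par(op)                         # restore the previous plot settings
```

Both views make the cluster of identical low values visible directly, whereas the boxplot only hints at it through the collapsed lower half of the box.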