Chebyshev's Inequality in R

I have a statistics question in R. I was hoping to use Chebyshev's inequality, but I don't know how to implement it.

Example: imagine a dataset with a non-normal distribution. I need to use Chebyshev's inequality to assign NA to any data point that falls below a certain lower bound of that distribution, say the lowest 5%. The distribution is one-tailed with an absolute zero.

I'm not sure how to approach this, or what sort of example would help.

Suppose you collect wait times from a call center over 24 hours in a single day. The mean and standard deviation of the wait times vary from hour to hour. Assume your call center is huge, with 10,000 customer service representatives in any given hour.

Here are 24 samples of 10,000 wait times each, stored as a list of integer vectors (`list.df`). Each sample comes from a different hour of the same day.

```r
set.seed(123)
mu <- rpois(24, 5)  # true mean call center wait times, one per hour
mu
list.df <- lapply(mu, function(x) rpois(10000, x))  # 10k call center workers per hour
str(list.df)
```

Given the 24 samples, you can compute the sample mean and sample standard deviation for each hour.

```r
mean.each.hour <- sapply(list.df, mean)  # sample mean per hour
sd.each.hour <- sapply(list.df, sd)      # sample standard deviation per hour
mean.each.hour
sd.each.hour
```
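One detail worth checking against the assumption made below (that sample and population standard deviations are equal): R's `sd()` divides by n - 1, while the population formula divides by n. This small sketch, using one simulated hour and hypothetical variable names, shows that the ratio is exactly sqrt(n / (n - 1)), which is negligible at n = 10,000.

```r
# R's sd() divides by n - 1 (sample standard deviation).
# Compare it with the population formula, which divides by n.
set.seed(123)
x <- rpois(10000, 5)                          # one simulated hour of wait times
n <- length(x)
sample.sd <- sd(x)                            # denominator n - 1
pop.sd <- sqrt(sum((x - mean(x))^2) / n)      # denominator n
sample.sd / pop.sd                            # equals sqrt(n / (n - 1)), about 1.00005
```

Because the sample standard deviation is slightly larger than the population one, a Chebyshev bound computed from it is slightly more conservative, never less.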

You can also compute the observed 95th percentile for each hour:

```r
p95.each.hour <- sapply(list.df, function(x) quantile(x, probs = 0.95))  # observed 95th percentile per hour
p95.each.hour
```

The question is: are the observed 95th percentiles consistent with the theoretical relationship between the mean, the standard deviation, and the percentiles? For this purpose, assume the sample mean and sample standard deviation are exactly equal to the population mean and population standard deviation.

We can use Chebyshev's inequality for this, since it holds for any distribution with a finite variance, not just the normal.
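The step from the inequality to the multiplier k used in the code is short. Written out, assuming μ and σ > 0 are the true mean and standard deviation and p is the tail probability: for any k > 0,

$$
P(|X-\mu| \ge k\sigma) \le \frac{1}{k^2}.
$$

Because the event $\{X \ge \mu + k\sigma\}$ is contained in $\{|X-\mu| \ge k\sigma\}$,

$$
P(X \ge \mu + k\sigma) \le \frac{1}{k^2}.
$$

Choosing $1/k^2 = p$, i.e. $k = 1/\sqrt{p}$, gives $P(X \le \mu + k\sigma) \ge 1 - p$, so the $(1-p)$ quantile is at most $\mu + k\sigma$. For $p = 0.05$:

$$
k = \sqrt{20} \approx 4.47, \qquad q_{0.95} \le \mu + 4.47\,\sigma.
$$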

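```r
# Chebyshev multiplier: P(|X - mu| >= k * sigma) <= 1 / k^2,
# so setting 1 / k^2 = rt_prob gives k = sqrt(1 / rt_prob)
Chebyshev.k <- function(rt_prob = 0.05) {
  sqrt(1 / rt_prob)
}

k <- Chebyshev.k(0.05)
k  # about 4.47

# Upper bound on the (1 - rt_probs) quantile, valid for any finite-variance distribution
Chebyshev.max <- function(means, stds, rt_probs) {
  k <- Chebyshev.k(rt_probs)
  theoretical.max <- means + k * stds
  return(theoretical.max)
}

p95.theoretical <- Chebyshev.max(mean.each.hour, sd.each.hour, rt_probs = 0.05)
p95.theoretical >= p95.each.hour  # one TRUE/FALSE per hour
```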
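Since the question describes a one-tailed distribution, a one-sided bound may fit better. The sketch below swaps in Cantelli's inequality, a different result from Chebyshev's, which bounds a single tail: P(X - μ ≥ kσ) ≤ 1 / (1 + k²). `cantelli.k` and `cantelli.max` are hypothetical names chosen to mirror `Chebyshev.k` and `Chebyshev.max`; the block re-creates the simulated data so it runs on its own.

```r
# Hypothetical helpers (not from the original answer) based on Cantelli's
# one-sided inequality: P(X - mu >= k * sigma) <= 1 / (1 + k^2).
cantelli.k <- function(rt_prob = 0.05) {
  # Solve 1 / (1 + k^2) = rt_prob for k
  sqrt(1 / rt_prob - 1)
}

cantelli.max <- function(means, stds, rt_probs = 0.05) {
  # Upper bound on the (1 - rt_probs) quantile for any finite-variance distribution
  means + cantelli.k(rt_probs) * stds
}

# Re-create the simulated call-center data so this block runs on its own
set.seed(123)
mu <- rpois(24, 5)
list.df <- lapply(mu, function(x) rpois(10000, x))
mean.each.hour <- sapply(list.df, mean)
sd.each.hour <- sapply(list.df, sd)
p95.each.hour <- sapply(list.df, function(x) quantile(x, probs = 0.95))

cantelli.k(0.05)  # sqrt(19), about 4.36, versus sqrt(20), about 4.47, for Chebyshev
p95.cantelli <- cantelli.max(mean.each.hour, sd.each.hour, rt_probs = 0.05)
all(p95.cantelli >= p95.each.hour)
```

The one-sided multiplier is only slightly smaller, so the practical difference here is modest; both bounds hold for every distribution with the given mean and standard deviation.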

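For all 24 hours, the observed 95th percentile is less than or equal to the theoretical maximum given by Chebyshev's inequality, so the data are consistent with it. The bound is conservative because it must hold for every distribution with that mean and standard deviation.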
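Returning to the original goal of assigning NA to low values: a minimal sketch, assuming "low" means below the Chebyshev lower bound mean - k * sd. `chebyshev_na_lower` is a hypothetical helper, and the toy vector is made up for illustration.

```r
# Hypothetical helper (not from the original answer): replace values below
# mean - k * sd with NA, where k = sqrt(1 / tail_prob) is the Chebyshev multiplier.
# Chebyshev's inequality guarantees that at most tail_prob of the values
# can fall below this cutoff, whatever the shape of the distribution.
chebyshev_na_lower <- function(x, tail_prob = 0.05) {
  k <- sqrt(1 / tail_prob)
  cutoff <- mean(x, na.rm = TRUE) - k * sd(x, na.rm = TRUE)
  x[!is.na(x) & x < cutoff] <- NA
  x
}

# Toy data bounded below by zero: 99 values of 100 and one value of 0
x <- c(rep(100, 99), 0)
y <- chebyshev_na_lower(x, tail_prob = 0.05)
sum(is.na(y))  # 1: only the 0 falls below the cutoff 99 - 4.47 * 10 (about 54.3)
```

Note that for a distribution bounded at zero the cutoff is often negative, in which case nothing is flagged; for Poisson-like wait times, for example, mean - 4.47 * sd is below zero whenever the mean is below 20. If the goal is literally the lowest 5% of the observed values rather than a distribution-free bound, the empirical quantile is more direct, e.g. `x[x < quantile(x, 0.05)] <- NA`, but that is a different technique from Chebyshev's inequality.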
