I have a statistical question in R and was hoping to use Chebyshev's inequality, but I don't know how to implement it.
Example: Imagine a dataset with a non-normal distribution. I need to use Chebyshev's inequality to assign NA values to any data point that falls within a certain lower portion of that distribution, for example the lower 5%. The distribution is one-tailed with an absolute zero.
I am not sure how to go about this, or what sort of example might help.
Best Answer
Suppose you collect wait times from a call center for each of the 24 hours in a single day. The mean and standard deviation of the wait times vary from hour to hour. Assume your call center is huge and you have 10,000 customer service representatives at any given hour.
Here are 24 data frames with 10,000 wait times per data frame (stored below as numeric vectors in a list). Each data frame was collected from a different hour on the same day.
    set.seed(123)
    mu <- rpois(24, 5)  # true mean call center wait times
    mu
    list.df <- lapply(mu, function(x) rpois(10000, x))  # 10k call center workers per hour
    str(list.df)
Given 24 data frames, you can calculate the sample mean and sample standard deviation for each hour.
    mean.each.hour <- sapply(list.df, mean)
    sd.each.hour <- sapply(list.df, sd)
    mean.each.hour
    sd.each.hour
You could also calculate the 95th percentile of each hour:
    p95.each.hour <- sapply(list.df, function(x) quantile(x, probs = 0.95))
    p95.each.hour
The question is: are the observed 95th percentiles consistent with the theoretical relationship between the mean, the standard deviation, and the percentiles? Assume the sample mean and sample standard deviation are exactly equal to the population mean and population standard deviation.
We can use Chebyshev's inequality for this.
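As a rough sketch of the reasoning behind the multiplier k used in this answer (the symbols $\mu$, $\sigma$, $p$, and $q_{1-p}$ are notation introduced here, not taken from the original post), Chebyshev's inequality states that for any distribution with finite mean $\mu$ and standard deviation $\sigma$, and any $k > 0$,

$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$

The event $X \ge \mu + k\sigma$ is contained in the event $|X - \mu| \ge k\sigma$, so the same bound holds for the right tail alone:

$$P(X \ge \mu + k\sigma) \le \frac{1}{k^2}.$$

Setting $1/k^2 = p$ gives $k = 1/\sqrt{p}$, and therefore the $(1-p)$ quantile satisfies

$$q_{1-p} \le \mu + \frac{\sigma}{\sqrt{p}}.$$

For $p = 0.05$ this gives $k = \sqrt{20} \approx 4.47$. Because the two-sided bound is being applied to one tail only, the resulting limit is valid but conservative.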
    # k such that the Chebyshev tail bound 1 / k^2 equals rt_prob
    Chebyshev.k <- function(rt_prob = 0.05) {
      sqrt(1 / rt_prob)
    }
    k <- Chebyshev.k(0.05)
    k

    # Upper bound on the (1 - rt_probs) quantile: mean + k * sd
    Chebyshev.max <- function(means, stds, rt_probs) {
      k <- Chebyshev.k(rt_probs)
      means + k * stds
    }
    p95.theoretical <- Chebyshev.max(mean.each.hour, sd.each.hour, rt_probs = 0.05)
    p95.theoretical >= p95.each.hour
In all 24 data frames, each observed 95th percentile is less than or equal to the theoretical maximum given by Chebyshev's inequality.
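To connect this back to the question about assigning NA values in the lower tail, here is a minimal sketch that mirrors the same bound on the left side, using mean - k * sd as the cutoff. The helper name `chebyshev_lower_na` and its arguments are hypothetical, introduced only for illustration, and the simulated Poisson data is an assumption carried over from the example above.

```r
# Hypothetical helper (not part of the original answer): replace values below
# the Chebyshev lower cutoff mean(x) - k * sd(x) with NA, where k = sqrt(1 / p).
chebyshev_lower_na <- function(x, p = 0.05) {
  k <- sqrt(1 / p)
  lower <- mean(x, na.rm = TRUE) - k * sd(x, na.rm = TRUE)
  x[!is.na(x) & x < lower] <- NA
  list(values = x, lower = lower)
}

set.seed(123)
wait <- rpois(10000, 5)            # simulated wait times for one hour
res <- chebyshev_lower_na(wait, p = 0.05)

res$lower                          # the cutoff; negative for this sample
sum(is.na(res$values))             # number of values set to NA
mean(wait < res$lower) <= 0.05     # at most a fraction p lies below the cutoff
```

For data with an absolute zero, the cutoff can fall below zero, as it does here, in which case no values are flagged. The Chebyshev cutoff only guarantees that at most a fraction p of values lie below it; it does not locate the lower 5% itself. If the goal is to drop exactly the lowest 5% of observed values, an empirical cutoff such as `quantile(x, probs = 0.05)` may be the more direct tool.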