I have a question regarding the output below from a chi-squared test, which I find confusing and contrary to my expected results. My chi-squared value is infinity here 🙂
I have two questions here:
- I made a data frame showing the relation between smoking and working out. In the column workoutideal, I have tried to convey that smokers don't work out and non-smokers do. In the column workoutmixed, the values are arbitrary.
I expected it to show a strong relation between smoke and workoutideal (I was expecting chi-squared to be 0), but a weak relation between smoke and workoutmixed (I was expecting any integer value for chi-squared here). However, what I observe is the exact opposite. Please see my output below:
    mydata = data.frame(smoke = c('no','yes','no','no','yes'),
                        workoutideal = c('yes','no','yes','yes','no'),
                        workoutmixed = c('no','no','yes','yes','yes'))
    attach(mydata)  # lets the calls below refer to the columns by bare name

    table(smoke, workoutideal)
         workoutideal
    smoke no yes
      no   0   3
      yes  2   0

    table(smoke, workoutmixed)
         workoutmixed
    smoke no yes
      no   1   2
      yes  1   1

    chisq.test(smoke, workoutideal)

        Pearson's Chi-squared test with Yates' continuity correction

    data:  smoke and workoutideal
    X-squared = 1.7014, df = 1, p-value = 0.1921

    Warning message:
    In chisq.test(smoke, workoutideal) :
      Chi-squared approximation may be incorrect

    chisq.test(smoke, workoutmixed)

        Pearson's Chi-squared test with Yates' continuity correction

    data:  smoke and workoutmixed
    X-squared = 0, df = 1, p-value = 1

    Warning message:
    In chisq.test(smoke, workoutmixed) :
      Chi-squared approximation may be incorrect
- When deciding whether the null hypothesis should be accepted or rejected in R, should I look at the X-squared value, accepting the null hypothesis if it is less than the critical value for its degrees of freedom and rejecting it otherwise? Or should I look at the p-value, accepting the null hypothesis if it is higher than 0.05 (the significance level) and rejecting it otherwise?
Best Answer
You are confused about the nature of hypothesis testing, test statistics, and p-values.
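What you might expect from your "ideal" case is that the p-value would be close to 0, not that the chi-squared test statistic would be 0. A strong association produces a very large test statistic. (The reason why yours isn't very large, and your p-value isn't very low, is just that you have few data: with N = 5, the uncorrected Pearson statistic for a 2×2 table cannot exceed 5, and Yates' continuity correction reduces it further.)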
On the other hand, for your "mixed" case, the opposite should be true: the test statistic should be very low, and the p-value should be close to 1. That is in fact what we see, low N notwithstanding.
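To make the effect of sample size concrete, here is a minimal sketch in R using the "ideal" data from the question. The N = 100 sample is hypothetical: it is just the question's five rows repeated 20 times, and the variable names `yates`, `plain`, `big`, and `results` are only for illustration.

```r
# Data from the question
smoke        <- c("no", "yes", "no", "no", "yes")
workoutideal <- c("yes", "no", "yes", "yes", "no")

# N = 5: the small expected counts trigger the approximation warning, silenced here
yates <- suppressWarnings(chisq.test(smoke, workoutideal))                   # default, correct = TRUE
plain <- suppressWarnings(chisq.test(smoke, workoutideal, correct = FALSE))  # no Yates' correction

# Hypothetical larger sample: the same five rows repeated 20 times (N = 100)
big <- chisq.test(rep(smoke, 20), rep(workoutideal, 20), correct = FALSE)

# Collect the three statistics and p-values side by side
results <- data.frame(
  case      = c("N = 5, Yates", "N = 5, uncorrected", "N = 100, uncorrected"),
  statistic = unname(c(yates$statistic, plain$statistic, big$statistic)),
  p_value   = c(yates$p.value, plain$p.value, big$p.value)
)
print(results)
```

Without the correction, the statistic for a perfectly associated 2×2 table equals N, so it grows with the sample while the p-value shrinks toward 0. The pattern of association is the same in all three rows; only the amount of evidence changes.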
Regarding question 2, using the critical value for the chi-squared test statistic or using whether the p-value is < alpha will always yield the same decision. This is because the critical value corresponds to the spot where the p-value drops below alpha.
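The equivalence of the two decision rules can be checked numerically. The following is a rough sketch, assuming alpha = 0.05 and df = 1 as in the question; the vector `stats` holds a few illustrative statistic values (0 and 1.7014 come from the output above, the rest are made up).

```r
alpha <- 0.05
df    <- 1

# Critical value: the statistic at which the upper-tail p-value equals alpha
crit <- qchisq(1 - alpha, df)   # about 3.84 for df = 1

# Upper-tail p-value for a given chi-squared statistic
p_value <- function(stat) pchisq(stat, df, lower.tail = FALSE)

# Illustrative statistics on both sides of the critical value
stats <- c(0, 1.7014, 3.5, 4, 5)

reject_by_stat <- stats > crit
reject_by_p    <- p_value(stats) < alpha

print(data.frame(statistic = stats,
                 p_value   = round(p_value(stats), 4),
                 reject_by_stat,
                 reject_by_p))

# Both rules give the same decision for every statistic
stopifnot(identical(reject_by_stat, reject_by_p))
```

Because the p-value is a decreasing function of the statistic, comparing the statistic with `crit` and comparing the p-value with `alpha` are two ways of asking the same question.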