I am a stats beginner analysing a small dataset with pandas. There are 60 data points: 22 from Group A and 38 from Group B. Each data point is the number of retweets gained by a single tweet. The null hypothesis is that a tweet in Group A is not more likely (<=) than one in Group B to be retweeted.
Because most tweets are not retweeted, the majority of data points are zero, so the distribution (which I plotted with seaborn) is heavily concentrated at zero.
As this distribution is far from normal, a t-test wouldn't be appropriate. And since I have no expectations about how many retweets each tweet should get, I can't use a chi-squared test either.
Could you give me some hints on an intelligent, beginner-friendly (and statistically robust) way to conduct a hypothesis test on this data?
Best Answer
You could use the Mann–Whitney U test.
In statistics, the Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW), Wilcoxon rank-sum test (WRS), or Wilcoxon–Mann–Whitney test) is a nonparametric test of the null hypothesis that two samples come from the same population against an alternative hypothesis, especially that a particular population tends to have larger values than the other.
It can be applied to data from unknown distributions, unlike the t-test, which assumes normally distributed data, and it is nearly as efficient as the t-test when the data really are normal.
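Because the test only uses the ordering of the values, its statistic can be computed directly. U for Group A counts the (A, B) pairs in which the Group A tweet has more retweets, with ties (such as two tweets that both have zero retweets) counting one half. Equivalently, U_A = R_A - n_A(n_A + 1)/2, where R_A is the sum of Group A's ranks in the pooled sample and tied values receive their average rank. The sketch below computes it both ways; the helper names and the toy counts are made up for illustration and are not the question's data.

```python
import numpy as np
from scipy.stats import rankdata


def u_statistic_pairs(a, b):
    """Hypothetical helper: U for sample a by counting pairs (a_i > b_j), ties count 1/2."""
    a = np.asarray(a, dtype=float)[:, None]  # column vector so comparisons broadcast over all pairs
    b = np.asarray(b, dtype=float)[None, :]  # row vector
    return float(np.sum(a > b) + 0.5 * np.sum(a == b))


def u_statistic_ranks(a, b):
    """Hypothetical helper: the same U via the rank-sum formula with average ranks for ties."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ranks = rankdata(np.concatenate([a, b]))  # default method gives tied values their mean rank
    r_a = ranks[: len(a)].sum()               # rank sum of the first sample
    return float(r_a - len(a) * (len(a) + 1) / 2)


# Toy counts, mostly zeros, invented purely for illustration
group_a = [0, 0, 0, 1, 3]
group_b = [0, 0, 0, 0, 0, 2]
print(u_statistic_pairs(group_a, group_b), u_statistic_ranks(group_a, group_b))  # 18.5 18.5
```

Only the ranks matter, so the shape of the distribution (here, a pile of zeros) does not need to be modelled; U_A + U_B always equals n_A * n_B.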
In Python, this test is available as scipy.stats.mannwhitneyu. As with a t-test, you get the value of the U statistic and a p-value.
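A minimal sketch of running the test on a pandas DataFrame, assuming one row per tweet with a `group` column ('A' or 'B') and a `retweets` count; those column names and the randomly generated counts are hypothetical stand-ins for the real data. Since the null hypothesis in the question is one-sided (A is not more likely than B to be retweeted), the sketch passes `alternative="greater"`, which tests whether Group A's values tend to be larger.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

# Hypothetical stand-in data: replace with the real DataFrame.
# Assumed layout: one row per tweet, a 'group' label and a 'retweets' count.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": ["A"] * 22 + ["B"] * 38,
    "retweets": np.concatenate([
        rng.poisson(0.8, 22),  # synthetic counts, illustrative only
        rng.poisson(0.4, 38),
    ]),
})

# Split the retweet counts by group
a = df.loc[df["group"] == "A", "retweets"]
b = df.loc[df["group"] == "B", "retweets"]

# One-sided test: H1 is that Group A's retweet counts tend to be larger than Group B's.
# With many tied zeros, recent SciPy versions use the normal approximation
# (with a tie correction) rather than the exact distribution.
result = mannwhitneyu(a, b, alternative="greater")
print(f"U = {result.statistic}, p = {result.pvalue:.4f}")
```

A small p-value (below your chosen significance level) would lead you to reject the null hypothesis that Group A is not more likely to be retweeted.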
Hope it helps.