Solved – Method for hypothesis testing on a non-normal distribution (number of retweets)

I am a stats beginner, and I am analysing a small dataset using pandas. There are 60 data points, 22 of which are from Group A and 38 from Group B. Each data point is the number of retweets gained by a single tweet. The null hypothesis is that a tweet in Group A is not more likely (<=) than one in Group B to be retweeted.

Because most tweets are not retweeted, the majority of data points are zero. This leads to a distribution that looks like this (using seaborn):

[Image: seaborn plot of the retweet distribution, with most values at zero]

As this is far from a normal distribution, it wouldn't be appropriate to use a t-test. Nor do I have any expectations regarding how many retweets each tweet should get, so I can't use a chi-squared test.

Could you please give me some hints as to what would be an intelligent, beginner-friendly, and statistically robust way to conduct a hypothesis test on this data?

You could use the Mann–Whitney U test.

In statistics, the Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW), Wilcoxon rank-sum test (WRS), or Wilcoxon–Mann–Whitney test) is a nonparametric test of the null hypothesis that two samples come from the same population against an alternative hypothesis, especially that a particular population tends to have larger values than the other.

Unlike the t-test, which assumes approximately normally distributed data, it can be applied when the underlying distribution is unknown, and it is nearly as efficient as the t-test when the data are in fact normal.

In Python, this test is available as scipy.stats.mannwhitneyu. As with a t-test, you get the value of the test statistic (here U) and a p-value.
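As a minimal sketch of how the call could look for the one-sided question asked above: the retweet counts below are invented purely for illustration (they are not the asker's data), and the variable names group_a and group_b are assumptions. Only the group sizes (22 and 38) and the zero-heavy shape follow the question.

```python
from scipy.stats import mannwhitneyu

# Hypothetical retweet counts, made up for illustration only.
# Sizes match the question: 22 tweets in Group A, 38 in Group B.
group_a = [0] * 10 + [1, 1, 2, 2, 3, 4, 5, 7, 9, 12, 15, 30]
group_b = [0] * 28 + [1, 1, 1, 1, 2, 2, 3, 3, 4, 6]

# One-sided test. The alternative hypothesis is that Group A tends to
# have larger values than Group B, which matches a null of "A <= B".
u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="greater")

print(f"U statistic: {u_stat}")
print(f"p-value:     {p_value:.4f}")

# Reject the null at the 5% level only if the p-value is below 0.05.
if p_value < 0.05:
    print("Evidence that Group A tweets tend to get more retweets.")
else:
    print("Not enough evidence to reject the null hypothesis.")
```

If the counts live in a pandas DataFrame, the two groups can be passed in as the corresponding columns or filtered Series. Because so many values are tied at zero, SciPy's default method should fall back to the normal approximation with a tie correction rather than the exact distribution.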

Hope it helps.
