Solved – How to determine the type of probability distribution for a dataset

I have aggregated (total) YouTube video views. I took the log of those views and calculated autoregressive coefficients (koefs) that can be used for video view predictability tests. Let's say I have an array of aggregated daily views for each video. The koef for each video is calculated as:

koef = aggregatedViews[60] / aggregatedViews[30]

The list of koefs for all videos forms the target distribution.
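To make the setup concrete, here is a minimal Python sketch of how that list could be built. The `videos` dictionary and its numbers are made-up toy data, and the function and parameter names are my own; only the ratio itself comes from the formula above.

```python
from typing import Dict, List


def koef(aggregated_views: List[float], later: int = 60, earlier: int = 30) -> float:
    """Ratio of aggregated views at index `later` to those at index `earlier`."""
    return aggregated_views[later] / aggregated_views[earlier]


# Hypothetical toy data: aggregated (cumulative) views per day, indices 0..60.
videos: Dict[str, List[float]] = {
    "video_a": [100.0 * (day + 1) for day in range(61)],
    "video_b": [50.0 * (day + 1) ** 0.5 for day in range(61)],
}

# One koef per video; this list is the sample whose distribution is in question.
koefs = [koef(views) for views in videos.values()]
print(koefs)
```

If the views are truly cumulative, each koef is at least 1, which is worth keeping in mind when choosing candidate distributions.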

Initially I thought it would be half-normal, but it does not look normal. Is this a Pareto Type I distribution?

Here is a 70-bin histogram: [image: 70 bin histogram]

How to determine the type of probability distribution for a dataset?

You can use the fitdistrplus package in R. First, plot a Cullen and Frey graph with the descdist function to find candidate distributions. Then fit the best candidates to your data with fitdist. Next, test the hypothesis that your data come from each of these distributions with a Kolmogorov-Smirnov test or an Anderson-Darling test. Finally, select a fitted distribution using graphical methods or by comparing quality measures such as AIC values.
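If you are not working in R, the fit-then-compare part of this procedure can be sketched in plain Python. This is not fitdistrplus and has no Cullen and Frey graph; it is a swapped-in illustration using closed-form maximum likelihood estimates for three candidates that I picked myself (exponential, lognormal, Pareto Type I), with AIC and the Kolmogorov-Smirnov distance computed by hand. The sample is synthetic and only stands in for the real koefs.

```python
import math
import random


def fit_exponential(x):
    rate = len(x) / sum(x)  # MLE: 1 / sample mean
    loglik = len(x) * math.log(rate) - rate * sum(x)
    return {"params": {"rate": rate}, "k": 1, "loglik": loglik,
            "cdf": lambda v: 1.0 - math.exp(-rate * v)}


def fit_lognormal(x):
    logs = [math.log(v) for v in x]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / n)  # MLE (divides by n)
    loglik = sum(-l - math.log(sigma * math.sqrt(2 * math.pi))
                 - (l - mu) ** 2 / (2 * sigma ** 2) for l in logs)
    return {"params": {"mu": mu, "sigma": sigma}, "k": 2, "loglik": loglik,
            "cdf": lambda v: 0.5 * (1 + math.erf((math.log(v) - mu) / (sigma * math.sqrt(2))))}


def fit_pareto1(x):
    n = len(x)
    xm = min(x)  # MLE of the scale is the sample minimum
    alpha = n / sum(math.log(v / xm) for v in x)
    loglik = (n * math.log(alpha) + n * alpha * math.log(xm)
              - (alpha + 1) * sum(math.log(v) for v in x))
    return {"params": {"xm": xm, "alpha": alpha}, "k": 2, "loglik": loglik,
            "cdf": lambda v: 1.0 - (xm / v) ** alpha if v >= xm else 0.0}


def aic(fit):
    """Akaike information criterion: lower is better."""
    return 2 * fit["k"] - 2 * fit["loglik"]


def ks_statistic(x, cdf):
    """Largest gap between the empirical CDF and the fitted CDF."""
    xs = sorted(x)
    n = len(xs)
    return max(max(cdf(v) - i / n, (i + 1) / n - cdf(v)) for i, v in enumerate(xs))


# Synthetic stand-in for the real koefs (assumption: replace with your own data).
random.seed(0)
koefs = [random.paretovariate(3.0) for _ in range(2000)]

fits = {
    "exponential": fit_exponential(koefs),
    "lognormal": fit_lognormal(koefs),
    "pareto1": fit_pareto1(koefs),
}
for name, fit in sorted(fits.items(), key=lambda item: aic(item[1])):
    print(f"{name:12s} AIC={aic(fit):10.1f}  KS D={ks_statistic(koefs, fit['cdf']):.4f}")

best = min(fits, key=lambda name: aic(fits[name]))
print("lowest AIC:", best)
```

One caveat: when the parameters are estimated from the same data, the usual Kolmogorov-Smirnov critical values are too lenient, so I only report the distance D here and use it for ranking rather than as a formal test.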

Here you can find a nice example of this procedure:
