Probability Plot (P-P plot): what does the straight line correspond to?

I want to show that my data do not fit a given distribution (in my case, the Pareto type II distribution). I made a P-P plot, i.e. on the y-axis I plot my sorted experimental data and on the x-axis the inverse of the theoretical cdf.
My MATLAB code is below in case my explanation is unclear.

Most of the time, on P-P plots, I see a straight line, and it is said that if the points follow this line you can assume that your data are distributed according to this distribution. But how do you plot this straight line? Is it just y = x? If so, why? If not, what exactly do we have to plot?

    %%% Probability density function, cumulative distribution function and inverse cdf
    %%% (quantile function) of the Pareto type II, with x(1) = scale and x(2) = shape
    pdf    = @(v,x) x(2)/x(1) * (1 + v./x(1)).^(-x(2) - 1);
    cdf    = @(v,x) 1 - (1 + v./x(1)).^-x(2);
    invcdf = @(y,x) x(1).*((1 - y).^-(1./x(2)) - 1);

    %%% Function to minimize to find the parameters (negative mean log-likelihood) %%%
    fobserved = @(x) -mean(log(pdf(islets_vol,x)));   % islets_vol = my experimental data
    options = optimset('Algorithm', 'interior-point', ...
                       'MaxIter', 1000, ...
                       'MaxFunEvals', 1000);
    % Linear constraint -eye(2)*x <= [0;0] keeps both parameters non-negative
    [xhat_obs,~] = fmincon(fobserved, [0.5;0.5], -eye(2), [0;0], [], [], [], [], [], options);

    %%% P-P plot %%%
    n = numel(islets_vol);
    yvals = sort(islets_vol(:), 'ascend');   % sorted experimental data, as a column
    % Plotting positions (i-0.5)/n: with i/n the last point would be invcdf(1) = Inf
    xvals = invcdf(((1:n)' - 0.5)/n, xhat_obs);
    scatter(xvals, yvals);
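The plotting step of the code above can be sketched in Python (standard library only). Because both axes of this plot are in data units (sorted observations against theoretical quantiles, which is what is usually called a Q-Q plot), the reference line for a fitted model is simply y = x: if the data follow the fitted distribution, the i-th smallest observation should lie close to the theoretical quantile at its plotting position. The sample is simulated, and `sigma = 2`, `alpha = 3` are hypothetical stand-ins for `xhat_obs`; no fitting is done here.

```python
import random

# Lomax (Pareto type II) with scale sigma = x(1) and shape alpha = x(2),
# the same parametrization as the MATLAB anonymous functions above.
def lomax_cdf(v, sigma, alpha):
    return 1.0 - (1.0 + v / sigma) ** (-alpha)

def lomax_invcdf(p, sigma, alpha):
    # p must be < 1: at p = 1, 0.0 ** negative raises ZeroDivisionError
    return sigma * ((1.0 - p) ** (-1.0 / alpha) - 1.0)

def qq_points(data, sigma, alpha):
    """Pairs (theoretical quantile, sorted observation), as in the question's plot.

    Plotting positions (i - 0.5)/n keep every point finite; with i/n the
    last point would need lomax_invcdf(1.0, ...).
    """
    y = sorted(data)
    n = len(y)
    x = [lomax_invcdf((i - 0.5) / n, sigma, alpha) for i in range(1, n + 1)]
    return list(zip(x, y))

# Hypothetical example: simulate from a known Lomax and use the true
# parameters in place of the fitted xhat_obs.
random.seed(0)
sigma, alpha = 2.0, 3.0
sample = [lomax_invcdf(random.random(), sigma, alpha) for _ in range(2000)]
pts = qq_points(sample, sigma, alpha)

# If the model fits, the points scatter around the reference line y = x.
for q_theo, q_obs in pts[::500]:
    print(f"theoretical {q_theo:8.3f}   observed {q_obs:8.3f}")
```

To draw the line, plot y = x over the range of the theoretical quantiles; no separate fit of the line is needed when the distribution's parameters were already estimated from the data.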

Traditionally, a P-P plot is constructed as follows. On the Y axis, the expected cumulative probabilities of the theoretical distribution of your choice are plotted, evaluated at your observed values. For a normal distribution, for example, you plot CDF.NORMAL(var,m,sd) (SPSS syntax), where var is your data and m and sd are the parameters, either estimated from the data or user-specified. On the X axis, the estimated cumulative proportions of your empirical distribution are plotted. These are the ranks of your data transformed into proportions by one of several methods: rankit (the most universal, and preferable for beta), Blom, Tukey or Van der Waerden.
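The four rank-to-proportion methods can be written out as below. The formulas are the ones commonly given for these methods (for example in the SPSS documentation for ranking); the function name and method keys are illustrative, and tied values are not handled.

```python
def plotting_positions(n, method="rankit"):
    """Estimated cumulative proportions for ranks r = 1..n (assumes no ties).

      rankit:           (r - 1/2) / n
      blom:             (r - 3/8) / (n + 1/4)
      tukey:            (r - 1/3) / (n + 1/3)
      van der waerden:  r / (n + 1)
    """
    formulas = {
        "rankit": lambda r: (r - 0.5) / n,
        "blom": lambda r: (r - 3 / 8) / (n + 1 / 4),
        "tukey": lambda r: (r - 1 / 3) / (n + 1 / 3),
        "vanderwaerden": lambda r: r / (n + 1),
    }
    if method not in formulas:
        raise ValueError(f"unknown method: {method!r}")
    return [formulas[method](r) for r in range(1, n + 1)]

for m in ("rankit", "blom", "tukey", "vanderwaerden"):
    print(m, [round(p, 3) for p in plotting_positions(5, m)])
```

All four keep every proportion strictly between 0 and 1, which matters as soon as the proportions are passed through an inverse cdf (as in a Q-Q plot).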

So if the plotted points lie on the X = Y diagonal, the expected theoretical probabilities and the estimated observed proportions coincide, which indicates that your data follow the theoretical distribution.
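The construction and the diagonal check can be sketched with Python's `statistics.NormalDist` standing in for CDF.NORMAL. The simulated samples and the "largest distance from the diagonal" summary are illustrative assumptions, not part of the answer above. With rankit positions, that largest distance equals the Kolmogorov-Smirnov statistic minus 1/(2n), so it is only a rough numeric companion to looking at the plot (and, with m and sd estimated from the data, standard KS critical values do not apply).

```python
import random
from statistics import NormalDist, fmean, stdev

def pp_points(data, dist):
    """Pairs (empirical proportion, theoretical probability) for a P-P plot.

    x: rankit proportion (r - 0.5)/n of the r-th smallest observation
    y: dist.cdf at that observation (the analogue of CDF.NORMAL(var, m, sd))
    """
    values = sorted(data)
    n = len(values)
    return [((r - 0.5) / n, dist.cdf(v)) for r, v in enumerate(values, start=1)]

def max_diagonal_gap(points):
    """Largest vertical distance between the P-P points and the line y = x."""
    return max(abs(p_theo - p_emp) for p_emp, p_theo in points)

random.seed(1)
samples = {
    "normal": [random.gauss(10, 2) for _ in range(1000)],
    "exponential": [random.expovariate(0.5) for _ in range(1000)],
}

gaps = {}
for name, data in samples.items():
    fitted = NormalDist(fmean(data), stdev(data))  # m and sd estimated from the data
    gaps[name] = max_diagonal_gap(pp_points(data, fitted))
    print(f"{name:12s} max distance from diagonal: {gaps[name]:.3f}")
```

The normal sample stays close to the diagonal, while the exponential sample, forced into a normal model, departs from it visibly (most clearly at the low end, where the fitted normal puts probability below zero).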

The P-P plot addresses basically the same question as the Q-Q plot: the P-P plot is somewhat more sensitive to discrepancies in the middle of the distribution, the Q-Q plot to discrepancies in the tails. The Q-Q plot is generally preferred in the research community.
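As an illustration of the tail point, the sketch below compares how the largest observation looks on each plot when Pareto type II data are wrongly modeled as exponential. To keep the result independent of a random seed, the "sample" is idealized: exact Lomax quantiles at the rankit positions, with hypothetical parameters sigma = 1 and alpha = 3. The exponential model is given the same mean as the data.

```python
import math

def lomax_invcdf(p, sigma, alpha):
    # Pareto type II quantile function, same form as invcdf in the question
    return sigma * ((1.0 - p) ** (-1.0 / alpha) - 1.0)

n = 1000
sigma, alpha = 1.0, 3.0                                # hypothetical parameters
probs = [(i - 0.5) / n for i in range(1, n + 1)]       # rankit positions
data = [lomax_invcdf(p, sigma, alpha) for p in probs]  # idealized, already sorted

# Misspecified model: exponential with the same mean as the data
rate = 1.0 / (sum(data) / n)

def exp_cdf(v):
    return 1.0 - math.exp(-rate * v)

def exp_invcdf(p):
    return -math.log(1.0 - p) / rate

# The largest observation as it appears on each plot
p_top, v_top = probs[-1], data[-1]
pp_gap_top = abs(exp_cdf(v_top) - p_top)  # P-P: distance from diagonal, probability units
qq_ratio_top = v_top / exp_invcdf(p_top)  # Q-Q: observed / theoretical quantile (1 on y = x)

print(f"P-P distance from diagonal at top point: {pp_gap_top:.5f}")
print(f"Q-Q observed/theoretical at top point:   {qq_ratio_top:.2f}")
```

On the P-P plot the top point sits within 1/n of the diagonal, because both probabilities are crowded near 1; on the Q-Q plot the observed maximum is about three times the exponential quantile, so the heavy tail is hard to miss.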
