I wrote MATLAB code for an $R^{2}$ (R-squared) test, but it is not working as expected.
I want to test how well a Weibull distribution fits my raw data, so I want to compute $R^{2}$. My code is below, but the $R^{2}$ values I obtain are negative. I also tried Gamma and Rayleigh, but the code did not run at all with those distributions.
Could you please verify this code:

    sample = N;
    h1 = histfit(sample,30,'weibull');
    xdata1 = get(h1(2), 'XData');
    ydata1 = get(h1(2), 'YData');
    f = fittype('weibull');
    [c2, gof] = fit(xdata1',ydata1',f)

Output:

    c2 =
         General model Weibull:
         c2(x) = a*b*x^(b-1)*exp(-a*x^b)
         Coefficients (with 95% confidence bounds):
           a =   6.787e-05  (-0.3653, 0.3654)
           b =       9.961  (-5429, 5449)

    gof =
               sse: 4.9879e+07
           rsquare: -1.6634
               dfe: 98
        adjrsquare: -1.6906
              rmse: 713.4243

Best Answer
It is pretty clear that the estimation procedure fails here. No half-reasonable fit results in a negative $R^2$.
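To make that concrete, here is the usual definition, which I am assuming is what `gof.rsquare` reports:

$$R^2 = 1 - \frac{SSE}{SST}, \qquad SSE = \sum_i (y_i - \hat{y}_i)^2, \qquad SST = \sum_i (y_i - \bar{y})^2$$

So $R^2 < 0$ exactly when $SSE > SST$, i.e. when the fitted curve predicts the points worse than a horizontal line at their mean $\bar{y}$. With the reported $R^2 = -1.6634$, we get $SSE/SST = 1 - R^2 = 2.6634$: the fitted curve's squared error is more than two and a half times that of the flat line.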
Based on your last comment, and with quite a bit of guesswork, in your case `ydata1` roughly corresponds to points on the curve one would get from binning 5000 points into 30 bins. That means you are almost certainly looking at values of magnitude 200 or more. A standard distribution-fitting routine will not work on that, because the `'weibull'` model is a density and densities do not live on that scale. Scale your data (by 1000 or something) and try again. Alternatively, look into providing reasonable starting values to your algorithm. See the following example, where I use as data the absolute values of draws from a simple $N(0,1)$:

    rng(1234); % Fix your random seed
    sample = abs(randn(5000,1));
    h1 = histfit(sample,30,'weibull');
    xdata1 = get(h1(2), 'XData');
    ydata1 = get(h1(2), 'YData');
    f = fittype('weibull');
    [c, gof] = fit(xdata1',ydata1',f);
    [c_scaled, gof_scaled] = fit(xdata1',ydata1'/1000,f);

    >> gof.rsquare
    ans =
       -0.7902
    >> gof_scaled.rsquare
    ans =
        0.8287

For the unscaled data the estimation is horrible; I get a negative $R^2$ similar to yours. For the scaled data, though, the estimation is quite reasonable (well, as reasonable as fitting a Weibull distribution to an $|N(0,1)|$ can be). Try to appreciate what you are trying to do not only conceptually but also computationally. Estimation algorithms aren't black boxes, and a trust-region algorithm can only take you so far. Always (really ALWAYS) plot your data so you have a sense of what you are trying to do.
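As a rough sketch of the "starting values" route mentioned above, which I have not run against your data: it rests on two assumptions. First, that the curve drawn by `histfit` is a density multiplied by (number of points) x (bin width). Second, that `wblfit` returns scale `A` and shape `B` such that the library model `a*b*x^(b-1)*exp(-a*x^b)` has `a = A^(-B)` and `b = B`. The variable names `binWidth`, `ydens`, `a0`, `b0`, `c_start` and `gof_start` are mine.

```matlab
% Sketch only: assumes the Statistics and Curve Fitting toolboxes,
% which the original code already relies on (histfit, fit).
rng(1234);                              % fix the random seed
sample = abs(randn(5000,1));
h1 = histfit(sample, 30, 'weibull');

% Bin width taken from the spacing of the bar centres.
barCenters = get(h1(1), 'XData');
binWidth   = barCenters(2) - barCenters(1);

xdata1 = get(h1(2), 'XData')';          % column vectors for fit()
ydata1 = get(h1(2), 'YData')';

% Assumption: the histfit curve is a density times n*binWidth,
% so dividing by that factor brings it back to density scale.
ydens = ydata1 / (numel(sample) * binWidth);

% Starting values from a maximum-likelihood fit.
% Assumption: wblfit returns [A B] (scale, shape), and the library
% model a*b*x^(b-1)*exp(-a*x^b) corresponds to a = A^(-B), b = B.
p  = wblfit(sample);
a0 = p(1)^(-p(2));
b0 = p(2);

% StartPoint follows the coefficient order of the model (a, then b).
[c_start, gof_start] = fit(xdata1, ydens, fittype('weibull'), ...
                           'StartPoint', [a0 b0]);
disp(gof_start.rsquare)
```

Dividing by `numel(sample) * binWidth` instead of an arbitrary 1000 puts the target on the scale the model expects. Keep in mind that `h1(2)` is the curve `histfit` has already fitted, not the bar heights, so an $R^2$ obtained this way mostly tells you how well `fit` reproduces that curve.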