Here is a demo set of data points that are drawn from a larger sample. I fit a Weibull distribution in R using the {fitdistrplus}
package, and get back reasonable results for shape and scale parameters.
    # in R:
    library(fitdistrplus)
    x <- c(4836.6, 823.6, 3131.7, 1343.4, 709.7, 610.6, 3034.2, 1973, 7358.5, 265,
           4590.5, 5440.4, 4613.7, 4763.1, 115.3, 5385.1, 6398.1, 8444.6, 2397.1, 3259.7,
           307.5, 4607.4, 6523.7, 600.3, 2813.5, 6119.8, 6438.8, 2799.1, 2849.8, 5309.6,
           3182.4, 705.5, 5673.3, 2939.9, 2631.8, 5002.1, 1967.3, 2810.4, 2948, 6904.8)
    fitdist(x, "weibull")
Result:
    Fitting of the distribution ' weibull ' by maximum likelihood
    Parameters:
             estimate  Std. Error
    shape    1.501077   0.2003799
    scale 3912.816005 430.4170971
Then I try to do the same thing using scipy.stats. I use the weibull_min distribution. (I've seen recommendations to use exponweib with the constraint a=1, and can confirm the results are the same.)
    # in python
    import numpy as np
    import pandas as pd
    from scipy import stats

    x = [4836.6, 823.6, 3131.7, 1343.4, 709.7, 610.6, 3034.2, 1973, 7358.5, 265,
         4590.5, 5440.4, 4613.7, 4763.1, 115.3, 5385.1, 6398.1, 8444.6, 2397.1, 3259.7,
         307.5, 4607.4, 6523.7, 600.3, 2813.5, 6119.8, 6438.8, 2799.1, 2849.8, 5309.6,
         3182.4, 705.5, 5673.3, 2939.9, 2631.8, 5002.1, 1967.3, 2810.4, 2948, 6904.8]
    stats.weibull_min.fit(x)
Here are the results:
shape, loc, scale = (0.1102610560437356, 115.29999999999998, 3.428664764594809)
This is clearly a terrible fit to the data, as I can see if I just sample from this fitted distribution:
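    import matplotlib.pyplot as plt
    import seaborn as sns

    c, loc, scale = stats.weibull_min.fit(x)
    # sample from the fitted distribution without overwriting the data in x
    samples = stats.weibull_min.rvs(c, loc, scale, size=1000)
    # histplot replaces distplot, which is deprecated in recent seaborn versions
    sns.histplot(samples, kde=True)
    plt.show()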
Why is the fit so bad here?
I am aware that by constraining the loc parameter, I can recreate the results from {fitdistrplus}, but why should this be necessary? Shouldn't the unconstrained fit be more likely to overfit the data than to dramatically and ridiculously underfit it?
    # recreate results from R's {fitdistrplus}
    stats.weibull_min.fit(x, floc=0)
Best Answer
This was addressed in https://github.com/scipy/scipy/issues/11806. We discussed that the optimizer wasn't finding a good local minimum. A better fit can be found by providing a better initial guess for the location, loc=0 (note: this is different from fixing the location with floc=0), or by using a different optimizer.
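To make the difference between loc=0 and floc=0 concrete, here is a minimal sketch using the data from the question. The variable names guess_fit and fixed_fit are mine, and the numbers you get for the loc=0 case may differ between SciPy versions, so treat the printed values as something to inspect rather than a guaranteed result. The negative log-likelihood from nnlf is printed so the fits can be compared on the same scale (lower is better).

```python
import numpy as np
from scipy import stats

x = np.array([
    4836.6, 823.6, 3131.7, 1343.4, 709.7, 610.6, 3034.2, 1973, 7358.5, 265,
    4590.5, 5440.4, 4613.7, 4763.1, 115.3, 5385.1, 6398.1, 8444.6, 2397.1, 3259.7,
    307.5, 4607.4, 6523.7, 600.3, 2813.5, 6119.8, 6438.8, 2799.1, 2849.8, 5309.6,
    3182.4, 705.5, 5673.3, 2939.9, 2631.8, 5002.1, 1967.3, 2810.4, 2948, 6904.8,
])

# loc=0 is only a starting value for the optimizer; loc is still estimated.
guess_fit = stats.weibull_min.fit(x, loc=0)

# floc=0 fixes the location at 0; only shape and scale are estimated.
# Per the question, this reproduces the {fitdistrplus} estimates.
fixed_fit = stats.weibull_min.fit(x, floc=0)

for label, params in [("loc=0 (initial guess)", guess_fit),
                      ("floc=0 (fixed)", fixed_fit)]:
    # nnlf takes the parameters as (shape, loc, scale) and returns
    # the negative log-likelihood of the data under those parameters.
    nll = stats.weibull_min.nnlf(params, x)
    print(label, params, "negative log-likelihood:", nll)
```

If the initial guess alone does not help on your SciPy version, fit also accepts an optimizer keyword, which is the other route mentioned above.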