Solved – scipy.stats failing to fit Weibull distribution unless location parameter is constrained

Here is a demo set of data points that are drawn from a larger sample. I fit a Weibull distribution in R using the {fitdistrplus} package, and get back reasonable results for shape and scale parameters.

    # in R:
    library(fitdistrplus)

    x <- c(4836.6, 823.6, 3131.7, 1343.4, 709.7, 610.6,
           3034.2, 1973, 7358.5, 265, 4590.5, 5440.4, 4613.7, 4763.1,
           115.3, 5385.1, 6398.1, 8444.6, 2397.1, 3259.7, 307.5, 4607.4,
           6523.7, 600.3, 2813.5, 6119.8, 6438.8, 2799.1, 2849.8, 5309.6,
           3182.4, 705.5, 5673.3, 2939.9, 2631.8, 5002.1, 1967.3, 2810.4,
           2948, 6904.8)

    fitdist(x, "weibull")

Result:

    Fitting of the distribution ' weibull ' by maximum likelihood
    Parameters:
             estimate  Std. Error
    shape    1.501077   0.2003799
    scale 3912.816005 430.4170971

Then I try to do the same thing using scipy.stats, with the weibull_min distribution. (I've seen recommendations to use exponweib with the constraint a=1 and can confirm the results are the same.)

    # in python
    import numpy as np
    import pandas as pd
    from scipy import stats

    x = [4836.6, 823.6, 3131.7, 1343.4, 709.7, 610.6,
         3034.2, 1973, 7358.5, 265, 4590.5, 5440.4, 4613.7, 4763.1,
         115.3, 5385.1, 6398.1, 8444.6, 2397.1, 3259.7, 307.5, 4607.4,
         6523.7, 600.3, 2813.5, 6119.8, 6438.8, 2799.1, 2849.8, 5309.6,
         3182.4, 705.5, 5673.3, 2939.9, 2631.8, 5002.1, 1967.3, 2810.4,
         2948, 6904.8]

    stats.weibull_min.fit(x)

Here are the results:

shape, loc, scale = (0.1102610560437356, 115.29999999999998, 3.428664764594809) 

This is clearly a terrible fit to the data, as I can see if I just sample from this fitted distribution:

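    import matplotlib.pyplot as plt
    import seaborn as sns

    c, loc, scale = stats.weibull_min.fit(x)
    # draw from the fitted distribution without overwriting the data in x
    samples = stats.weibull_min.rvs(c, loc, scale, size=1000)
    # histplot replaces the deprecated sns.distplot
    sns.histplot(samples, kde=True)
    plt.show()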

Why is the fit so bad here?

I am aware that by constraining the loc parameter I can recreate the results from {fitdistrplus}, but why should this be necessary? Shouldn't the unconstrained fit be more likely to overfit the data than to dramatically and ridiculously under-fit it?

    # recreate results from R's {fitdistrplus}
    stats.weibull_min.fit(x, floc=0)

This was addressed in https://github.com/scipy/scipy/issues/11806. The discussion there concluded that the optimizer wasn't converging to a good local minimum. A better fit can be found either by providing a better initial guess for the location, loc=0 (note: this is different from fixing the location with floc=0), or by using a different optimizer.
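To make the difference between the two keywords concrete, here is a minimal sketch using the data from the question. It assumes that floc=0 holds the location fixed so only shape and scale are estimated, while loc=0 is passed to the optimizer only as a starting value and all three parameters remain free. The helper log_likelihood is a hypothetical name introduced here for comparing fits, and the numbers returned by the loc=0 call may vary with the SciPy version.

```python
import numpy as np
from scipy import stats

x = np.array([4836.6, 823.6, 3131.7, 1343.4, 709.7, 610.6,
              3034.2, 1973, 7358.5, 265, 4590.5, 5440.4, 4613.7, 4763.1,
              115.3, 5385.1, 6398.1, 8444.6, 2397.1, 3259.7, 307.5, 4607.4,
              6523.7, 600.3, 2813.5, 6119.8, 6438.8, 2799.1, 2849.8, 5309.6,
              3182.4, 705.5, 5673.3, 2939.9, 2631.8, 5002.1, 1967.3, 2810.4,
              2948, 6904.8])

# floc=0: the location is held at exactly 0; only shape and scale are fitted.
c_fixed, loc_fixed, scale_fixed = stats.weibull_min.fit(x, floc=0)

# loc=0: 0 is only the optimizer's starting value; loc is still a free parameter.
c_guess, loc_guess, scale_guess = stats.weibull_min.fit(x, loc=0)


def log_likelihood(data, c, loc, scale):
    """Hypothetical helper: total log-likelihood of data under weibull_min."""
    return stats.weibull_min.logpdf(data, c, loc=loc, scale=scale).sum()


print("floc=0:", c_fixed, loc_fixed, scale_fixed,
      log_likelihood(x, c_fixed, loc_fixed, scale_fixed))
print("loc=0 :", c_guess, loc_guess, scale_guess,
      log_likelihood(x, c_guess, loc_guess, scale_guess))
```

The floc=0 result should agree with the shape and scale reported by {fitdistrplus} above. Comparing the printed parameters and log-likelihoods of the two calls is a quick way to check whether the three-parameter fit has landed somewhere sensible.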
