Suppose I have an array: adata = array([0.5, 1., 2., 3., 6., 10.]), and I want to calculate the Weibull log-likelihood of this array given the parameter sets [5., 1.5] and [5.1, 1.6]. I never thought I would need to write my own Weibull probability density function for this task, since one is already provided in scipy.stats.distributions. So this should do it:
    from scipy import stats
    from numpy import *

    adata = array([0.5, 1., 2., 3., 6., 10.])

    def wb2LL(p, x):
        # log-likelihood of the two-parameter Weibull: p[0] is the scale, p[1] the shape
        return sum(log(stats.weibull_min.pdf(x, p[1], 0., p[0])), axis=1)
And:

    >>> wb2LL(array([[5.,1.5],[5.1,1.6]]).T[...,newaxis], adata)
    array([-14.43743911, -14.68835298])
Or I can reinvent the wheel and write my own Weibull PDF function, for example:
    wbp = lambda p, x: p[1]/p[0]*((x/p[0])**(p[1]-1))*exp(-((x/p[0])**p[1]))

    def wb2LL1(p, x):
        # same log-likelihood, computed with the hand-written pdf
        return sum(log(wbp(p, x)), axis=1)
And:

    >>> wb2LL1(array([[5.,1.5],[5.1,1.6]]).T[...,newaxis], adata)
    array([-14.43743911, -14.68835298])
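As an aside, the same log-likelihood can also be written with the distribution's log-density directly, avoiding the explicit log() of the pdf. A small sketch (not the subject of the timing question below):

    from scipy import stats
    from numpy import array, newaxis, sum

    adata = array([0.5, 1., 2., 3., 6., 10.])
    p = array([[5., 1.5], [5.1, 1.6]]).T[..., newaxis]

    # log-likelihood in one call via logpdf
    ll = sum(stats.weibull_min.logpdf(adata, p[1], 0., p[0]), axis=1)
    print(ll)  # expected to match the wb2LL / wb2LL1 outputs above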
Admittedly, I have always taken it for granted that if something is already in scipy, it is well optimized, and that reinventing the wheel rarely makes things faster. But here the result is unexpected: with timeit, 100000 calls to wb2LL1(array([[5.,1.5],[5.1,1.6]]).T[...,newaxis], adata) take 2.156 s, whereas 100000 calls to wb2LL(array([[5.,1.5],[5.1,1.6]]).T[...,newaxis], adata) take 5.219 s, so the built-in stats.weibull_min.pdf() is more than twice as slow.
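For reference, a minimal timing harness along these lines should reproduce the comparison (the exact numbers will of course vary by machine and scipy version):

    import timeit
    from numpy import array, newaxis

    # assumes adata, wb2LL and wb2LL1 from above are already defined in the session
    p = array([[5., 1.5], [5.1, 1.6]]).T[..., newaxis]

    print(timeit.timeit(lambda: wb2LL(p, adata), number=100000))   # built-in pdf
    print(timeit.timeit(lambda: wb2LL1(p, adata), number=100000))  # hand-written pdf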
Checking the source code at python_path/site-packages/scipy/stats/distributions.py did not give me a simple answer, at least not one I could find. If anything, from reading it I would expect stats.weibull_min.pdf() to be almost as fast as wbp().
The corresponding source code, lines 2999-3033:
    class frechet_r_gen(rv_continuous):
        """A Frechet right (or Weibull minimum) continuous random variable.

        %(before_notes)s

        See Also
        --------
        weibull_min : The same distribution as `frechet_r`.
        frechet_l, weibull_max

        Notes
        -----
        The probability density function for `frechet_r` is::

            frechet_r.pdf(x, c) = c * x**(c-1) * exp(-x**c)

        for ``x > 0``, ``c > 0``.

        %(example)s
        """
        def _pdf(self, x, c):
            return c*pow(x,c-1)*exp(-pow(x,c))
        def _logpdf(self, x, c):
            return log(c) + (c-1)*log(x) - pow(x,c)
        def _cdf(self, x, c):
            return -expm1(-pow(x,c))
        def _ppf(self, q, c):
            return pow(-log1p(-q),1.0/c)
        def _munp(self, n, c):
            return special.gamma(1.0+n*1.0/c)
        def _entropy(self, c):
            return -_EULER / c - log(c) + _EULER + 1

    frechet_r = frechet_r_gen(a=0.0, name='frechet_r', shapes='c')
    weibull_min = frechet_r_gen(a=0.0, name='weibull_min', shapes='c')
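What this suggests is that _pdf itself is a one-liner, so any slowdown would have to come from the generic rv_continuous.pdf machinery wrapped around it (argument validation, broadcasting, loc/scale handling). One way to probe that hypothesis is to call the private _pdf directly on pre-standardized data; this is a diagnostic only, since _pdf is not public API:

    from scipy import stats
    from numpy import array

    adata = array([0.5, 1., 2., 3., 6., 10.])
    scale, c = 5.0, 1.5

    # public entry point: goes through rv_continuous.pdf before reaching _pdf
    full = stats.weibull_min.pdf(adata, c, 0., scale)

    # private _pdf on standardized data: pdf(x; c, loc=0, scale) = _pdf(x/scale, c)/scale
    direct = stats.weibull_min._pdf(adata/scale, c)/scale

    print(abs(full - direct).max())  # ~0; timing each call would show where the overhead lives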
So the question is: is stats.weibull_min.pdf() really this much slower? If so, why?
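If it is, profiling a batch of calls should show where the time goes, e.g. whether it is spent in _pdf's arithmetic or in the surrounding Python bookkeeping. A sketch:

    import cProfile
    from scipy import stats
    from numpy import array, newaxis

    adata = array([0.5, 1., 2., 3., 6., 10.])
    p = array([[5., 1.5], [5.1, 1.6]]).T[..., newaxis]

    # profile repeated calls to the built-in pdf; the cumulative-time column
    # shows how much is spent inside _pdf versus the generic pdf() wrapper
    cProfile.run('for _ in range(1000): stats.weibull_min.pdf(adata, p[1], 0., p[0])',
                 sort='cumulative')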