How can I write a fast log-sum-exp in Cython and Weave?

I am trying to speed up the log-sum-exp operation (using the max trick) in Python code. I use Python 2.7 on Windows 8. I have put together a comparison of implementations using NumPy, SciPy, Numba, Cython, Weave, and numexpr, which can be viewed here on nbviewer.
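For reference, the max trick rests on the following standard identity, where x = (x_1, ..., x_n) is the input vector and m is its largest element:

```latex
\log \sum_{i=1}^{n} e^{x_i}
  = \log\Big( e^{m} \sum_{i=1}^{n} e^{x_i - m} \Big)
  = m + \log \sum_{i=1}^{n} e^{x_i - m},
\qquad m = \max_i x_i .
```

Every shifted exponent x_i - m is at most 0, so no exp() call can overflow, and the term for the maximum element is exactly e^0 = 1, so the sum can never underflow to zero before the log is taken.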

I expected my Cython and Weave versions to be the fastest, since they are the closest to native code. In reality, they are slower than the other versions.

How can I make these versions as fast as possible?
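For comparison, this is the two-pass loop (one pass for the maximum, one for the shifted sum) that a Cython or Weave kernel typically compiles. It is plain Python for illustration only, so it is far slower than any compiled version; in Cython the same loop would get typed `cdef` variables and a C-level `exp`. Note that every element still costs one scalar `exp` call, which NumPy's compiled loops also pay, and that is one plausible reason a straightforward scalar C loop has little room to beat NumPy. The function name `lse_loop` is hypothetical.

```python
import math

def lse_loop(values):
    """Two-pass log-sum-exp with the max trick, written as an explicit scalar loop."""
    n = len(values)
    if n == 0:
        return float("-inf")            # log of an empty sum
    # Pass 1: find the maximum element.
    m = values[0]
    for i in range(1, n):
        if values[i] > m:
            m = values[i]
    if math.isinf(m):
        return m                        # all -inf gives -inf; any +inf gives +inf
    # Pass 2: accumulate exp(x_i - m); every exponent is <= 0, so nothing overflows.
    s = 0.0
    for i in range(n):
        s += math.exp(values[i] - m)
    return m + math.log(s)
```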

Edit: I took the original notebook and added the max trick to all methods, to make the comparison less trivial and closer to my real use case.
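Since all methods now use the max trick, a minimal NumPy baseline of that version might look like the sketch below. The name `lse_numpy` is hypothetical, chosen to match the `lse_*` naming in the timings; the notebook's actual code may differ.

```python
import numpy as np

def lse_numpy(a):
    """Log-sum-exp of an array using the max trick (NumPy baseline)."""
    a = np.asarray(a)
    m = a.max()                          # largest element; shifting by it keeps every exp() <= 1
    return m + np.log(np.exp(a - m).sum())

if __name__ == "__main__":
    x = np.array([1000.0, 1000.0])
    # A naive np.log(np.exp(x).sum()) would overflow to inf here.
    print(lse_numpy(x))                  # about 1000.693, i.e. 1000 + log(2)
```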

python numpy cython
1 answer

An explicitly vectorized (SSE) C implementation is about 2.5x faster on my machine than any of the alternatives you posted (~360 µs vs. ~150 µs) for float32 data. I don't have Numba, so I couldn't try that version.
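To reproduce this kind of comparison outside IPython, a small harness built on the standard `timeit` module can report the best time per call, much like `%timeit` does. This is a sketch: the array length of 100000 and the helper names `lse_numpy` and `benchmark` are arbitrary choices for illustration, not taken from the notebook.

```python
import timeit
import numpy as np

def lse_numpy(a):
    """Max-trick log-sum-exp in NumPy (baseline)."""
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def benchmark(funcs, a, number=100, repeat=3):
    """Return {name: best time per call in seconds}, mimicking %timeit's best-of-N."""
    results = {}
    for name, f in funcs.items():
        times = timeit.repeat(lambda: f(a), number=number, repeat=repeat)
        results[name] = min(times) / number
    return results

if __name__ == "__main__":
    rng = np.random.RandomState(0)
    # float32 input, the data type the SSE version supports;
    # the length is an arbitrary choice for illustration.
    a = rng.rand(100000).astype(np.float32)
    funcs = {"numpy": lse_numpy}
    try:
        from scipy.special import logsumexp   # older SciPy releases: scipy.misc.logsumexp
        funcs["scipy"] = logsumexp
    except ImportError:
        pass
    for name, t in sorted(benchmark(funcs, a).items()):
        print("%-8s %8.1f us per call" % (name, t * 1e6))
```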

http://nbviewer.ipython.org/github/rmcgibbo/logsumexp/blob/master/Accelerating%20log-sum-exp.ipynb

Note that this works only with float32. One drawback of explicit SSE code is that it is very specific to the data type, and I didn't take the time to write a double-precision version.
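To get a feel for what the float32 restriction costs in accuracy, one can evaluate the same data in float32 and float64 and compare. The sketch below uses arbitrary random test data; the float32 result generally agrees with the float64 reference only to roughly single-precision accuracy (about 7 significant digits). The helper names `lse` and `float32_error` are hypothetical.

```python
import numpy as np

def lse(a):
    """Max-trick log-sum-exp; all arithmetic stays in a's dtype."""
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def float32_error(n=100000, seed=0):
    """Relative difference between a float32 and a float64 evaluation of the same data."""
    x64 = np.random.RandomState(seed).randn(n)   # arbitrary test data
    x32 = x64.astype(np.float32)
    ref = lse(x64)                                # float64 reference
    approx = float(lse(x32))                      # everything computed in float32
    return abs(approx - ref) / abs(ref)

if __name__ == "__main__":
    print("relative error of float32 result: %.2e" % float32_error())
```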

The complete source code for the SSE implementation (BSD-licensed), with a simple setup.py installer, is at https://github.com/rmcgibbo/logsumexp/tree/master
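Because the SSE extension only handles float32, calling code may want a wrapper that uses it when it can and falls back to NumPy otherwise. This is a sketch: the module and function name `sselogsumexp.logsumexp` are the ones used in the timing session in this answer, while the wrapper name `logsumexp_auto` and the assumption that the extension wants a contiguous 1-D float32 buffer are hypothetical.

```python
import numpy as np

try:
    import sselogsumexp          # the SSE extension from the repository above (may not be installed)
except ImportError:
    sselogsumexp = None

def _lse_numpy(a):
    """Max-trick log-sum-exp fallback for any floating dtype."""
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def logsumexp_auto(a):
    """Hypothetical dispatcher: SSE path for float32 input, NumPy otherwise."""
    a = np.asarray(a)
    if sselogsumexp is not None and a.dtype == np.float32:
        # Assumption: the extension expects a contiguous 1-D float32 buffer.
        return sselogsumexp.logsumexp(np.ascontiguousarray(a.ravel()))
    return _lse_numpy(a)
```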

    %timeit scipy.misc.logsumexp(a)
    10.4467
    1000 loops, best of 3: 363 µs per loop
    10.4467144498
    %timeit lse_weave(a)
    1000 loops, best of 3: 352 µs per loop
    10.4467
    %timeit lse_numexpr(a)
    1000 loops, best of 3: 360 µs per loop
    10.4467162773
    %timeit lse_cython(a)
    1000 loops, best of 3: 361 µs per loop
    10.4467163086
    %timeit sselogsumexp.logsumexp(a)  # <--- my version
    10000 loops, best of 3: 149 µs per loop