Numpy around / rint slow compared to astype (int)

Question

So, if I have something like x = np.random.rand(60000)*400 - 200, IPython's %timeit says:

x.astype(int) takes 0.14 ms
np.rint(x) and np.around(x) take 1.01 ms

Note that in the rint and around cases you still need to spend an extra 0.14 ms on the final astype(int) (assuming that is what you ultimately want).

Question: am I right in thinking that most modern hardware can perform both operations in the same amount of time? If so, why does numpy take 8 times longer to round?

As it happens, I'm not too fussy about the exactness of the arithmetic, but I can't see how to take advantage of that with numpy (I'm doing messy biology, not particle physics).

c assembly python numpy sse

dan-man Dec 02 '14 at 14:48

2 answers

Z boson · Answer 1 · 2014-12-03T14:06:40+0000

np.around(x).astype(int) and x.astype(int) do not give the same values. The former rounds to even (this is the same as ((x*x>=0+0.5) + (x*x<0-0.5)).astype(int)), while the latter rounds toward zero. However,

y = np.trunc(x).astype(int)
z = x.astype(int)

shows y==z, but calculating y is much slower. So it is the functions np.trunc and np.around that are slow.
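The difference between the two conversions can be checked on a small hand-picked array. This is only an illustrative sketch; the sample values are my own, chosen to include exact .5 ties and negative numbers.

```python
import numpy as np

# Hypothetical sample values, including exact .5 ties and negatives.
x = np.array([-2.5, -1.7, -0.5, 0.5, 1.5, 2.5, 2.7])

rounded = np.around(x).astype(int)    # round to nearest, ties go to even
truncated = x.astype(int)             # drop the fraction (toward zero)
via_trunc = np.trunc(x).astype(int)   # explicit trunc, then convert

print(rounded.tolist())
print(truncated.tolist())
print(bool(np.array_equal(via_trunc, truncated)))
```

The first two lists differ wherever the fractional part is at least 0.5 in magnitude and the tie does not resolve toward zero, while the trunc route matches plain astype(int) exactly.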

In [165]: x.dtype
Out[165]: dtype('float64')
In [168]: y.dtype
Out[168]: dtype('int64')

So np.trunc(x) rounds toward zero from double to double. Then astype(int) has to convert double to int64.

Internally, I don't know what python or numpy are doing, but I know how I would do it in C. Let's discuss some hardware. With SSE4.1 you can do round, floor, ceil and trunc from double to double using:

_mm_round_pd(a, 0); //round: round to nearest, ties to even
_mm_round_pd(a, 1); //floor: round towards minus infinity
_mm_round_pd(a, 2); //ceil:  round towards positive infinity
_mm_round_pd(a, 3); //trunc: round towards zero

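but numpy also has to support systems without SSE4.1, so it would have to be built both without SSE4.1 and with SSE4.1, and then pick the right version at run time.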
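For readers who do not work with intrinsics, the same four rounding modes exist at the numpy level as np.rint, np.floor, np.ceil and np.trunc. The sketch below only illustrates their semantics and the fact that each returns float64 for float64 input; it makes no claim about which instructions numpy uses internally. The sample values are my own.

```python
import numpy as np

# Hypothetical sample values with exact .5 ties on both sides of zero.
a = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])

modes = {
    "round (ties to even)": np.rint(a),
    "floor (toward -inf)": np.floor(a),
    "ceil (toward +inf)": np.ceil(a),
    "trunc (toward zero)": np.trunc(a),
}

for name, result in modes.items():
    # Each result is still a float64 array; no integer conversion has happened yet.
    print(f"{name:22s} {result.dtype} {result.tolist()}")
```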

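Going from double directly to int64 with SSE/AVX, however, is not efficient until AVX512. On the other hand, double can be rounded to int32 efficiently using only SSE2: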

_mm_cvtpd_epi32(a);  //round double to int32 then expand to int64
_mm_cvttpd_epi32(a); //trunc double to int32 then expand to int64

Each of these converts two doubles to two int32 values, which can then be expanded to int64.

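In your case this would work fine, since the range certainly fits in int32. But unless python knows that the range fits in int32, it has to assume that it may not, and fall back to rounding or trunc directly to int64, which is slow. Also, once again, numpy would have to be built with SSE2 support to do this.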
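When the caller does know that the data fits in int32, that knowledge can be stated explicitly by requesting an int32 result. The helper below, rint_to_int32, is a hypothetical name of my own and not part of numpy. It is a minimal sketch that checks the range before converting, and it does not guarantee that numpy takes any particular fast path.

```python
import numpy as np

def rint_to_int32(values):
    """Round to nearest and convert to int32, refusing out-of-range input.

    Hypothetical helper for illustration; assumes a non-empty float array.
    """
    rounded = np.rint(values)
    info = np.iinfo(np.int32)
    # The comparison is also False for NaN, so NaN input is rejected too.
    if not (rounded.min() >= info.min and rounded.max() <= info.max):
        raise ValueError("values do not fit in int32")
    return rounded.astype(np.int32)

rng = np.random.default_rng(0)
x = rng.random(60000) * 400 - 200   # same range as in the question
as_int32 = rint_to_int32(x)
print(as_int32.dtype, as_int32.min(), as_int32.max())
```

Whether this is actually faster than converting to the default integer type depends on the numpy build and the machine, so it is worth timing before relying on it.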

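But maybe you could have used a single-precision floating-point array to begin with. In that case you could do: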

_mm_cvtps_epi32(a); //round single to int32
_mm_cvttps_epi32(a); //trunc single to int32

These convert four singles to four int32.
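At the numpy level, the single-precision route might look like the sketch below. It assumes that float32 precision is acceptable for the data, which is my assumption and not something the answer establishes. Values that lie very close to a .5 tie can round differently in float32 than in float64, so the sketch counts such mismatches instead of assuming there are none.

```python
import numpy as np

rng = np.random.default_rng(0)
x64 = rng.random(60000) * 400 - 200   # same range as in the question
x32 = x64.astype(np.float32)          # single-precision copy of the data

r32 = np.rint(x32).astype(np.int32)   # round single to int32
r64 = np.rint(x64).astype(np.int32)   # reference result from the double data

# Count how many elements differ because of the reduced precision.
mismatches = int(np.count_nonzero(r32 != r64))
print(r32.dtype, "mismatches vs float64 rounding:", mismatches)
```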

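So, to answer your question, SSE2 can round or truncate from double to int32 efficiently. AVX512 will be able to round or truncate from double to int64 efficiently as well, using _mm512_cvtpd_epi64(a) or _mm512_cvttpd_epi64(a). SSE4.1 can round/trunc/floor/ceil from float to float or from double to double efficiently.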

atomh33ls · Answer 2 · 2014-12-02T15:59:21+0000

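As @jme pointed out in the comments, rint and around have to work out whether to round each fraction up or down to the nearest integer. In contrast, astype always truncates, so it can discard the fractional part immediately. A number of other functions do the same thing. Also, you may be able to improve the speed by using fewer bits for the integer. However, you must be careful that the type can hold the full range of your input data.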

%%timeit
np.int8(x)
10000 loops, best of 3: 165 µs per loop

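Note that this can only store values in the range -128 to 127, as it is 8-bit. Some values in your example fall outside that range.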
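Before casting to a narrow type it is worth checking which values actually fit, because the result of casting an out-of-range float to int8 is not something to rely on. A minimal sketch with hand-picked sample values of my own:

```python
import numpy as np

# Hypothetical sample values around the int8 limits.
x = np.array([-200.0, -128.9, -128.0, 0.0, 127.0, 127.9, 200.0])

info = np.iinfo(np.int8)      # int8 limits: -128 and 127
truncated = np.trunc(x)       # the cast drops the fraction, so compare after trunc
fits = (truncated >= info.min) & (truncated <= info.max)

print("fits in int8:", fits.tolist())
# Only the in-range values are cast here.
safe = x[fits].astype(np.int8)
print(safe.tolist())
```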

Of the others I tried, np.intc seems to be the fastest:

%%timeit
np.int16(x)
10000 loops, best of 3: 186 µs per loop

%%timeit
np.intc(x)
10000 loops, best of 3: 169 µs per loop

%%timeit
np.int0(x)
10000 loops, best of 3: 170 µs per loop

%%timeit
np.int_(x)
10000 loops, best of 3: 188 µs per loop

%%timeit
np.int32(x)
10000 loops, best of 3: 187 µs per loop

%%timeit
np.trunc(x)
1000 loops, best of 3: 940 µs per loop

For comparison:

%%timeit
np.around(x)
1000 loops, best of 3: 1.48 ms per loop

%%timeit
np.rint(x)
1000 loops, best of 3: 1.49 ms per loop

%%timeit
x.astype(int)
10000 loops, best of 3: 188 µs per loop
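Outside IPython, similar measurements can be taken with the standard timeit module. The best_time helper below is a hypothetical name of my own, and the repeat counts are arbitrary. The absolute numbers will vary with the machine and the numpy version, so none are quoted here.

```python
import timeit
import numpy as np

x = np.random.rand(60000) * 400 - 200   # same data as in the question

def best_time(func, repeats=3, number=20):
    """Best per-call time in seconds over several repeats (hypothetical helper)."""
    return min(timeit.repeat(func, repeat=repeats, number=number)) / number

timings = {
    "x.astype(int)": best_time(lambda: x.astype(int)),
    "np.intc(x)": best_time(lambda: np.intc(x)),
    "np.trunc(x)": best_time(lambda: np.trunc(x)),
    "np.rint(x)": best_time(lambda: np.rint(x)),
    "np.around(x)": best_time(lambda: np.around(x)),
}

for name, seconds in timings.items():
    print(f"{name:15s} {seconds * 1e6:8.1f} µs per call")
```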
