NumPy around/rint slow compared to astype(int)

So, if I have something like x = np.random.rand(60000)*400-200, IPython's %timeit says:

  • x.astype(int) takes 0.14 ms
  • np.rint(x) and np.around(x) take 1.01 ms

Note that in the rint and around cases you still need to spend an extra 0.14 ms on the final astype(int) (assuming that's what you ultimately want).
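To reproduce these numbers outside IPython, here is a minimal sketch using the standard timeit module. `time_per_call` and `candidates` are illustrative names, and the absolute timings will depend on the machine and the NumPy version.

```python
import timeit

import numpy as np

# Same test data as in the question: 60000 doubles in [-200, 200).
x = np.random.rand(60000) * 400 - 200


def time_per_call(fn, number=100):
    """Illustrative helper: best-of-3 wall time per call of `fn`, in milliseconds."""
    runs = timeit.repeat(fn, number=number, repeat=3)
    return min(runs) / number * 1e3


# Illustrative set of operations from the question.
candidates = {
    "x.astype(int)": lambda: x.astype(int),
    "np.rint(x)": lambda: np.rint(x),
    "np.around(x)": lambda: np.around(x),
    "np.rint(x).astype(int)": lambda: np.rint(x).astype(int),
}

if __name__ == "__main__":
    for name, fn in candidates.items():
        print(f"{name:24s} {time_per_call(fn):.3f} ms")
```

Passing callables rather than statement strings keeps `x` in scope without a `setup` argument.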

Question: am I right in thinking that most modern hardware can perform both operations in roughly the same time? If so, why does numpy take 8 times longer to round?

As it happens, I'm not too fussy about the exactness of the arithmetic, but I can't see how to take advantage of that with numpy (I'm doing messy biology, not particle physics).


np.around(x).astype(int) and x.astype(int) do not give the same values. The former rounds to the nearest integer, with exact halves going to the even neighbor (round half to even), while the latter rounds toward zero. Nonetheless,

y = np.trunc(x).astype(int)
z = x.astype(int)

shows that y == z, but computing y is much slower. So it is the np.trunc and np.around functions that are slow.
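A small check of both claims on hand-picked values (exact halves plus one non-tie value on each side), assuming nothing beyond NumPy itself:

```python
import numpy as np

# Exact halves expose the tie-breaking rule; +-2.7 are ordinary values.
x = np.array([-2.7, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 2.7])

nearest = np.around(x).astype(int)  # round half to even, then convert
y = np.trunc(x).astype(int)         # round toward zero (double -> double), then convert
z = x.astype(int)                   # the cast itself truncates toward zero

print(nearest.tolist())      # [-3, -2, -2, 0, 0, 2, 2, 3]
print(z.tolist())            # [-2, -2, -1, 0, 0, 1, 2, 2]
print(np.array_equal(y, z))  # True
```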

In [165]: x.dtype
Out[165]: dtype('float64')
In [168]: y.dtype
Out[168]: dtype('int64')

So np.trunc(x) rounds toward zero from double to double, and then astype(int) has to convert double to int64.

Internally, I don't know what Python or numpy do, but I know how I would do it in C. Let's discuss some hardware. With SSE4.1 you can do round, floor, ceil, and trunc from double to double using:

_mm_round_pd(a, 0); //round: round even
_mm_round_pd(a, 1); //floor: round towards minus infinity
_mm_round_pd(a, 2); //ceil:  round towards positive infinity
_mm_round_pd(a, 3); //trunc: round towards zero

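But numpy also has to support systems without SSE4.1, so it would have to be built both with and without SSE4.1 and then pick the right version at run time with a dispatcher.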
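On the NumPy side, the same four rounding modes are available as elementwise ufuncs. The sketch below only shows the behavioral correspondence on a few values; whether a particular NumPy build actually compiles these ufuncs to `_mm_round_pd` depends on its version and CPU dispatch, which is an assumption here. `ROUNDING_UFUNCS` is an illustrative name.

```python
import numpy as np

# Illustrative mapping from the _mm_round_pd mode numbers above to NumPy ufuncs
# with the same rounding behavior (not necessarily the same instructions).
ROUNDING_UFUNCS = {
    0: np.rint,   # round to nearest, ties to even
    1: np.floor,  # round toward minus infinity
    2: np.ceil,   # round toward plus infinity
    3: np.trunc,  # round toward zero
}

a = np.array([-1.5, -0.5, 0.5, 1.5, 2.7])
for mode, fn in ROUNDING_UFUNCS.items():
    # Each ufunc keeps the float64 dtype: double in, double out.
    print(mode, fn.__name__, fn(a).tolist())
```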

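Converting double directly to int64 with SSE/AVX is not efficient until AVX512. However, converting double to int32 can be done efficiently with only SSE2: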

_mm_cvtpd_epi32(a);  //round double to int32 (sign-extend separately if int64 is needed)
_mm_cvttpd_epi32(a); //trunc double to int32 (sign-extend separately if int64 is needed)

These convert two doubles to two int32, which can then be sign-extended to int64.

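In your case this would work fine, since your range certainly fits in int32. But unless Python knows the range fits in int32, it can't assume it, so it has to round or truncate to int64, which is slow. And again, numpy would have to be built to use SSE2 for this anyway.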

But maybe you could use a single-precision float array to begin with. In that case you could do:

_mm_cvtps_epi32(a); //round single to int32
_mm_cvttps_epi32(a); //trunc single to int32

These convert four singles to four int32.
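In NumPy terms, the single-precision route looks roughly like the sketch below. Whether NumPy actually dispatches these calls to `_mm_cvtps_epi32` / `_mm_cvttps_epi32` depends on the NumPy version and build, so treat any speedup as something to measure, not a guarantee. `round_to_int32` is an illustrative name, and float32 keeps about 7 significant digits, which is plenty for values in [-200, 200).

```python
import numpy as np


def round_to_int32(a):
    """Illustrative helper: round to nearest (ties to even) and store as int32.

    Assumes the caller knows every value fits in int32; nothing is checked.
    Which instructions NumPy uses underneath is build-dependent (assumption).
    """
    return np.rint(a).astype(np.int32)


# Ideally the data would be produced as float32 from the start; converting
# an existing float64 array costs an extra pass over memory.
x32 = (np.random.rand(60000) * 400 - 200).astype(np.float32)

r = round_to_int32(x32)     # nearest integer, int32
t = x32.astype(np.int32)    # truncate toward zero, int32
```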

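So, to answer your question: SSE2 can round or truncate from double to int32 efficiently. AVX512 (specifically AVX512DQ) can also round or truncate from double to int64 efficiently, using _mm512_cvtpd_epi64(a) or _mm512_cvttpd_epi64(a). SSE4.1 can round/trunc/floor/ceil from float to float or from double to double efficiently.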


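As @jme mentioned in the comments, rint and around have to work out whether to round each fraction up or down to the nearest integer. In contrast, astype always truncates, so it can discard the fractional part immediately. A number of other functions do the same thing. You can also gain some speed by using an integer type with fewer bits, but you must make sure it can hold the full range of your input data.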
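Since the question says exactness isn't critical, one way to get round-to-nearest while still paying only for the truncating cast is to shift the data so it is non-negative (where truncation equals floor) and add 0.5. This is a swapped-in trick, not something from the answers above, and it assumes you know a lower bound `lo` for the data; `round_via_offset` is a hypothetical helper. It still performs one extra arithmetic pass, so measure it against `np.rint` on your own machine.

```python
import numpy as np


def round_via_offset(x, lo):
    """Hypothetical helper: approximate round-to-nearest using only a cast.

    Precondition (not checked, to keep it cheap): every element of x is >= lo.
    Shifting by an integer offset makes all values non-negative, so the
    truncating cast acts like floor, and floor(v + 0.5) rounds to nearest.
    Unlike np.rint, exact halves round up, and values within a few ulps of a
    half may go either way because x + offset + 0.5 is itself rounded.
    """
    offset = -int(np.floor(lo))  # integer shift so that x + offset >= 0
    return (x + (offset + 0.5)).astype(np.int64) - offset


x = np.random.rand(60000) * 400 - 200  # same data as in the question
r = round_via_offset(x, lo=-200)
```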

%%timeit
np.int8(x)
10000 loops, best of 3: 165 µs per loop

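Note that this does not store values outside the range -128 to 127, since it is 8-bit. Some values in your example fall outside this range.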
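To pick a narrower integer type without silently losing out-of-range values, one option is to check the data range against `np.iinfo` first. `smallest_int_dtype` is a hypothetical helper; for the question's data (roughly -200 to 200) it selects int16, not int8.

```python
import numpy as np


def smallest_int_dtype(lo, hi):
    """Hypothetical helper: narrowest signed integer dtype whose range covers [lo, hi]."""
    for dt in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(dt)
        if info.min <= lo and hi <= info.max:
            return np.dtype(dt)
    raise OverflowError(f"range [{lo}, {hi}] does not fit in int64")


x = np.random.rand(60000) * 400 - 200
# astype truncates toward zero, so floor/ceil of the extremes is a safe bound.
dt = smallest_int_dtype(np.floor(x.min()), np.ceil(x.max()))
xi = x.astype(dt)
```

The extra min/max pass is cheap compared with rounding, and it can be skipped when the range is known in advance.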

Of all the others I tried, np.intc seems to be the fastest:

%%timeit
np.int16(x)
10000 loops, best of 3: 186 µs per loop

%%timeit
np.intc(x)
10000 loops, best of 3: 169 µs per loop

%%timeit
np.int0(x)
10000 loops, best of 3: 170 µs per loop

%%timeit
np.int_(x)
10000 loops, best of 3: 188 µs per loop

%%timeit
np.int32(x)
10000 loops, best of 3: 187 µs per loop

%%timeit
np.trunc(x)
1000 loops, best of 3: 940 µs per loop

For comparison:

%%timeit
np.around(x)
1000 loops, best of 3: 1.48 ms per loop

%%timeit
np.rint(x)
1000 loops, best of 3: 1.49 ms per loop

%%timeit
x.astype(int)
10000 loops, best of 3: 188 µs per loop