How to quickly convert a string like "001100" to numpy.array([0,0,1,1,0,0])?

My string consists of 0s and 1s, for example '00101', and I want to convert it to a numpy array: numpy.array([0,0,1,0,1]).

I use a for loop, for example:

    import numpy as np

    X = np.zeros((1, 5), int)
    S = '00101'
    for i in range(5):  # range also works where xrange is unavailable
        X[0][i] = int(S[i])

But since I have many lines and the length of each line is 1024, this method is very slow. Is there a better way to do this?

5 answers

map should be slightly faster than a list comprehension:

    import numpy as np

    # Python 2: map returns a list. On Python 3, use list(map(int, '00101')).
    arr = np.array(map(int, '00101'))

Some timings on a 1024-character string show that it is:

    In [12]: timeit np.array([int(c) for c in s])
    1000 loops, best of 3: 422 µs per loop

    In [13]: timeit np.array(map(int,s))
    1000 loops, best of 3: 389 µs per loop

Just calling list on s and passing dtype=int is faster:

    In [20]: timeit np.array(list(s), dtype=int)
    1000 loops, best of 3: 329 µs per loop

Using fromiter and passing dtype=int is faster again:

    In [21]: timeit np.fromiter(s,dtype=int)
    1000 loops, best of 3: 289 µs per loop

Borrowing from this answer, using fromstring with an 8-bit integer dtype is by far the fastest:

    In [54]: timeit np.fromstring(s, 'int8') - 48
    100000 loops, best of 3: 4.54 µs per loop

Even rebinding the name and converting the dtype afterwards is still much faster:

    In [71]: %%timeit
       ....: arr = np.fromstring(s, 'int8') - 48
       ....: arr = arr.astype(int)
       ....:
    100000 loops, best of 3: 6.23 µs per loop

It is also significantly faster than Ashwini's join approach:

    In [76]: timeit np.fromstring(' '.join(s), sep=' ', dtype=int)
    10000 loops, best of 3: 62.6 µs per loop

As @Unutbu commented, np.fromstring(s, 'int8') - 48 is not limited to ones and zeros; it works for any string consisting of ASCII digits.
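
The timings above come from a Python 2 session. On Python 3 a str is Unicode rather than bytes, and the binary mode of np.fromstring is deprecated in newer NumPy releases, so the same trick is usually written with np.frombuffer. A minimal sketch of that variant, assuming the input contains only ASCII digits (the helper name digits_to_array is made up for illustration):

```python
import numpy as np

def digits_to_array(s):
    """Convert a string of ASCII digits to a 1-D integer array.

    Hypothetical helper, not part of NumPy. Assumes every character
    of s is one of '0'..'9'; other characters give meaningless values.
    """
    # Encode to bytes and view them as one unsigned byte per character.
    raw = np.frombuffer(s.encode("ascii"), dtype=np.uint8)
    # The byte for '0' is 48, so subtracting it maps '0'..'9' to 0..9.
    # The subtraction creates a new, writable array.
    return (raw - ord("0")).astype(np.int64)

arr = digits_to_array("00101")
print(arr)  # [0 0 1 0 1]
```

The final astype call is optional; dropping it keeps the result as uint8, which is enough for single digits and uses less memory.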


I think a list comprehension will be faster than your plain loop:

    import numpy as np

    s = '00101'
    np.array([int(c) for c in s])
    # array([0, 0, 1, 0, 1])

Timing comparison with your method (string of length 1024):

    In [41]: S = '0' * 512 + '1' * 512

    In [43]: %%timeit
       ....: X = np.zeros((1,len(S)),int)
       ....: for i in range(len(S)):
       ....:     X[0][i] = int(S[i])
       ....:
    1000 loops, best of 3: 854 µs per loop

    In [45]: %%timeit
       ....: Y = np.array([int(c) for c in S]).reshape((1,len(S)))
       ....:
    1000 loops, best of 3: 339 µs per loop

I reshaped the result so that both arrays have the same shape, but I don't think you really need the reshape. Built from the list, the array you get has shape (<length of string>,).
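
To make the shapes concrete, here is a small sketch of what the list comprehension produces with and without the reshape, plus a nested comprehension for several lines at once. The sample strings are invented for illustration, and the multi-line variant assumes all lines have the same length.

```python
import numpy as np

s = "00101"

# A list comprehension over one string gives a 1-D array.
flat = np.array([int(c) for c in s])
print(flat.shape)  # (5,)

# Reshaping to one row matches the (1, n) array from the question.
row = flat.reshape(1, -1)
print(row.shape)  # (1, 5)

# For several equal-length lines, a nested comprehension gives a 2-D array.
lines = ["00101", "11100", "01010"]  # illustrative sample data
matrix = np.array([[int(c) for c in line] for line in lines])
print(matrix.shape)  # (3, 5)
```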


Use numpy.fromstring:

    >>> s = '00101'
    >>> np.fromstring(' '.join(s), sep=' ', dtype=int)
    array([0, 0, 1, 0, 1])
    >>> s = '00101' * 1000
    >>> %timeit np.fromiter(s, dtype=int)
    100 loops, best of 3: 2.33 ms per loop
    >>> %timeit np.fromstring(' '.join(s), sep=' ', dtype=int)
    1000 loops, best of 3: 499 µs per loop
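
The numbers quoted in this thread depend on the Python and NumPy versions used, so it may be worth re-measuring locally. Below is a rough sketch of a timing harness using the standard timeit module. The set of candidates is my own selection of approaches that run on Python 3, not the exact calls timed above, and the names candidates, reference and results are just illustrative.

```python
import timeit

import numpy as np

s = "01" * 512  # 1024-character test string, as in the question

# Each candidate is a zero-argument callable returning the converted array.
candidates = {
    "list comprehension": lambda: np.array([int(c) for c in s]),
    "list(map(int, s))": lambda: np.array(list(map(int, s))),
    "fromiter(map(int, s))": lambda: np.fromiter(
        map(int, s), dtype=np.int64, count=len(s)
    ),
    "frombuffer - 48": lambda: np.frombuffer(s.encode("ascii"), dtype=np.uint8) - 48,
}

reference = np.array([int(c) for c in s])
results = {}
for name, fn in candidates.items():
    # Check that every approach yields the same values before timing it.
    assert np.array_equal(fn(), reference)
    # Best of 3 repeats, 100 calls each, reported as seconds per call.
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    results[name] = best
    print(f"{name:24s} {best * 1e6:8.1f} µs per call")
```

No expected timings are shown because they vary by machine and library version.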

How about using the fromstring method?

 np.fromstring('1, 2', dtype=int, sep=',') 

More here


np.array(map(lambda x: int(x), s))
