How to quickly convert a string like "001100" to numpy.array([0,0,1,1,0,0])?

Question

My string consists only of the characters 0 and 1, for example '00101'. I want to convert it to a NumPy array: numpy.array([0,0,1,0,1]).

I currently use a for loop:

    import numpy as np

    X = np.zeros((1, 5), int)
    S = '00101'
    for i in range(5):  # xrange on Python 2
        X[0][i] = int(S[i])

But I have many lines, each 1024 characters long, so this method is very slow. Is there a better way to do this?

python types numpy type-conversion format

stigmj Aug 22 '15 at 10:23

5 answers

Padraic Cunningham · Answer 1 · 2015-08-22T10:35:14+0000

map should be slightly faster than a list comprehension:

    import numpy as np

    # On Python 3, map returns an iterator, so wrap it in list() first
    arr = np.array(list(map(int, '00101')))

Some timings on a string of 1024 characters confirm this:

    In [12]: timeit np.array([int(c) for c in s])
    1000 loops, best of 3: 422 µs per loop

    In [13]: timeit np.array(map(int,s))
    1000 loops, best of 3: 389 µs per loop

Just calling list on s and passing dtype=int is faster:

    In [20]: timeit np.array(list(s), dtype=int)
    1000 loops, best of 3: 329 µs per loop

Using fromiter and passing dtype=int is faster still:

    In [21]: timeit np.fromiter(s,dtype=int)
    1000 loops, best of 3: 289 µs per loop

Borrowing from another answer, fromstring with an 8-bit integer dtype is by far the fastest. Subtracting 48, the ASCII code of '0', turns the character codes into digit values:

    In [54]: timeit np.fromstring(s, 'int8') - 48
    100000 loops, best of 3: 4.54 µs per loop

Even rebinding the name and converting the dtype afterwards is still much faster:

    In [71]: %%timeit
       ....: arr = np.fromstring(s, 'int8') - 48
       ....: arr = arr.astype(int)
       ....:
    100000 loops, best of 3: 6.23 µs per loop

It is also significantly faster than Ashwini's join approach:

    In [76]: timeit np.fromstring(' '.join(s), sep=' ', dtype=int)
    10000 loops, best of 3: 62.6 µs per loop
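The timings above come from IPython's timeit magic. As a rough sketch of how similar measurements could be reproduced in a plain Python 3 script, the standard timeit module can be used. The helper names are made up for illustration, and np.frombuffer stands in for the binary-mode fromstring call (an assumption, not the call timed above). Absolute numbers will differ between machines and NumPy versions.

```python
import timeit

import numpy as np

s = '0' * 512 + '1' * 512  # 1024-character test string


def with_list_comprehension(text):
    # Convert each character with int(), then build the array
    return np.array([int(c) for c in text])


def with_frombuffer(text):
    # Read the ASCII bytes directly and subtract the code of '0'
    # (stand-in for the binary-mode fromstring call; an assumption)
    return np.frombuffer(text.encode('ascii'), dtype=np.uint8) - ord('0')


results = {}
for func in (with_list_comprehension, with_frombuffer):
    total = timeit.timeit(lambda: func(s), number=1000)
    results[func.__name__] = total / 1000  # average seconds per call
    print('%s: %.2f us per call' % (func.__name__, results[func.__name__] * 1e6))
```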

As @Unutbu commented, np.fromstring(s, 'int8') - 48 is not limited to ones and zeros; it works for any string consisting of ASCII digits.
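On newer NumPy releases the binary mode of fromstring is deprecated in favor of frombuffer. A minimal sketch of the same trick with frombuffer, assuming Python 3 and a pure-ASCII string, might look like this:

```python
import numpy as np

s = '00101'

# Encode to bytes first; frombuffer then reads one uint8 per ASCII character
codes = np.frombuffer(s.encode('ascii'), dtype=np.uint8)

# ASCII '0' is 48, so subtracting ord('0') maps '0'..'9' to 0..9
bits = codes - ord('0')
print(bits)

# The same conversion applies to any string of ASCII digits
digits = np.frombuffer(b'9075', dtype=np.uint8) - ord('0')
print(digits)
```

The result has dtype uint8; call .astype(int) afterwards if a wider integer type is needed.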

Anand S Kumar · Answer 2 · 2015-08-22T10:26:02+0000

I think a list comprehension will be faster than your plain loop:

    import numpy as np

    s = '00101'
    np.array([int(c) for c in s])
    # array([0, 0, 1, 0, 1])

Timing comparison with your method (string of length 1024):

    In [41]: S = '0' * 512 + '1' * 512

    In [43]: %%timeit
       ....: X = np.zeros((1,len(S)),int)
       ....: for i in range(len(S)):
       ....:     X[0][i] = int(S[i])
       ....:
    1000 loops, best of 3: 854 µs per loop

    In [45]: %%timeit
       ....: Y = np.array([int(c) for c in S]).reshape((1,len(S)))
       ....:
    1000 loops, best of 3: 339 µs per loop

I reshaped the result so that both arrays have the same shape, but I don't think you really need the reshape. With the list comprehension alone, the array you get has shape (<length of string>,).
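To make the shape difference concrete, here is a small sketch comparing the array built from the list comprehension with its reshaped version. The variable names are chosen only for illustration.

```python
import numpy as np

s = '00101'

# The list comprehension gives a one-dimensional array
flat = np.array([int(c) for c in s])
print(flat.shape)  # (5,)

# reshape(1, -1) gives one row, matching np.zeros((1, 5), int) in the question
row = flat.reshape(1, -1)
print(row.shape)  # (1, 5)
```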

Ashwini Chaudhary · Answer 3 · 2015-08-22T10:50:31+0000

Use numpy.fromstring:

    >>> s = '00101'
    >>> np.fromstring(' '.join(s), sep=' ', dtype=int)
    array([0, 0, 1, 0, 1])
    >>> s = '00101' * 1000
    >>> %timeit np.fromiter(s, dtype=int)
    100 loops, best of 3: 2.33 ms per loop
    >>> %timeit np.fromstring(' '.join(s), sep=' ', dtype=int)
    1000 loops, best of 3: 499 µs per loop
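A short sketch of why the join is needed: in text mode (a non-empty sep), fromstring parses separator-delimited numbers, so the digits must be spaced apart first. Iterating over a string yields its characters, so ' '.join(s) inserts a space between every pair of digits.

```python
import numpy as np

s = '00101'

# Joining the characters with spaces turns '00101' into '0 0 1 0 1'
spaced = ' '.join(s)
print(spaced)

# Text-mode parsing: each space-separated token becomes one integer
arr = np.fromstring(spaced, sep=' ', dtype=int)
print(arr)
```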

zom-pro · Answer 4 · 2015-08-22T10:29:37+0000

How about using the fromstring method?

 np.fromstring('1, 2', dtype=int, sep=',')

Timmy · Answer 5 · 2015-08-22T10:29:08+0000

    np.array(list(map(lambda x: int(x), s)))  # list() is required on Python 3
