Copy with numeric string

Question

The numpy module is a great tool for storing Python objects, including strings, in a memory-efficient way. For ANSI strings in numpy arrays, only 1 byte per character is used.

However, there is one inconvenience: the stored objects are no longer of type str but bytes, so in most cases they must be decoded before further use, which makes for rather cumbersome code:

>>> import numpy
>>> my_array = numpy.array(['apple', 'pear'], dtype = 'S5')
>>> print("Mary has an {} and a {}".format(my_array[0], my_array[1]))
Mary has an b'apple' and a b'pear'
>>> print("Mary has an {} and a {}".format(my_array[0].decode('utf-8'),
... my_array[1].decode('utf-8')))
Mary has an apple and a pear

This inconvenience can be eliminated using a different data type, for example:

>>> my_array = numpy.array(['apple', 'pear'], dtype = 'U5')
>>> print("Mary has an {} and a {}".format(my_array[0], my_array[1]))
Mary has an apple and a pear

However, this comes at the cost of four times the memory usage (20 bytes per item instead of 5):

>>> numpy.info(my_array)
class:  ndarray
shape:  (2,)
strides:  (20,)
itemsize: 20
aligned:  True
contiguous:  True
fortran:  True
data pointer: 0x1a5b020
byteorder:  little
byteswap:  False
type: <U5
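
The factor of four can also be checked directly from the itemsize and nbytes attributes. This is a minimal sketch, assuming numpy is imported as np; the variable names are illustrative and not part of the original post:

```python
import numpy as np

words = ['apple', 'pear']
bytes_array = np.array(words, dtype='S5')    # 1 byte per character
unicode_array = np.array(words, dtype='U5')  # 4 bytes per character

# itemsize is the size of one element, nbytes the size of the whole buffer
print(bytes_array.itemsize, bytes_array.nbytes)      # 5 10
print(unicode_array.itemsize, unicode_array.nbytes)  # 20 40
```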

Is there a solution that combines the advantages of both efficient memory allocation and convenient use for ANSI strings?

+4

python string python-3.x numpy

Roman, Aug 25 '15 at 14:57

2 Answers

Given:

>>> my_array = numpy.array(['apple', 'pear'], dtype = 'S5')

You can decode "on the fly":

>>> print("Mary has an {} and a {}".format(*map(lambda b: b.decode('utf-8'), my_array)))
Mary has an apple and a pear

Or you can create a custom formatter:

import string

class ByteFormatter(string.Formatter):
    def __init__(self, decoder='utf-8'):
        self.decoder = decoder

    def format_field(self, value, spec):
        # bytes values are decoded; note that spec is not applied to them
        if isinstance(value, bytes):
            return value.decode(self.decoder)
        return super().format_field(value, spec)

>>> print(ByteFormatter().format("Mary has an {} and a {}", *my_array))
Mary has an apple and a pear
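
One limitation of the formatter above is that it returns the decoded value directly, so a format spec such as {:>6} is ignored for bytes. A possible variant, sketched here under the hypothetical name SpecAwareByteFormatter (not from the original answer), decodes first and then lets the base class apply the spec. Plain bytes literals are used for illustration; numpy byte strings subclass bytes, so they pass the same isinstance check:

```python
import string

class SpecAwareByteFormatter(string.Formatter):
    """Hypothetical variant of ByteFormatter that keeps the format spec."""

    def __init__(self, encoding='utf-8'):
        super().__init__()
        self.encoding = encoding

    def format_field(self, value, format_spec):
        # Decode bytes first, then let the base class apply the spec
        if isinstance(value, bytes):
            value = value.decode(self.encoding)
        return super().format_field(value, format_spec)

fmt = SpecAwareByteFormatter()
result = fmt.format("[{:>6}] [{:<6}]", b'apple', b'pear')
print(result)  # [ apple] [pear  ]
```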

+4

dawg, Aug 25 '15 at 15:47

hpaulj · Accepted Answer · 2015-08-25T15:40:26+0000

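Instead of decode, you can use astype (on the whole array, not just on single elements). Note that it creates a unicode copy of the array.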

In [538]: x=my_array.astype('U');"Mary has an {} and a {}".format(x[0],x[1])
Out[538]: 'Mary has an apple and a pear'

With this, format no longer shows the "b" prefix.

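fooobar.com/questions/805494/... is an answer that shows how to subclass Formatter and override format_field. I tried something similar with convert_field, adding a custom "!q" conversion.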

In [562]: import string
   .....: def makeU(astr):
   .....:     return astr.decode('utf-8')
   .....: 

In [563]: class MyFormatter(string.Formatter):
   .....:     def convert_field(self, value, conversion):
   .....:         # custom "!q" conversion: decode bytes to str
   .....:         if conversion == 'q':
   .....:             return makeU(value)
   .....:         return super().convert_field(value, conversion)
   .....: 

In [564]: MyFormatter().format("Mary has an {!q} and a {!q}",my_array[0],my_array[1])
Out[564]: 'Mary has an apple and a pear'

Another way is to unpack the converted array and use indexed fields:

In [642]: "Mary has an {1} and a {0} or {1}".format(*my_array.astype('U'))
Out[642]: 'Mary has an pear and a apple or pear'

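(The "*" unpacking) passes the elements to format as separate arguments. This also works when the array is already unicode (uarray here stands for such an array):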

In [643]: "Mary has an {1} and a {0} or {1}".format(*uarray.astype('U'))
Out[643]: 'Mary has an pear and a apple or pear'

np.char has functions that apply string methods to the elements of an array. decode is one of them:

In [644]: "Mary has a {1} and an {0}".format(*np.char.decode(my_array))
Out[644]: 'Mary has a pear and an apple'

(this does not work if the array is already unicode).
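
np.char.decode also accepts an explicit encoding argument. A minimal sketch, assuming the strings are pure ASCII and numpy is imported as np; the name decoded is illustrative:

```python
import numpy as np

my_array = np.array(['apple', 'pear'], dtype='S5')

# Returns a new unicode array; the original bytes array is left unchanged
decoded = np.char.decode(my_array, 'ascii')

print(my_array.dtype.kind, decoded.dtype.kind)  # S U
print("Mary has a {1} and an {0}".format(*decoded))
```

Like astype('U'), this builds a separate unicode array, so the compact bytes storage is only kept if the decoded copy is short-lived.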

, np.char .
