Copy with numeric string

The numpy module is a great tool for efficiently storing the memory of python objects, including strings. For ANSI strings in numpy arrays, only 1 byte per character is used.

However, there is one inconvenience. The type of stored objects is no more string, but bytes, which means that they need to be decoded for further use in most cases, which, in turn, means a rather cumbersome code:

>>> import numpy
>>> my_array = numpy.array(['apple', 'pear'], dtype = 'S5')
>>> print("Mary has an {} and a {}".format(my_array[0], my_array[1]))
Mary has an b'apple' and a b'pear'
>>> print("Mary has an {} and a {}".format(my_array[0].decode('utf-8'),
... my_array[1].decode('utf-8')))
Mary has an apple and a pear

This inconvenience can be eliminated using a different data type, for example:

>>> my_array = numpy.array(['apple', 'pear'], dtype = 'U5')
>>> print("Mary has an {} and a {}".format(my_array[0], my_array[1]))
Mary has an apple and a pear

However, this is achieved only by increasing the memory usage by 4 times:

>>> numpy.info(my_array)
class:  ndarray
shape:  (2,)
strides:  (20,)
itemsize: 20
aligned:  True
contiguous:  True
fortran:  True
data pointer: 0x1a5b020
byteorder:  little
byteswap:  False
type: <U5

Is there a solution that combines the advantages of both efficient memory allocation and convenient use for ANSI strings?

+4
2

decode, astype ( , ). , .

In [538]: x=my_array.astype('U');"Mary has an {} and a {}".format(x[0],x[1])
Out[538]: 'Mary has an apple and a pear'

format, "b" .

fooobar.com/questions/805494/... - , Formatter, format_field. - convert_field. .

In [562]: def makeU(astr):
    return astr.decode('utf-8')
   .....: 

In [563]: class MyFormatter(string.Formatter):
    def convert_field(self, value, conversion):
        if 'q'== conversion:
            return makeU(value)
        else:
            return super(MyFormatter, self).convert_field(value, conversion)
   .....:         

In [564]: MyFormatter().format("Mary has an {!q} and a {!q}",my_array[0],my_array[1])
Out[564]: 'Mary has an apple and a pear'

:

In [642]: "Mary has an {1} and a {0} or {1}".format(*my_array.astype('U'))
Out[642]: 'Mary has an pear and a apple or pear'

( " " ) format . , unicode:

In [643]: "Mary has an {1} and a {0} or {1}".format(*uarray.astype('U'))
Out[643]: 'Mary has an pear and a apple or pear'

np.char , . decode :

In [644]: "Mary has a {1} and an {0}".format(*np.char.decode(my_array))
Out[644]: 'Mary has a pear and an apple'

( , unicode).

, np.char .

+2

:

>>> my_array = numpy.array(['apple', 'pear'], dtype = 'S5')

" ":

>>> print("Mary has an {} and a {}".format(*map(lambda b: b.decode('utf-8'), my_array)))
Mary has an apple and a pear

:

import string
class ByteFormatter(string.Formatter):
    def __init__(self, decoder='utf-8'):
        self.decoder=decoder

    def format_field(self, value, spec):
        if isinstance(value, bytes):
            return value.decode(self.decoder)
        return super(ByteFormatter, self).format_field(value, spec)   

>>> print(ByteFormatter().format("Mary has an {} and a {}", *my_array))
Mary has an apple and a pear
+4

All Articles