The numpy module is a great tool for storing Python objects in memory efficiently, including strings. For ANSI strings stored in numpy arrays (dtype 'S'), only 1 byte per character is used.
However, there is one inconvenience: the stored objects are no longer of type str but bytes, which means that in most cases they must be decoded before further use, which in turn leads to rather cumbersome code:
>>> import numpy
>>> my_array = numpy.array(['apple', 'pear'], dtype='S5')
>>> print("Mary has an {} and a {}".format(my_array[0], my_array[1]))
Mary has an b'apple' and a b'pear'
>>> print("Mary has an {} and a {}".format(my_array[0].decode('utf-8'),
... my_array[1].decode('utf-8')))
Mary has an apple and a pear
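As an aside, the decoding can at least be vectorized with numpy.char.decode, but the result is a unicode array, so the decode step (and its cost) remains; a quick sketch (the name decoded is just illustrative):
>>> decoded = numpy.char.decode(my_array, 'utf-8')
>>> print("Mary has an {} and a {}".format(decoded[0], decoded[1]))
Mary has an apple and a pear
>>> decoded.dtype
dtype('<U5')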
This inconvenience can be eliminated by using a different data type, for example:
>>> my_array = numpy.array(['apple', 'pear'], dtype='U5')
>>> print("Mary has an {} and a {}".format(my_array[0], my_array[1]))
Mary has an apple and a pear
However, this comes at the cost of quadrupling the memory usage (4 bytes per character instead of 1):
>>> numpy.info(my_array)
class: ndarray
shape: (2,)
strides: (20,)
itemsize: 20
aligned: True
contiguous: True
fortran: True
data pointer: 0x1a5b020
byteorder: little
byteswap: False
type: <U5
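The difference is also visible directly in itemsize and nbytes (a quick comparison sketch; ansi_array and unicode_array are just illustrative names):
>>> ansi_array = numpy.array(['apple', 'pear'], dtype='S5')
>>> unicode_array = numpy.array(['apple', 'pear'], dtype='U5')
>>> ansi_array.itemsize, unicode_array.itemsize
(5, 20)
>>> ansi_array.nbytes, unicode_array.nbytes
(10, 40)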
Is there a solution for ANSI strings that combines both advantages: efficient memory allocation and convenient use?