Single-precision floats in Python

I need to implement a dynamic programming algorithm to solve the Traveling Salesman Problem in time better than a brute-force search over all orderings of the points. To do this I need to index the subproblems by size, and the value of each subproblem will be a float (a tour length). However, the table will take about 6 GB of RAM if I store Python floats (which are actually double precision), so to try to halve that amount (I only have 4 GB of RAM) I would need to use single-precision floats. However, I do not know how to get single-precision floats in Python (I use Python 3). Can someone tell me where I can find them? (I could not find much about this on the Internet.) Thanks.

EDIT: I noticed that numpy also has a float16 type, which would save even more memory. The distances between points are around 10,000, there are 25 unique points, and my answer needs to be correct to the nearest integer. Will float16 provide sufficient precision, or do I need to use float32?
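For reference, roughly where the 6 GB figure comes from and what smaller dtypes would buy, assuming a Held-Karp style table with one entry per (subset of points, last point) pair; the exact indexing scheme is just one possible layout, not spelled out above:

    # Rough memory estimate for a Held-Karp style DP table with n points.
    # Indexing by (subset, last point) is an assumption, not stated above.
    n = 25                        # number of points (from the edit)
    entries = (2 ** n) * n        # one stored tour length per (subset, last point)

    print(entries * 8 / 2 ** 30)  # float64: 8 bytes each -> about 6.25 GiB
    print(entries * 4 / 2 ** 30)  # float32: 4 bytes each -> about 3.13 GiB
    print(entries * 2 / 2 ** 30)  # float16: 2 bytes each -> about 1.56 GiB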

2 answers

As a first step, you should use a NumPy array to store your data instead of a Python list.

As you rightly note, a Python float uses double precision internally, and the double-precision value underlying a Python float can be represented in 8 bytes. But on a 64-bit machine running CPython, a Python float object takes up a full 24 bytes of memory: 8 bytes for the underlying double-precision value, 8 bytes for a pointer to the object's type, and 8 bytes for the reference count (used for garbage collection). There is no equivalent in Python of Java's "primitive" types or .NET's "value" types: everything is boxed. That keeps the language semantics simple, but means that objects tend to be fat.
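You can check this on your own machine; on a 64-bit CPython build, sys.getsizeof reports 24 bytes per float object:

    import sys

    # 8 bytes for the double value, 8 for the type pointer, 8 for the
    # reference count: 24 bytes in total on a 64-bit CPython build.
    print(sys.getsizeof(1.0))  # 24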

Now, if we create a list of Python float objects, there is the additional overhead of the list itself: one 8-byte object pointer per Python float (still assuming a 64-bit machine here). So in general, a list of n Python floats will cost you in excess of 32n bytes of memory. On a 32-bit machine things are a little better, but not much: our float objects take 16 bytes each, and with the list pointers we are looking at 20n bytes of memory for a list of floats of length n. (Caveat: this analysis does not quite work if your list refers to the same Python float object from multiple list indices, but that is not a particularly common case.)
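A quick way to see the combined cost, measured with sys.getsizeof on a 64-bit CPython build (exact totals vary a little with the interpreter version and list over-allocation):

    import sys

    n = 1_000_000
    values = [float(i) for i in range(n)]

    # The list's own pointer array plus the boxed float objects it points to.
    list_bytes = sys.getsizeof(values)
    float_bytes = sum(sys.getsizeof(x) for x in values)

    print((list_bytes + float_bytes) / n)  # a little over 32 bytes per element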

In contrast, a NumPy array of n double-precision floats (using NumPy's float64 dtype) stores its data in a packed format, in a single data block of 8n bytes, so allowing for the array metadata the total memory requirement is only a little over 8n bytes.
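For comparison, the packed NumPy layout (assuming numpy is installed):

    import sys
    import numpy as np

    n = 1_000_000
    arr = np.zeros(n, dtype=np.float64)

    print(arr.nbytes)          # exactly 8n bytes for the packed data buffer
    print(sys.getsizeof(arr))  # 8n bytes plus a small, fixed metadata overhead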

Conclusion: simply by moving from a Python list to a NumPy array, you reduce your memory needs by a factor of around 4. If that is still not enough, it might make sense to consider reducing precision from double to single precision (NumPy's float32 dtype), provided that gives enough accuracy for your application. NumPy's float16 dtype takes only 2 bytes per float, but gives only about three decimal digits of precision; I suspect it will be next to useless for the application you describe.
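To make the float16 concern concrete, here is a small check of the spacing between adjacent representable values near the magnitudes mentioned in the question (the ~10,000 edge length and 25 points are taken from the edit):

    import numpy as np

    # Near 10,000, float16 can only step in increments of 8, so rounding a
    # tour length to the nearest integer is hopeless; float32 steps are well
    # below 1. A full 25-edge tour (~250,000) would also overflow float16,
    # whose maximum finite value is 65504.
    print(np.spacing(np.float16(10000.0)))  # 8.0
    print(np.spacing(np.float32(10000.0)))  # about 0.001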


You can try the c_float type from the ctypes standard library module. Alternatively, if you are able to install additional packages, you could try the numpy package: it includes a float32 type.
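A minimal sketch of both options:

    import ctypes
    import numpy as np

    # ctypes.c_float wraps a C single-precision float; reading .value gives
    # back an ordinary (double precision) Python float, rounded to single
    # precision.
    x = ctypes.c_float(3.141592653589793)
    print(x.value)

    # NumPy stores float32 values packed in the array buffer, 4 bytes each.
    arr = np.array([1.5, 2.25, 10000.125], dtype=np.float32)
    print(arr.dtype, arr.nbytes)  # float32 12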

