What is the fastest way in python to build an array c from a list of float tuples?

Context: my Python code passes arrays of 2D vertices to OpenGL.

I tested 2 approaches, one with ctypes, the other with struct; the latter is more than 2 times faster:

    from random import random
    points = [(random(), random()) for _ in xrange(1000)]

    from ctypes import c_float
    def array_ctypes(points):
        n = len(points)
        return n, (c_float*(2*n))(*[u for point in points for u in point])

    from struct import pack
    def array_struct(points):
        n = len(points)
        return n, pack("f"*2*n, *[u for point in points for u in point])

Any other alternatives? Any hints on how to speed up such code? (And yes, this is one of the bottlenecks of my code.)
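For anyone who wants to reproduce the comparison, a minimal Python 3 harness (my addition, not from the question; `range` replaces the question's Python 2 `xrange`):

```python
import timeit
from ctypes import c_float
from random import random
from struct import pack

points = [(random(), random()) for _ in range(1000)]

def array_ctypes(points):
    n = len(points)
    return n, (c_float * (2 * n))(*[u for point in points for u in point])

def array_struct(points):
    n = len(points)
    return n, pack("f" * 2 * n, *[u for point in points for u in point])

# time each approach over 100 calls and report microseconds per call
for fn in (array_ctypes, array_struct):
    t = timeit.timeit(lambda: fn(points), number=100)
    print("%s: %.0f usec per call" % (fn.__name__, t / 100 * 1e6))
```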

+6
python ctypes pyopengl
5 answers

You can try Cython. For me it gives:

     function       usec per loop:
                    Python  Cython
     array_ctypes     1370    1220
     array_struct      384     249
     array_numpy       336     339

So numpy gives only a 15% advantage on my hardware (an old laptop running Windows XP), while Cython gives about 35% (without adding any extra dependency to your distributed code).

If you can relax your requirement that each point be a tuple of floats, and simply make "points" a flattened list of floats:

    def array_struct_flat(points):
        n = len(points)
        return pack(
            "f"*n,
            *[coord for coord in points]
        )

    points = [random() for _ in xrange(1000 * 2)]

then the resulting output is the same, but the timing drops further:

     function            usec per loop:
                         Python  Cython
     array_struct_flat      157

Cython could do significantly better than this if someone smarter than me wanted to add static type declarations to the code. (Running "cython -a test.pyx" is invaluable for this: it creates an html file showing you where the slowest plain Python is in your code (yellow), versus Python that has been converted to pure C (white). That's why I spread the code above over so many lines: the coloring is done per line, so spreading it out helps.)

Full Cython instructions: http://docs.cython.org/src/quickstart/build.html

Cython can provide the same kind of performance advantage across your entire code base, and under ideal conditions, with suitable static typing applied, it can speed things up tenfold or a hundredfold.
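As a side note on the flattened version above (my observation, not from the answer): the intermediate list comprehension in `array_struct_flat` is redundant, and struct format strings accept a repeat count, so the call can be sketched more simply as:

```python
from random import random
from struct import pack

# flattened list of n*2 floats, as in the answer above
points = [random() for _ in range(1000 * 2)]

def array_struct_flat(points):
    n = len(points)
    # "%df" % n builds e.g. "2000f", equivalent to "f" * n
    return pack("%df" % n, *points)

packed = array_struct_flat(points)
# identical bytes to the "f" * n formulation
assert packed == pack("f" * len(points), *points)
```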

+2

You can pass numpy arrays to PyOpenGL without any overhead. (The data attribute of a numpy array is a buffer that points to the underlying C data structure, which contains the same information as the array you are building.)

    import numpy as np

    def array_numpy(points):
        n = len(points)
        return n, np.array(points, dtype=np.float32)

On my computer, this is about 40% faster than the struct approach.
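As a quick sanity check (my addition, not from the answer), the numpy array carries exactly the same bytes as the struct.pack result, since both use the platform's native float32 layout:

```python
import numpy as np
from random import random
from struct import pack

points = [(random(), random()) for _ in range(1000)]
flat = [u for point in points for u in point]

a = np.array(points, dtype=np.float32)
# the numpy buffer and the packed string are byte-for-byte identical
assert a.tobytes() == pack("f" * len(flat), *flat)
```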

+3

Here's another idea I came across. I don't have time to profile it right now, but in case it helps someone else:

    # untested, but I'm fairly confident it runs
    # using 'flattened points' list, i.e. a list of n*2 floats
    points = [random() for _ in xrange(1000 * 2)]
    c_array = (c_float * len(points))()
    c_array[:] = points

That is, first we create the ctypes array but do not populate it. Then we fill it using slice assignment. People smarter than me have said that assigning to a slice like this can help performance. It lets us pass a list or iterable directly on the RHS of the assignment, without using the *iterable syntax, which would perform some intermediate wrangling of the iterable. I suspect this is what happens in the depths of creating pyglet's Batches.

Presumably you could create c_array just once, and then simply reassign to it (the final line in the code above) every time the points list changes.
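A sketch of that allocate-once, refill-many pattern (my addition; names are illustrative, and slice assignment requires the lengths to match):

```python
from ctypes import c_float
from random import random

N = 1000 * 2                      # number of flattened floats
c_array = (c_float * N)()         # allocate the ctypes array once

def refill(c_array, flat_points):
    # fill the existing buffer in place; len(flat_points) must equal len(c_array)
    c_array[:] = flat_points

flat_points = [random() for _ in range(N)]
refill(c_array, flat_points)
```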

There is probably an alternative formulation that accepts the original definition of points (a list of (x, y) tuples). Something like this:

    # very untested, likely contains errors
    # using a list of n tuples of two floats
    from itertools import chain
    points = [(random(), random()) for _ in xrange(1000)]
    c_array = (c_float * (2 * len(points)))()
    c_array[:] = list(chain.from_iterable(points))
+1

If performance is an issue, you don't want to build ctypes arrays with star-unpacking (e.g. (ctypes.c_float * size)(*t)).

In my test of pack, the fastest way is to use the array module and its buffer address (or the from_buffer function).

    import timeit

    repeat = 100
    setup = ("from struct import pack; from random import random; "
             "import numpy; from array import array; import ctypes; "
             "t = [random() for _ in range(2 * 1000)];")

    print(timeit.timeit(
        stmt="v = array('f', t); addr, count = v.buffer_info(); "
             "x = ctypes.cast(addr, ctypes.POINTER(ctypes.c_float))",
        setup=setup, number=repeat))
    print(timeit.timeit(
        stmt="v = array('f', t); a = (ctypes.c_float * len(v)).from_buffer(v)",
        setup=setup, number=repeat))
    print(timeit.timeit(
        stmt="x = (ctypes.c_float * len(t))(*t)",
        setup=setup, number=repeat))
    print(timeit.timeit(
        stmt="x = pack('f' * len(t), *t)",
        setup=setup, number=repeat))
    print(timeit.timeit(
        stmt="x = (ctypes.c_float * len(t))(); x[:] = t",
        setup=setup, number=repeat))
    print(timeit.timeit(
        stmt="x = numpy.array(t, numpy.float32).data",
        setup=setup, number=repeat))

The array.array approach is slightly faster than Jonathan Hartley's slice-assignment approach in my test, while the numpy method runs at about half the speed:

    $ python3 convert.py
    0.004665990360081196
    0.004661010578274727
    0.026358536444604397
    0.0028003649786114693
    0.005843495950102806
    0.009067213162779808

The net winner is pack.

+1

You can use array.array (note also the generator expression instead of a list comprehension):

    from array import array
    array("f", (u for point in points for u in point)).tostring()

Another optimization would be to keep points flattened from the start.
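In Python 3, array.tostring() was renamed tobytes(); a sketch of the same idea (my addition), which produces the same native float32 bytes as the struct.pack approach:

```python
from array import array
from random import random
from struct import pack

points = [(random(), random()) for _ in range(1000)]

# generator expression flattens the tuples without building an intermediate list
data = array("f", (u for point in points for u in point)).tobytes()

flat = [u for point in points for u in point]
# byte-for-byte identical to the struct.pack result
assert data == pack("f" * len(flat), *flat)
```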

0
