Better to use a tuple or numpy array to store coordinates

Question

I am porting a C++ scientific application to Python, and since I'm new to Python, some questions come to mind:

1) I am defining a class that will contain the coordinates (x, y). These values will be accessed several times, but only read after the class is instantiated. Is it better to use a tuple or a numpy array, in terms of both memory and access time?

2) In some cases, these coordinates will be used to build a complex number, which will be evaluated by a complex function, and the real part of the result will be used. Assuming there is no way to separate the real and imaginary parts of this function, and only the real part is needed at the end, maybe it is better to use complex numbers directly to store (x, y)? How bad is the overhead of converting from complex to real in Python? The C++ code does a lot of these conversions, and they are a big slowdown in that code.

3) Some coordinate transformations will also have to be performed: for each coordinate, the x and y values will be accessed separately, the transformation applied, and the result returned. The coordinate transformations are defined in the complex plane, so is it still faster to use the x and y components directly than to rely on complex variables?

Thanks.

python arrays numpy tuples complex-numbers

Ivan Apr 1 '10 at 21:17

2 answers

A numpy array with an extra dimension is tighter in memory use, and at least as fast, as an array of tuples; complex numbers are at least as good or even better, including for your third question. By the way, you may have noticed that while questions asked later than yours were getting plenty of answers, yours was lying fallow: part of the reason, no doubt, is that asking three questions within one question turns responders off. Why not just ask one question per question? It's not like you get charged for questions or anything, you know...!-)
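To make the complex-number suggestion concrete, here is a minimal sketch, assuming (hypothetically; the question does not say which transformations are needed) that the transformation is a rotation about the origin followed by a translation. With each (x, y) stored as x + iy, the whole transformation is one complex multiply and one add over the array. The helper names `to_complex`, `rotate_and_shift`, and `rotate_and_shift_xy` are illustrative only; the component-wise version is there to check that both forms agree.

```python
import numpy as np

def to_complex(xy):
    """Pack an (N, 2) array of (x, y) pairs into a length-N complex array."""
    xy = np.asarray(xy, dtype=np.float64)
    return xy[:, 0] + 1j * xy[:, 1]

def rotate_and_shift(z, angle, shift):
    """Rotate points by `angle` radians about the origin, then translate by `shift`.

    In the complex plane this is multiplication by exp(i*angle) plus an addition.
    """
    return z * np.exp(1j * angle) + shift

def rotate_and_shift_xy(x, y, angle, dx, dy):
    """The same transformation written with separate x and y components."""
    c, s = np.cos(angle), np.sin(angle)
    return x * c - y * s + dx, x * s + y * c + dy

points = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
z = to_complex(points)
w = rotate_and_shift(z, np.pi / 2, 1 + 1j)
xr, yr = rotate_and_shift_xy(points[:, 0], points[:, 1], np.pi / 2, 1.0, 1.0)
print(np.allclose(w.real, xr) and np.allclose(w.imag, yr))  # True
```

The two forms are algebraically identical, since (x + iy)(cos a + i sin a) = (x cos a - y sin a) + i(x sin a + y cos a); the complex version simply keeps x and y together and leaves the arithmetic to numpy.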

Alex Martelli Apr 2 '10 at 2:50

unutbu · Accepted Answer · 2010-04-02T03:37:33+0000

In terms of memory consumption, numpy arrays are more compact than Python tuples. A numpy array uses a single contiguous block of memory, and all of its elements must be of a declared type (for example, a 32-bit or 64-bit float). The elements of a Python tuple are not necessarily stored in a contiguous block of memory, and they can be arbitrary Python objects, which generally consume more memory than numpy numeric types.

So on memory, this is a hands-down win for numpy (assuming the elements of the array can be stored as a numpy numeric type).
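A rough way to see the memory difference, as a sketch: `sys.getsizeof` reports only an object's own size, so the hypothetical helper `tuple_list_bytes` below also adds the tuples and the float objects the list references, while `ndarray.nbytes` reports the size of the array's single data buffer (it excludes the small array header). Exact byte counts vary with the Python version and platform, so none are quoted here.

```python
import sys
import numpy as np

n = 100_000
# n (x, y) pairs stored as a list of tuples of Python floats.
pairs = [(float(i), float(i) + 0.5) for i in range(n)]

# The same data as one (n, 2) float64 array: a single contiguous buffer.
arr = np.array(pairs, dtype=np.float64)

def tuple_list_bytes(seq):
    """Approximate footprint: the list, each tuple, and each float it references."""
    total = sys.getsizeof(seq)
    for t in seq:
        total += sys.getsizeof(t)
        total += sum(sys.getsizeof(v) for v in t)
    return total

print("list of tuples:", tuple_list_bytes(pairs), "bytes (approx.)")
print("numpy array:   ", arr.nbytes, "bytes of data")  # 2 floats * 8 bytes * n
print("contiguous:", arr.flags["C_CONTIGUOUS"])
```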

As for speed, I think the choice boils down to the question, "Can you vectorize your code?"

That is, can you express your calculations as operations performed element-wise on entire arrays?

If the code can be vectorized, then numpy will likely be faster than Python tuples. (The only case I can imagine where it might not be is if you have many very small tuples. In that case, the overhead of creating numpy arrays and the one-time cost of importing numpy might outweigh the benefit of vectorization.)
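A sketch of the difference, using distance from the origin as a stand-in calculation (an assumption, not the asker's code): the tuple version runs one interpreted loop iteration per point, while the numpy version hands the whole array to `np.hypot` in a single call. Timings depend on the machine and library versions, so none are quoted; with only a few points, the overhead mentioned above may dominate.

```python
import math
import timeit
import numpy as np

n = 100_000
rng = np.random.default_rng(0)
xy = rng.random((n, 2))                    # numpy: (n, 2) float64 array
pairs = [tuple(p) for p in xy.tolist()]    # the same data as Python tuples

def norms_tuples(pts):
    """Pure-Python loop: one interpreted iteration per point."""
    return [math.hypot(x, y) for x, y in pts]

def norms_numpy(arr):
    """Vectorized: one call processes the whole array element-wise."""
    return np.hypot(arr[:, 0], arr[:, 1])

# Both approaches compute the same values.
assert np.allclose(norms_tuples(pairs), norms_numpy(xy))
print("tuples:", timeit.timeit(lambda: norms_tuples(pairs), number=5))
print("numpy: ", timeit.timeit(lambda: norms_numpy(xy), number=5))
```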

An example of code that could not be vectorized would be one where your calculation involves looking at, say, the first complex number in an array z, doing a calculation that produces an integer index idx, then retrieving z[idx], doing a calculation on that number which produces the next index idx2, then retrieving z[idx2], and so on. This type of calculation may not be vectorizable. In that case you might as well use Python tuples, since you won't be able to take advantage of numpy's strength.
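A minimal sketch of such an index-chasing loop; the function name `chase` and the rule for computing the next index are made up purely for illustration. Because each index depends on the value found at the previous step, the loop has to run one element at a time, and it works just as well on a tuple as on a numpy array.

```python
import numpy as np

def chase(z, start, steps):
    """Follow a chain of indices through the sequence z of complex numbers.

    Each step needs the value found at the previous step, so the loop
    cannot be replaced by a single element-wise array operation.
    """
    idx = start
    visited = []
    for _ in range(steps):
        value = z[idx]
        visited.append(value)
        # Hypothetical rule: derive the next index from the current value.
        idx = int(abs(value) * 1000) % len(z)
    return visited

z = np.array([0.3 + 0.4j, 1.0 + 0.0j, 0.6 + 0.8j, 2.0 - 1.0j])
print(chase(z, 1, 6))                    # works on a numpy array...
print(chase(tuple(z.tolist()), 1, 6))    # ...and equally on a plain tuple
```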

I wouldn't worry about the speed of accessing the real/imaginary parts of a complex number. My guess is that the vectorization issue will most likely determine which method is faster. (Incidentally, numpy can convert an array of complex numbers to their real parts simply by striding over the complex array, skipping every other float, and viewing the result as floats. Moreover, the syntax is dead simple: if z is a complex numpy array, then z.real gives the real parts as a float numpy array. This should be far faster than the pure Python approach of a list comprehension of attribute lookups: [z.real for z in zlist].)
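A short illustration of both points, as a sketch: `z.real` on a complex128 array returns a float64 view of the real parts without copying, and `z.view(np.float64)[::2]` does the same striding by hand. The list comprehension at the end is the pure-Python approach the answer compares against.

```python
import numpy as np

z = np.array([1 + 2j, 3 - 4j, -5 + 6j])   # complex128: each element is two float64s

# Attribute access: the real parts as a float64 array (a view, not a copy).
re = z.real

# The same thing by hand: reinterpret the buffer as float64 and take every
# other value, starting at 0 for the real parts (start at 1 for imaginary parts).
as_floats = z.view(np.float64)            # [1., 2., 3., -4., -5., 6.]
re_strided = as_floats[::2]

print(np.array_equal(re, re_strided))     # True
print(np.shares_memory(re, z))            # True: z.real is a view into z

# The pure-Python equivalent builds a new list one attribute lookup at a time.
zlist = z.tolist()                        # list of Python complex numbers
re_list = [w.real for w in zlist]
```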

Just out of curiosity, what is your reason for porting the C++ code to Python?
