Numpy: nditer beginner

I am trying to learn nditer for possible use in speeding up my application. Here I am trying to write a toy rearranging program that takes an array of size 20 and turns it into a 5x4 array:

    import numpy as np

    myArray = np.arange(20)

    def fi_by_fo_100(array):
        offset = np.array([0, 4, 8, 12, 16])
        it = np.nditer([offset, None], flags=['reduce_ok'],
                       op_flags=[['readonly'], ['readwrite', 'allocate']],
                       op_axes=[None, [0, 1, -1]],
                       itershape=(-1, 4, offset.size))
        while not it.finished:
            indices = np.arange(it[0], (it[0] + 4), dtype=int)
            info = array.take(indices)
            '''Just for fun, we'll perform an operation on the data.
               Let's shift it to 100'''
            info = info + 81
            it.operands[1][...] = info
            it.iternext()
        return it.operands[1]

    test = fi_by_fo_100(myArray)

    >>> test
    array([[ 97,  98,  99, 100]])

Obviously the program is overwriting each result into a single row. So I tried using nditer's indexing functionality, but still no luck:

    flags=['reduce_ok','c_index'], it.operands[1][it.index][...] = info
    IndexError: index out of bounds

    flags=['reduce_ok','c_index'], it.operands[1][it.iterindex][...] = info
    IndexError: index out of bounds

    flags=['reduce_ok','multi_index'], it.operands[1][it.multi_index][...] = info
    IndexError: index out of bounds

    it[0][it.multi_index[1]][...] = info
    IndexError: 0-d arrays can't be indexed

... and so on. What am I missing? Thanks in advance.

Bonus Question

I just went through this nice article on nditer. I may be new to Numpy, but this is the first time I have seen Numpy actually fall behind on speed. My understanding was that people choose Numpy for its numerical speed and prowess, but isn't iteration part of that? What is the point of nditer if it is so slow?

1 answer

It really helps to debug things like this by printing out what happens along the way.

First replace the entire loop with the following:

    i = 0
    while not it.finished:
        i += 1
        it.iternext()
    print i

It will print 20, not 5. This is because you are doing a 5x4 iteration, not 5x1.
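
To make that concrete, here is a minimal sketch (a hypothetical 5x4 array, not the offsets setup above): nditer visits every element of the iteration shape, so a 5x4 iteration always takes 20 steps.

    import numpy as np

    # Minimal sketch with a hypothetical 5x4 array: nditer visits every element
    # of the iteration shape, so counting the steps gives 20, not 5.
    a = np.arange(20).reshape(5, 4)
    it = np.nditer(a)
    print(sum(1 for _ in it))   # 20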

So why does it even come close to working? Well, let's take a closer look at the loop:

    while not it.finished:
        print '>', it.operands[0], it[0]
        indices = np.arange(it[0], (it[0] + 4), dtype=int)
        info = array.take(indices)
        info = info + 81
        it.operands[1][...] = info
        print '<', it.operands[1], it[1]
        it.iternext()

You will see that the first five loops run through [0 4 8 12 16], generating [[81 82 83 84]], then [[85 86 87 88]], and so on. And then the next five loops do the same thing, and again, and again.

This is why your c_index attempts do not work: it.index runs from 0 to 19, and there is nothing at most of those indices in it.operands[1].

If you used multi_index correctly and ignored the columns, you could make this work ... but you would still be doing a 5x4 iteration, just repeating each step 4 times, instead of a flat iteration over the 5 rows.
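
For illustration only, here is a hedged sketch of that multi_index idea under simplified assumptions (a dummy 5x4 iteration space rather than the original op_axes/itershape setup): only the row half of the multi-index is used, so each output row gets redundantly written four times.

    import numpy as np

    # Hedged sketch, simplified setup: drive a 5x4 iteration with a dummy array,
    # keep only the row part of multi_index, and rewrite each output row 4 times.
    array = np.arange(20)
    offset = np.array([0, 4, 8, 12, 16])
    outarray = np.zeros((5, 4), dtype=array.dtype)
    it = np.nditer(np.empty((5, 4)), flags=['multi_index'])
    while not it.finished:
        row = it.multi_index[0]        # ignore the column part of the index
        outarray[row] = array[offset[row]:offset[row] + 4] + 81
        it.iternext()
    print(outarray)                    # correct 5x4 result, computed 4x over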

Your it.operands[1][...] = info replaces the entire output with that row every time through the loop. Generally you should never have to touch it.operands[1] at all; the whole point of nditer is that you just work with each it[1], and the final it.operands[1] is the result.
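
Here is a minimal sketch of that pattern (a hypothetical 1-D input, not the offsets setup above): write through it[1] at each step, and only read the allocated result from it.operands[1] at the end.

    import numpy as np

    # Minimal sketch, hypothetical 1-D input: write each output element through
    # it[1]; the allocated result accumulates in it.operands[1].
    x = np.arange(5)
    it = np.nditer([x, None],
                   op_flags=[['readonly'], ['writeonly', 'allocate']])
    while not it.finished:
        it[1] = it[0] + 81    # per-element write, not operands[1][...] = ...
        it.iternext()
    print(it.operands[1])     # [81 82 83 84 85]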

Of course, doing a 5x4 iteration row by row does not make sense. Either do a 5x4 iteration over the individual values, or a 5-step iteration over the rows.

If you want the first, the easiest way to do it is to reshape the source array and then just iterate over that:

    it = np.nditer([array.reshape(5, -1), None],
                   op_flags=[['readonly'], ['readwrite', 'allocate']])
    for a, b in it:
        b[...] = a + 81   # write into the allocated output, element by element
    return it.operands[1]

But of course this is stupid - it's just a slower and more complicated way of writing:

    return array + 81

And it would be a little silly to suggest that "the way to write your own reshape is to call reshape first and then ..."

So you want to iterate over the rows, right?

Let's simplify things a bit by getting rid of allocate and explicitly creating a 5x4 output array to start with:

    outarray = np.zeros((5, 4), dtype=array.dtype)
    offset = np.array([0, 4, 8, 12, 16])
    # c_index is needed so it.index is available inside the loop
    it = np.nditer([offset, outarray], flags=['reduce_ok', 'c_index'],
                   op_flags=[['readonly'], ['readwrite']],
                   op_axes=[None, [0]],
                   itershape=[5])
    while not it.finished:
        indices = np.arange(it[0], (it[0] + 4), dtype=int)
        info = array.take(indices)
        '''Just for fun, we'll perform an operation on the data.
           Let's shift it to 100'''
        info = info + 81
        it.operands[1][it.index][...] = info
        it.iternext()
    return it.operands[1]

This is a bit of an abuse of nditer , but at least it is doing the right thing.

Since you are just doing a 1D iteration over the source and completely ignoring the second operand, there is really no good reason to use nditer here. If you need lockstep iteration over multiple arrays, for a, b in nditer([x, y], ...) is cleaner than iterating over x and using the index to access y, just like for a, b in zip(x, y) outside of numpy. And if you need to iterate over multidimensional arrays, nditer is usually cleaner than the alternatives. But here, all you are really doing is iterating over [0, 4, 8, 12, 16], doing something with the result, and copying it into another array.
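
A minimal sketch of that plain approach, reusing the array/offset/outarray names from the code above and no nditer at all:

    import numpy as np

    # Plain loop over the offsets: compute each row and copy it into the output.
    array = np.arange(20)
    offset = np.array([0, 4, 8, 12, 16])
    outarray = np.zeros((5, 4), dtype=array.dtype)
    for i, off in enumerate(offset):
        outarray[i] = array[off:off + 4] + 81
    print(outarray)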

Also, as I mentioned in the comments, if you find yourself iterating in numpy, you are usually doing something wrong. All of numpy's speed benefits come from letting it run tight loops in native C/Fortran or lower-level vector operations. Once you are looping over arrays yourself, you are really just doing slow Python arithmetic with slightly nicer syntax:

    import numpy as np
    import timeit

    def add10_numpy(array):
        return array + 10

    def add10_nditer(array):
        it = np.nditer([array, None], [], [['readonly'], ['writeonly', 'allocate']])
        for a, b in it:
            np.add(a, 10, b)
        return it.operands[1]

    def add10_py(array):
        x, y = array.shape
        outarray = array.copy()
        for i in xrange(x):
            for j in xrange(y):
                outarray[i, j] = array[i, j] + 10
        return outarray

    myArray = np.arange(100000).reshape(250, -1)

    for f in add10_numpy, add10_nditer, add10_py:
        print '%12s: %s' % (f.__name__, timeit.timeit(lambda: f(myArray), number=1))

On my system, this prints:

     add10_numpy: 0.000458002090454
    add10_nditer: 0.292730093002
        add10_py: 0.127345085144

Which is why you should not be using nditer for this.
