Why is a NumPy vectorized function apparently called an extra time?

Question


I have a NumPy object array containing several index lists:

>>> idxLsts = np.array([[1], [0, 2]], dtype=object)

I define a vectorized function that appends a value to each list:

>>> idx = 99
>>> f = np.vectorize(lambda idxLst: idxLst.append(idx))

Then I call the function. I'm not interested in the return value, only in the side effect:

>>> f(idxLsts)
array([None, None], dtype=object)

The index 99 was appended twice to the first list. Why? I'm stumped.

>>> idxLsts
array([[1, 99, 99], [0, 2, 99]], dtype=object)

With other idxLsts values this does not happen:

>>> idxLsts = np.array([[1, 2], [0, 2, 4]], dtype=object)
>>> f(idxLsts)
array([None, None], dtype=object)
>>> idxLsts
array([[1, 2, 99], [0, 2, 4, 99]], dtype=object)

I suspect it has something to do with the documentation, which says: "Define a vectorized function which takes a nested sequence of objects or numpy arrays as inputs and returns a numpy array as output. The vectorized function evaluates `pyfunc` over successive tuples of the input arrays like the python map function, except it uses the broadcasting rules of numpy."


python vectorization numpy

RVS Oct 26 '12 at 23:42


1 answer

unutbu · Accepted Answer · 2012-10-27T01:05:57+0000

From the vectorize docstring:

 The data type of the output of `vectorized` is determined by calling the function with the first element of the input. This can be avoided by specifying the `otypes` argument.
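A minimal sketch of the effect the docstring describes, assuming a reasonably recent NumPy. The helper `make_recorder` is hypothetical (made up for illustration): it returns a function that logs every argument it receives, so we can compare the calls made with `otypes` omitted and with `otypes=[int]`.

```python
import numpy as np

def make_recorder():
    """Hypothetical helper: return a function that logs each argument, plus that log."""
    log = []

    def record(x):
        log.append(int(x))  # int() turns NumPy scalars into plain ints for a readable log
        return int(x) * 10
    return record, log

record, probe_log = make_recorder()
np.vectorize(record)(np.array([1, 2, 3]))                # otypes not given
print(probe_log)  # [1, 1, 2, 3]: the first element is evaluated twice

record, typed_log = make_recorder()
np.vectorize(record, otypes=[int])(np.array([1, 2, 3]))  # output type declared up front
print(typed_log)  # [1, 2, 3]: no probing call
```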

And from the code:

  theout = self.thefunc(*newargs)

This is an extra call to thefunc on the first element, made only to determine the output type. That is why the first list gets two 99s.
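To make the call sequence concrete, here is a hypothetical, heavily simplified model of that code path. It handles one input only, and the name `vectorize_sketch` is made up for illustration; this is not NumPy's actual implementation. When no output type is given, it probes the first element, then runs a full pass over every element, so the first one is processed twice.

```python
import numpy as np

def vectorize_sketch(func, otypes=None):
    """Hypothetical, simplified model of np.vectorize for one input (not NumPy's code)."""
    def wrapper(arr):
        arr = np.asarray(arr, dtype=object)
        if otypes is None:
            # Probe: call func on the first element only, to learn the output dtype.
            out_dtype = np.asarray(func(arr.flat[0])).dtype
        else:
            out_dtype = np.dtype(otypes[0])
        out = np.empty(arr.shape, dtype=out_dtype)
        # Main pass: func runs on every element, so the first one is visited again.
        for i, x in enumerate(arr.flat):
            out.flat[i] = func(x)
        return out
    return wrapper

idx = 99
idxLsts = np.array([[1], [0, 2]], dtype=object)  # 1-D object array holding two lists
vectorize_sketch(lambda lst: lst.append(idx))(idxLsts)
print(idxLsts.tolist())  # [[1, 99, 99], [0, 2, 99]]
```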

The same thing happens in your second case:

  import numpy as np
  idxLsts = np.array([[1, 2], [0, 2, 4]], dtype=object)
  idx = 99
  f = np.vectorize(lambda x: x.append(idx))
  f(idxLsts)
  print(idxLsts)

gives

 [[1, 2, 99, 99] [0, 2, 4, 99]]
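Two ways to keep np.vectorize but get a single 99 per list, sketched under the assumption of NumPy 1.7 or later (where the `cache` argument exists): declare the output type with `otypes=[object]`, or pass `cache=True` so the result of the probe call is reused for the first element.

```python
import numpy as np

idx = 99

# Option 1: declare the output type, so vectorize never makes a probing call.
a = np.array([[1, 2], [0, 2, 4]], dtype=object)
np.vectorize(lambda lst: lst.append(idx), otypes=[object])(a)
print(a.tolist())  # [[1, 2, 99], [0, 2, 4, 99]]

# Option 2: keep the probe, but reuse its result instead of calling again on element 0.
b = np.array([[1, 2], [0, 2, 4]], dtype=object)
np.vectorize(lambda lst: lst.append(idx), cache=True)(b)
print(b.tolist())  # [[1, 2, 99], [0, 2, 4, 99]]
```

`otypes` is the more direct fix; with `cache=True` the function still runs on the first element during probing, it just is not run on it a second time.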

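You can use np.frompyfunc instead of np.vectorize. It takes the number of inputs and outputs explicitly and always returns object arrays, so it has no need to probe the first element: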

  import numpy as np
  idxLsts = np.array([[1, 2], [0, 2, 4]], dtype=object)
  idx = 99
  f = np.frompyfunc(lambda x: x.append(idx), 1, 1)
  f(idxLsts)
  print(idxLsts)

gives

 [[1, 2, 99] [0, 2, 4, 99]]
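A small check, using a hypothetical `record` function, that the ufunc built by np.frompyfunc visits each element exactly once and always produces an object array:

```python
import numpy as np

log = []

def record(x):
    """Hypothetical side-effect-only function: log the element, return nothing."""
    log.append(int(x))
    return None

g = np.frompyfunc(record, 1, 1)  # 1 input, 1 output; results are object arrays
out = g(np.array([[1, 2], [3, 4]]))

print(log)        # [1, 2, 3, 4]: each element visited exactly once
print(out.dtype)  # object
```

Since only the side effect is wanted here, a plain loop such as `for lst in idxLsts: lst.append(idx)` would also avoid the extra call.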
