How does NumPy's arange differ from this custom range function?

Here's a custom function that allows you to perform decimal increments:

    def my_range(start, stop, step):
        i = start
        while i < stop:
            yield i
            i += step

It works as follows:

    out = list(my_range(0, 1, 0.1))
    print(out)
    [0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999]

Now, there is nothing surprising in this. It is clear that it is due to floating-point inaccuracy and that 0.1 has no exact representation in memory, so these errors are understandable.
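The inexactness of 0.1 can be made visible directly. Converting the float to Decimal (a minimal illustration using only the standard library) prints the exact binary value the float actually stores:

```python
from decimal import Decimal

# Decimal(0.1) takes the float's exact stored binary value,
# showing that 0.1 cannot be represented exactly as a double
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
```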

Now take NumPy, on the other hand:

    import numpy as np
    out = np.arange(0, 1, 0.1)
    print(out)
    array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])

Interestingly, there are no apparent accuracy errors. I thought this could be due to how __repr__ displays the values, so to confirm, I tried this:

    x = list(my_range(0, 1.1, 0.1))[-1]
    print(x.is_integer())
    False
    x = list(np.arange(0, 1.1, 0.1))[-1]
    print(x.is_integer())
    True

So my function yields an incorrect top value (it should be 1.0, but it is actually 1.0999999999999999), while np.arange gets it right.

I am aware of Is floating point math broken?, but the question here is different:

How does NumPy do it?

python arrays numpy range
3 answers

The difference in the endpoints is that NumPy computes the length up front instead of ad hoc, because it needs to preallocate the array. You can see this in the _calc_length helper. Instead of stopping when it hits the stop argument, it stops when it reaches the predetermined length.
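As a rough Python sketch (an approximation only; the real _calc_length lives in NumPy's source and handles more edge cases), the up-front length computation looks something like:

```python
import math

def calc_length(start, stop, step):
    # Approximate sketch of NumPy's _calc_length: the number of elements
    # is fixed from the arguments before any values are produced, and
    # iteration stops at this length, not at the stop value.
    return max(0, math.ceil((stop - start) / step))

print(calc_length(0, 1, 0.1))      # 10 elements, so the last is 9 * 0.1
print(calc_length(0.0, 2.1, 0.3))  # 8 elements, so 2.1 itself appears
```

Note how the second call illustrates the "wrong" endpoint below: float rounding in the division makes the computed length one longer than you might expect.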

Computing the length up front does not save you from the problems of a non-integer step, and you will frequently get the "wrong" endpoint anyway, for example with numpy.arange(0.0, 2.1, 0.3):

    In [46]: numpy.arange(0.0, 2.1, 0.3)
    Out[46]: array([ 0. ,  0.3,  0.6,  0.9,  1.2,  1.5,  1.8,  2.1])

It is much safer to use numpy.linspace, where instead of a step size you say how many elements you want and whether you want to include the right endpoint.
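For instance, the same range via linspace (a minimal sketch, assuming you know you want 8 points and both endpoints included):

```python
import numpy as np

# linspace takes the number of elements rather than a step size,
# and includes the right endpoint by default (endpoint=True)
out = np.linspace(0.0, 2.1, num=8)
print(out)

# the endpoint is set exactly rather than computed by stepping
print(out[-1] == 2.1)
```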


It might seem that NumPy did not experience rounding errors when computing the elements, but that is just due to different display logic. NumPy truncates the displayed precision more aggressively than float.__repr__ does. If you use tolist to get an ordinary list of ordinary Python floats (and hence the ordinary float display logic), you can see that NumPy experienced rounding error too:

    In [47]: numpy.arange(0, 1, 0.1).tolist()
    Out[47]: [0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6000000000000001, 0.7000000000000001, 0.8, 0.9]

It suffered slightly different rounding error, for example in .6 and .7 instead of .8 and .9, because it also uses a different means of computing the elements, implemented in the fill function for the relevant dtype.

The fill implementation has the advantage that it uses start + i*step instead of repeatedly adding the step, which avoids accumulating error on each addition. It has the disadvantage that, for no compelling reason I can see, it recomputes the step from the first two elements instead of taking the step as an argument, so it can lose a great deal of precision up front.


While arange does better at stepping over the range, it still has float representation issues:

    In [1358]: np.arange(0,1,0.1)
    Out[1358]: array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])

Printing hides them; convert the array to a list to see the gory details:

    In [1359]: np.arange(0,1,0.1).tolist()
    Out[1359]: [0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6000000000000001, 0.7000000000000001, 0.8, 0.9]

or with a different style of iteration:

    In [1360]: [i for i in np.arange(0,1,0.1)]  # e.g. list(np.arange(...))
    Out[1360]: [0.0, 0.10000000000000001, 0.20000000000000001, 0.30000000000000004, 0.40000000000000002, 0.5, 0.60000000000000009, 0.70000000000000007, 0.80000000000000004, 0.90000000000000002]

In this last case each displayed item is an np.float64, whereas in the first case (tolist) each is a plain Python float.
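A quick way to confirm the type difference (a minimal check using isinstance rather than relying on repr formatting, which varies across NumPy versions):

```python
import numpy as np

a = np.arange(0, 1, 0.1)
first_from_iteration = next(iter(a))  # iterating yields NumPy scalars
first_from_tolist = a.tolist()[0]     # tolist converts to plain Python floats

print(type(first_from_iteration))
print(type(first_from_tolist))
```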


Besides the different display of lists and arrays, NumPy's arange works by multiplying instead of repeatedly adding. It behaves more like:

    def my_range2(start, stop, step):
        i = 0
        while start + (i * step) < stop:
            yield start + (i * step)
            i += 1

Then the output matches exactly:

    >>> np.arange(0, 1, 0.1).tolist() == list(my_range2(0, 1, 0.1))
    True

With repeated additions you "accumulate" floating-point rounding errors. Multiplication is still affected by rounding, but the error does not accumulate.
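A minimal illustration of the difference, in pure Python and independent of NumPy:

```python
# Repeated addition: each += rounds, and the errors pile up
acc = 0.0
for _ in range(10):
    acc += 0.1
print(acc)       # 0.9999999999999999, not 1.0

# Multiplication: a single rounding per element, no accumulation
print(10 * 0.1)  # 1.0
```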


As pointed out in the comments, this is not exactly what happens. As far as I can tell, it is more like:

    def my_range2(start, stop, step):
        length = math.ceil((stop - start) / step)
        # The next two lines are mostly so the function really behaves
        # like NumPy does. Remove them to get better accuracy...
        next = start + step
        step = next - start
        for i in range(length):
            yield start + (i * step)

But I'm not sure this is exactly right either, because much more is going on inside NumPy.

