Why does Python's float raise a ValueError for very long inputs?

On my Python 2.7.9 on x64, I see the following behavior:

>>> float("10"*(2**28)) inf >>> float("10"*(2**29)) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: could not convert string to float: 10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010 >>> float("0"*(2**33)) 0.0 >>> float("0." + "0"*(2**32)) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: could not convert string to float: 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 

If there is no deeper rationale I am missing, this violates the principle of least astonishment. When I got the ValueError on "10"*(2**29), I assumed it was just a restriction on very long strings, but then "0"*(2**33) worked. What's happening? Can anyone justify this behavior, or is it a POLA violation (admittedly a relatively unimportant one, if so)?

+8
python
2 answers

In short: because zeroes are skipped when detecting the base.

For questions like this, I like to look at my favorite reference implementation.


Evidence

Casevh has a great intuition in the comments. Here's the relevant code:

    for (bits_per_char = -1; n; ++bits_per_char)
        n >>= 1;
    /* n <- total # of bits needed, while setting p to end-of-string */
    while (_PyLong_DigitValue[Py_CHARMASK(*p)] < base)
        ++p;
    *str = p;
    /* n <- # of Python digits needed, = ceiling(n/PyLong_SHIFT). */
    n = (p - start) * bits_per_char + PyLong_SHIFT - 1;
    if (n / bits_per_char < p - start) {
        PyErr_SetString(PyExc_ValueError,
                        "long string too large to convert");
        return NULL;
    }

Here p is initially set to a pointer into your string. If we look at the _PyLong_DigitValue table, we see that '0' is explicitly mapped to the value 0.

Python does a lot of extra work to optimize conversion from certain bases (there is an interesting 200-line comment about converting binary bases!), which is why it goes to some trouble to detect the correct base in the first place. In this case, zeroes can be skipped while detecting the base, so they are not taken into account when computing the overflow check.
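
To make this concrete, here is a toy Python model of the quoted scan-and-estimate logic. It is purely illustrative and not CPython's actual code: the names mirror the C snippet, and the 30-bit internal digit size is an assumption about a typical 64-bit build.

    import sys

    PYLONG_SHIFT = 30  # assumed size of one internal digit on a 64-bit build

    DIGIT_VALUE = {c: int(c, 36) for c in "0123456789abcdefghijklmnopqrstuvwxyz"}

    def scanned_digits(s, base=16):
        """Advance while each character is a valid digit for `base`, like the
        `while (_PyLong_DigitValue[...] < base) ++p;` loop above."""
        i = 0
        while i < len(s) and DIGIT_VALUE.get(s[i].lower(), 37) < base:
            i += 1
        return i  # plays the role of p - start

    def needed_bits(s, base=16):
        """Mirror of n = (p - start) * bits_per_char + PyLong_SHIFT - 1."""
        bits_per_char = base.bit_length() - 1  # same result as the n >>= 1 loop
        return scanned_digits(s, base) * bits_per_char + PYLONG_SHIFT - 1

    def would_overflow(s, base=16):
        """The C code detects the signed size n wrapping around; Python ints
        never wrap, so model the same condition against sys.maxsize."""
        return needed_bits(s, base) > sys.maxsize

    # Per the answer above, leading zeroes are skipped during base detection
    # before a scan like this runs, so an all-zero string never inflates the
    # digit count that feeds the overflow check.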

In effect, we are checking how many bits are needed to store the number, but Python is smart enough to strip the meaningless leading zeroes first. I do not see anything in the float documentation that guarantees this behavior for every implementation. It merely declares, somewhat ominously:

Convert a string or number to a floating point number, if possible.


When it does not work

When you write

    float("0." + "0"*(2**32))

the base detection stops early at the ".", so all of the remaining zeroes are counted towards the bit-length estimate and contribute to raising the ValueError.
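
If you have the memory to spare, the asymmetry can be reproduced directly. This sketch just wraps the question's inputs in a try/except; the strings are several GiB each, and the outcomes in the comments are the ones observed on CPython 2.7.9 / x64, so other builds may behave differently.

    def probe(label, s):
        try:
            print("%s -> %r" % (label, float(s)))
        except ValueError as exc:
            print("%s -> ValueError: %.40s..." % (label, exc))

    probe("'0' * 2**33", "0" * (2**33))                 # 0.0 in the question's run
    probe("'0.' + '0' * 2**32", "0." + "0" * (2**32))   # ValueError in the question's run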


Similar parsing tricks

Here's a similar case in the float parsing code, where we find that whitespace is skipped (along with an interesting comment from the authors about the intent behind this design choice):

    while (Py_ISSPACE(*s))
        s++;
    /* We don't care about overflow or underflow.  If the platform
     * supports them, infinities and signed zeroes (on underflow) are
     * fine. */
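
From the Python side, the effect of that whitespace skip is easy to check. The snippet below is just a quick illustration of behavior I would expect on any recent CPython build:

    >>> float("   1.5\n")   # leading and trailing whitespace are skipped
    1.5
    >>> float("\t-inf ")    # the same skip runs before 'inf'/'nan' parsing
    -inf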
+4

In the case of float("10" * (2**29)), you are converting the string to a floating-point value that would far exceed the maximum value a float in Python can hold.

Whereas in the case of float("0" * (2**33)), you are converting the string to the floating-point value 0.0, no matter how many times the zero is repeated.

The error does not occur because of a restriction on very long strings, but because of the limit on the maximum float value.

Feel free to check this: What is the maximum float in Python?
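
For reference, the largest finite float is easy to inspect. The value below is what a typical IEEE 754 double build reports; check sys.float_info on your own platform:

    >>> import sys
    >>> sys.float_info.max
    1.7976931348623157e+308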

+2
