Python: sort function breaks when nan is present

sorted([2, float('nan'), 1]) returns [2, nan, 1]

(at least for the implementation of Activestate Python 3.1.)

I understand that nan is a strange object, so I won’t be surprised if it appears in random places as a result of sorting. But it also ruined the sorting for non-nan numbers in the container, which is really unexpected.

I asked the question about max , and based on this, I understand why sort works like this. But should this be considered a mistake?

The documentation simply says: β€œReturn a new sorted list [...]” without specifying any details.

EDIT: Now I agree that this does not violate the IEEE standard. However, I think this is a mistake from any point of view of common sense. Even Microsoft, which often acknowledges its mistakes often, acknowledged this as a mistake and fixed it in the latest version: http://connect.microsoft.com/VisualStudio/feedback/details/363379/bug-in-list-double-sort-in- list-which-contains-double-nan .

Anyway, I got @khachik's answer:

 sorted(list_, key = lambda x : float('-inf') if math.isnan(x) else x) 

I suspect this leads to a performance hit compared to the language that does this by default, but at least it works (banning any errors that I presented).

+22
python sorting math nan
Nov 21 '10 at 20:05
source share
5 answers

The previous answers are helpful, but may not be clear as to the root of the problem.

In any language, sorting applies the specified ordering, defined by the comparison function or in some other way, over the domain of input values. For example, less than, aka operator <, can be used everywhere if and only if less than the appropriate order over the input values.

But this is especially NOT true for floating point values ​​and less: "NaN is disordered: it is not equal, more or less than everything, including itself." ( Clear prose from the GNU C manual , but applies to all modern IEEE754 floating point )

Thus, possible solutions:

  • first remove NaN by making a correctly defined input domain via <(or another sorting function used)
  • define a custom comparison function (aka predicate) that does determine the order for NaN, for example, less than any number or more than any number.

Any approach can be used in any language.

In practice, given python, I would prefer to remove NaNs, if you care, about maximum performance, or if removing NaNs is the desired behavior in context.

Otherwise, you can use the appropriate predicate function via "cmp" in older versions of python or through this and functools.cmp_to_key() . The latter, of course, is a bit uncomfortable than removing NaNs in the first place. When defining this predicate function, care must be taken to avoid performance degradation.

+11
Aug 23 '11 at 17:35
source share
β€” -

The problem is that there is no correct order if the list contains NAN, because the sequence a1, a2, a3, ..., an is sorted if a1 <= a2 <= a3 <= ... <= an. If any of these values ​​is NAN, then the sorted property breaks, because for all a, a <= NAN and NAN <= a, both are false.

+7
Nov 21 2018-10-21
source share

I am not sure about the error, but the workaround could be as follows:

 sorted( (2, 1, float('nan')), lambda x,y: x is float('nan') and -1 or (y is float('nan') and 1 or cmp(x,y))) 

that leads to:

 ('nan', 1, 2) 

Or remove nan before sorting or anything else.

+5
Nov 21 '10 at 20:12
source share

IEEE754 is a standard that defines floating point operations in this instance. This standard defines the operation of comparing operands, at least one of which is NaN, as an error. Therefore, this is not a mistake. You need to deal with NaN before working with an array.

+3
Nov 21 '10 at 20:10
source share

Assuming you want to save NaNs and arrange them as the lowest "values", this is a workaround that works with unique nano, unique numpy nan, numerical and non-numerical objects:

 def is_nan(x): return (x is np.nan or x != x) list_ = [2, float('nan'), 'z', 1, 'a', np.nan, 4, float('nan')] sorted(list_, key = lambda x : float('-inf') if is_nan(x) else x) # [nan, nan, nan, 1, 2, 4, 'a', 'z'] 
0
May 24 '17 at 9:25
source share



All Articles