Sort_by broken in pandas >= 0.18.0?

Question

I start with a data frame like

    print(df)
                       int          float  _i
    1                    2   2.000000e+00   1
    3                    3   3.000000e+00   3
    2                    3   4.000000e+00   2
    4 -9223372036854775808 -1.797693e+308   4
    0 -9223372036854775808   1.000000e+00   0

If I use sort_values to sort by two columns, I get the output shown below: the row order is unchanged, so sort_values appears to do nothing. Sorting by a single column works, and the same two-column call worked in previous versions of pandas. Has something changed in pandas that I am not aware of?

    print(df.sort_values(["int", "float"]))
                       int          float  _i
    1                    2   2.000000e+00   1
    3                    3   3.000000e+00   3
    2                    3   4.000000e+00   2
    4 -9223372036854775808 -1.797693e+308   4
    0 -9223372036854775808   1.000000e+00   0

In pandas 0.17.0 I get:

    print(df.sort_values(["int", "float"]))
                       int          float  _i
    4 -9223372036854775808 -1.797693e+308   4
    0 -9223372036854775808   1.000000e+00   0
    1                    2   2.000000e+00   1
    3                    3   3.000000e+00   3
    2                    3   4.000000e+00   2
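As a sanity check that the 0.17.0 output is the expected lexicographic order, the rows can be sorted with plain Python tuples, independent of pandas. This is only a sketch: the names `rows`, `INT64_MIN` and `FLOAT64_MIN` are made up for illustration, and it assumes the printed -1.797693e+308 is the most negative finite float64.

```python
# Rows from the question as (index, int, float) tuples, in the original order.
INT64_MIN = -(2 ** 63)                 # -9223372036854775808
FLOAT64_MIN = -1.7976931348623157e308  # assumed value behind -1.797693e+308

rows = [
    (1, 2, 2.0),
    (3, 3, 3.0),
    (2, 3, 4.0),
    (4, INT64_MIN, FLOAT64_MIN),
    (0, INT64_MIN, 1.0),
]

# Sort by "int" first, then by "float" to break ties.
expected = sorted(rows, key=lambda r: (r[1], r[2]))
expected_index = [r[0] for r in expected]
print(expected_index)  # [4, 0, 1, 3, 2]
```

The resulting index order matches the pandas 0.17.0 output, which suggests the newer output is the one that deviates from an ordinary two-key sort.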

python pandas

rocksportrocker Dec 19 '16 at 14:55

1 answer

Alex Luis Arias · Answer 1 · 2017-01-19T07:41:44+0000

I can get the sort you want for your case by calling it as follows:

    print(df.sort_values(by=["int", "float"], na_position='first'))
                       int          float  _i
    3 -9223372036854775808 -1.797693e+308   4
    4 -9223372036854775808   1.000000e+00   0
    0                    2   2.000000e+00   1
    1                    3   3.000000e+00   3
    2                    3   4.000000e+00   2

However, I'm not sure why sorting behaves differently between the two versions. I checked the source code on GitHub and did not see any changes to the sort_values function between them, so it is possible that something deeper in the code has changed.

Code that does the sorting:

    if len(by) > 1:
        from pandas.core.groupby import _lexsort_indexer

        def trans(v):
            if com.needs_i8_conversion(v):
                return v.view('i8')
            return v
        keys = []
        for x in by:
            k = self[x].values
            if k.ndim == 2:
                raise ValueError('Cannot sort by duplicate column %s' % str(x))
            keys.append(trans(k))
        indexer = _lexsort_indexer(keys, orders=ascending,
                                   na_position=na_position)
        indexer = com._ensure_platform_int(indexer)

    # ... (lines omitted in this excerpt)

    new_data = self._data.take(indexer, axis=self._get_block_manager_axis(axis),
                               convert=False, verify=False)

Maybe something has changed in _lexsort_indexer() or self._data.take().
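If the cause does lie in the internal lexsort helper, one possible way around it is to compute the row order with NumPy's np.lexsort and reorder the frame positionally with iloc. The snippet below is a minimal sketch, not a confirmed fix: the frame is rebuilt by hand from the question's printout, and it assumes the extreme values are the int64 and float64 minimums.

```python
import numpy as np
import pandas as pd

INT64_MIN = np.iinfo(np.int64).min      # -9223372036854775808
FLOAT64_MIN = np.finfo(np.float64).min  # assumed value behind -1.797693e+308

# Frame rebuilt from the question's printout (same row order and index).
df = pd.DataFrame(
    {
        "int": np.array([2, 3, 3, INT64_MIN, INT64_MIN], dtype=np.int64),
        "float": [2.0, 3.0, 4.0, FLOAT64_MIN, 1.0],
        "_i": [1, 3, 2, 4, 0],
    },
    index=[1, 3, 2, 4, 0],
)

# np.lexsort treats the LAST key as the primary one, so pass
# the secondary key ("float") first and the primary key ("int") last.
order = np.lexsort((df["float"].values, df["int"].values))

# Reorder rows by position; this does not go through sort_values at all.
result = df.iloc[order]
print(result)
```

Because np.lexsort compares the raw values, the int64 minimum is treated as an ordinary number here rather than as a missing-value marker.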
