I ran into the following problem with how row and column headers are sorted.
Here's how to reproduce it:
import numpy as np
import pandas as pd
X = pd.DataFrame(dict(x=np.random.normal(size=100), y=np.random.normal(size=100)))
A = pd.qcut(X['x'], [0, 0.25, 0.5, 0.75, 1.0])
It shows:
y                 (-0.567, 0.0321]   (0.0321, 0.724]    (0.724, 3.478]   [-2.58, -0.567]
x
(-0.228, 0.382]           0.214353          0.113650         -0.013758          0.175768
(-0.843, -0.228]         -0.501709         -0.522697         -0.506259         -0.576264
(0.382, 2.662]            1.214640          0.808608          1.515334          0.983807
[-2.315, -0.843]         -1.722926         -1.245856         -1.240876         -1.041167
Note that the headers are no longer in numeric order. Is there a good way to solve this, so that interactive work is easier?
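For reference, here is a self-contained version of the reproduction. It is a sketch under assumptions: the post does not show how B and g are built, so B binning the y column like A bins x, and g being the mean of x per (A, B) cell, are guesses inferred from the printed output; the fixed seed is also an addition. In pandas versions where qcut returns an ordered Categorical of Interval values, the table may already come out in numeric order, so printing the labels is a quick way to check which behavior your version has.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable
X = pd.DataFrame(dict(x=rng.normal(size=100), y=rng.normal(size=100)))

quartiles = [0, 0.25, 0.5, 0.75, 1.0]
A = pd.qcut(X["x"], quartiles)
# Assumption: B bins the y column the same way A bins x.
B = pd.qcut(X["y"], quartiles)

# Assumption: g is the mean of x per (A, B) cell. observed=False keeps every
# bin combination, so the unstacked table is always 4 x 4.
g = X.groupby([A, B], observed=False)["x"].mean()
table = g.unstack()

print(list(table.index))    # row labels (x bins)
print(list(table.columns))  # column labels (y bins)
```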
To track the problem down further:
g.unstack().columns
This gives me the following:
Index([(-0.567, 0.0321], (0.0321, 0.724], (0.724, 3.478], [-2.58, -0.567]], dtype=object)
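This order matches a plain lexicographic sort of the label strings: '(' (ASCII 40) sorts before '[' (ASCII 91), and '-' sorts before the digits, so '[-2.58, -0.567]' ends up last even though it is numerically the lowest bin. A quick check, using the four column labels from the output above arranged in numeric order:

```python
# The four y-bin labels from the output above, in numeric order.
levels = ["[-2.58, -0.567]", "(-0.567, 0.0321]", "(0.0321, 0.724]", "(0.724, 3.478]"]

# Plain string sorting, which the unstacked columns appear to follow.
string_sorted = sorted(levels)
print(string_sorted)
# → ['(-0.567, 0.0321]', '(0.0321, 0.724]', '(0.724, 3.478]', '[-2.58, -0.567]']
```

The same rule explains the row order in the first table, where '(-0.228, 0.382]' comes before '(-0.843, -0.228]' because '2' sorts before '8'.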
Now compare this to B.levels:
B.levels
Index([[-2.58, -0.567], (-0.567, 0.0321], (0.0321, 0.724], (0.724, 3.478]], dtype=object)
Evidently, the original sort order is lost.
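Since the levels themselves still carry the numeric order, one possible workaround is to reindex the unstacked table by them. A minimal sketch, assuming the labels are plain strings as in the output above; the small frame is a made-up stand-in for g.unstack(), and with the objects from the post the equivalent call would presumably be along the lines of g.unstack().reindex(index=A.levels, columns=B.levels) (untested; newer pandas exposes the same order as .cat.categories).

```python
import pandas as pd

# Bin labels in numeric order, as the levels report them.
row_levels = ["[-2.315, -0.843]", "(-0.843, -0.228]", "(-0.228, 0.382]", "(0.382, 2.662]"]
col_levels = ["[-2.58, -0.567]", "(-0.567, 0.0321]", "(0.0321, 0.724]", "(0.724, 3.478]"]

# Hypothetical stand-in for g.unstack(): labels in string-sorted order,
# with made-up cell values (row position * 10 + column position).
table = pd.DataFrame(
    [[i * 10 + j for j in range(4)] for i in range(4)],
    index=sorted(row_levels),
    columns=sorted(col_levels),
)

# Reindex both axes by the level order; cell values move with their labels.
fixed = table.reindex(index=row_levels, columns=col_levels)
print(fixed)
```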
To make matters worse, build a multi-level crosstab:
from IPython.display import HTML
g2 = X.groupby([A, B]).agg('mean')
g3 = g2.stack().unstack(-2)
HTML(g3.to_html())
It shows something like:
y                    (-0.567, 0.0321]   (0.0321, 0.724]    (0.724, 3.478]
x
(-0.228, 0.382]   x          0.214353          0.113650         -0.013758
                  y         -0.293465          0.321836          1.180369
(-0.843, -0.228]  x         -0.501709         -0.522697         -0.506259
                  y         -0.204811          0.324571          1.167005
(0.382, 2.662]    x          1.214640          0.808608          1.515334
                  y         -0.195446          0.161198          1.074532
[-2.315, -0.843]  x         -1.722926         -1.245856         -1.240876
                  y         -0.392896          0.335471          1.730513
Neither the row labels nor the column labels are sorted correctly.
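When the correct order is not at hand, as with the stacked two-level table, the labels can instead be sorted by the numeric lower bound parsed out of each label. A sketch, assuming every interval label prints as '(a, b]' or '[a, b]' like in the outputs above; left_edge and sort_by_interval are hypothetical helpers, and the frame is a made-up stand-in for g3 with string-sorted labels.

```python
import pandas as pd


def left_edge(label) -> float:
    """Lower bound of an interval label such as '(-0.567, 0.0321]'."""
    # Drop a leading '(' or '[', then take the text before the comma.
    return float(str(label).strip("([").split(",")[0])


def outer_label(label):
    """For MultiIndex rows the interval is the first element of the tuple."""
    return label[0] if isinstance(label, tuple) else label


def sort_by_interval(df: pd.DataFrame) -> pd.DataFrame:
    """Reorder rows (by outer label) and columns by numeric lower bound."""
    # sorted() is stable, so the inner x/y rows keep their relative order.
    rows = sorted(range(len(df.index)), key=lambda i: left_edge(outer_label(df.index[i])))
    cols = sorted(range(len(df.columns)), key=lambda j: left_edge(df.columns[j]))
    return df.iloc[rows, cols]


# Hypothetical stand-in for g3: outer row labels and columns in string order.
outer = ["(-0.228, 0.382]", "(-0.843, -0.228]", "(0.382, 2.662]", "[-2.315, -0.843]"]
index = pd.MultiIndex.from_tuples([(b, v) for b in outer for v in "xy"], names=["x", None])
columns = pd.Index(
    sorted(["[-2.58, -0.567]", "(-0.567, 0.0321]", "(0.0321, 0.724]", "(0.724, 3.478]"]),
    name="y",
)
g3 = pd.DataFrame([[float(r * 4 + c) for c in range(4)] for r in range(8)],
                  index=index, columns=columns)

out = sort_by_interval(g3)
print(out)
```

Sorting by positions and selecting with iloc avoids relying on label-based reindexing of a MultiIndex, and it works the same whether the labels are strings or pandas Interval objects, since both print in the same bracket notation.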
Thanks.