I am trying to reformat a table based on counts in different columns.
df = pd.DataFrame({'Number': [1, 2, 3, 4, 5], 'X' : ['X1', 'X2', 'X3', 'X3', 'X3'], 'Y' : ['Y2','Y1','Y1','Y1', 'Y2'], 'Z' : ['Z3','Z1','Z1','Z2','Z1']})
Number X Y Z
0 1 X1 Y2 Z3
1 2 X2 Y1 Z1
2 3 X3 Y1 Z1
3 4 X3 Y1 Z2
4 5 X3 Y2 Z1
I want the top line to be sorted by frequency X (first X3, because it appears twice). Then, for each X value, count the frequency of your Y and Z values ββand print the one that appears most.
X3 X2 X1
Y Y1 Y1 Y2
Z Z1 Z1 Z3
So far I have code that can sort
import pandas as pd
df = pd.DataFrame({'Number': [1, 2, 3, 4, 5], 'X' : ['X1', 'X2', 'X3', 'X3', 'X3'], 'Y' : ['Y2','Y1','Y1','Y1', 'Y2'], 'Z' : ['Z3','Z1','Z1','Z2','Z1']})
pivot = df.pivot_table(index='X', columns=['Y', 'Z'], values = 'Number', aggfunc='count')
pivot.fillna(0, inplace=True)
pivot['sum'] = pivot.sum(axis=1)
pivot.sort('sum', ascending=False, inplace=True)
df = pivot[:5].transpose()
df.to_csv('sorted.csv')
and outputs:
Y Z X3 X1 X2
Y1 Z1 1 0 1
Y1 Z2 1 0 0
Y2 Z1 1 0 0
Y2 Z3 0 1 0
sum 3 1 1
But this is still not what I am looking for, can someone help me with this? Thank!