Pandas: top-n per group, summarized as one row per group

I am trying to summarize a data frame by grouping on one dimension, d1, and reporting summary statistics for each d1 element. In particular, I am interested in the top n (index and values) for a number of metrics. What I would like to create is one row for each d1 element.

Let's say I have two dimensions, d1 and d2, and 4 metrics: m1, m2, m3, m4.

1) What is the recommended way to group by d1 and find the top n values of d2, together with the metric value, for each of the metrics m1 to m4?

In Wes McKinney's book Python for Data Analysis, he suggests (p. 35):

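import pandas as pd

def get_top1000(group):
    # sort_index(by=...) was deprecated and later removed from pandas; sort_values replaces it
    return group.sort_values(by='births', ascending=False)[:1000]
grouped = names.groupby(['year', 'sex'])
top1000 = grouped.apply(get_top1000)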
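For comparison, here is a minimal sketch of the same top-n-per-group idea on current pandas, repeated for several metrics. It assumes a long-format frame with one row per (d1, d2) pair; the helper name `top_n_per_metric` and the sample data are hypothetical. Instead of `apply`, it sorts once per metric and then keeps the first n rows of each group with `GroupBy.head`.

```python
import pandas as pd


def top_n_per_metric(df, group_col, item_col, metrics, n=5):
    """Return {metric: top-n item_col rows per group_col, ranked by that metric}."""
    result = {}
    for metric in metrics:
        top = (
            df.sort_values(metric, ascending=False)  # highest values first
              .groupby(group_col, sort=False)
              .head(n)                                # at most n rows per group
              .sort_values([group_col, metric], ascending=[True, False])
        )
        result[metric] = top[[group_col, item_col, metric]].reset_index(drop=True)
    return result


# Hypothetical sample data: group "a" has 6 d2 items, group "b" only 3.
df = pd.DataFrame({
    "d1": ["a"] * 6 + ["b"] * 3,
    "d2": ["x1", "x2", "x3", "x4", "x5", "x6", "y1", "y2", "y3"],
    "m1": [10, 60, 30, 50, 20, 40, 7, 9, 8],
    "m2": [1, 2, 3, 4, 5, 6, 3, 2, 1],
})

tops = top_n_per_metric(df, "d1", "d2", ["m1", "m2"], n=5)
print(tops["m1"])
```

Groups with fewer than n rows simply return all of their rows, so no special case is needed for them.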

Is this still the recommended way? (I'm only interested in the top 5 rather than the top 1000, and for several metrics.)

2) The next problem is that I want to pivot the top 5, i.e. so that there is one row for each d1 element.

So, for dimensions d1, d2 and metric m1, the final data frame should look like this: index d1, with columns for the top 5 values of d2 and the corresponding values of m1:

d1 d2-1 d2-2 d2-3 d2-4 d2-5 m1-1 m1-2 m1-3 m1-4 m1-5

....

So, to pivot, I need to create a rank within each group along d2 (i.e. the numbers 1 to 5 become my column field). This would be easy if I always had 5 records, but sometimes a given value of d1 has fewer than 5 d2 elements.

Can anyone suggest how to add a rank within each group so that I have the correct column index to perform the pivot?
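One way to get that rank column is `GroupBy.cumcount()`, which numbers rows 0, 1, 2, ... within each group in their current order; after sorting each group by the metric, `cumcount() + 1` is the rank, and short groups just get fewer ranks. The sketch below assumes that approach; the helper `top_n_wide`, the sample data, and the "column-rank" naming scheme are illustrative choices, not an established API.

```python
import pandas as pd


def top_n_wide(df, group_col, item_col, metric, n=5):
    """Return one row per group_col value, with columns like d2-1..d2-n, m1-1..m1-n."""
    # Sort each group by the metric (descending) and keep at most n rows per group.
    top = (
        df.sort_values([group_col, metric], ascending=[True, False])
          .groupby(group_col)
          .head(n)
          .copy()
    )
    # cumcount() numbers rows 0, 1, 2, ... within each group in their current
    # (already sorted) order, so a group with only 3 rows gets ranks 1..3.
    top["rank"] = top.groupby(group_col).cumcount() + 1

    # Pivot each column separately so each keeps its own dtype, then rename
    # the rank columns to "<column>-<rank>".
    parts = []
    for col in (item_col, metric):
        part = top.pivot(index=group_col, columns="rank", values=col)
        part.columns = [f"{col}-{r}" for r in part.columns]
        parts.append(part)
    # Ranks missing from short groups come out as NaN.
    return pd.concat(parts, axis=1)


# Hypothetical sample data: group "a" has 6 d2 items, group "b" only 3.
df = pd.DataFrame({
    "d1": ["a"] * 6 + ["b"] * 3,
    "d2": ["x1", "x2", "x3", "x4", "x5", "x6", "y1", "y2", "y3"],
    "m1": [10, 60, 30, 50, 20, 40, 7, 9, 8],
})

wide = top_n_wide(df, "d1", "d2", "m1", n=5)
print(wide)
```

For m2 to m4 the function could be called once per metric; since each call ranks d2 by a different metric, the d2-k columns would need a metric-specific name before the results are concatenated.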



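import pandas as pd

N = 1000
names = my_fake_data_loader()  # placeholder loader, not a pandas function
grouped = names.groupby(['year', 'sex'])
# sort_values replaces the removed sort_index(by=...) call
grouped.apply(lambda g: g.sort_values(by='births', ascending=False).head(N))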
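The `apply` version works, but it calls a Python function once per group. Below is a minimal apply-free sketch of the same selection using only `sort_values`, `GroupBy.head` and `SeriesGroupBy.nlargest`. The tiny `names` frame is invented toy data standing in for the book's data set, and N is set to 2 so the cut-off is visible.

```python
import pandas as pd

N = 2  # top 2 per group in this toy example; the book uses 1000

# Invented toy data standing in for the book's baby-names frame.
names = pd.DataFrame({
    "year":   [1880, 1880, 1880, 1880, 1880, 1881, 1881, 1881],
    "sex":    ["F", "F", "F", "M", "M", "F", "M", "M"],
    "name":   ["Mary", "Anna", "Emma", "John", "William", "Mary", "John", "James"],
    "births": [70, 26, 20, 96, 95, 69, 87, 59],
})

# Option 1: sort once, then keep the first N rows of each (year, sex) group.
via_head = (
    names.sort_values("births", ascending=False)
         .groupby(["year", "sex"])
         .head(N)
)

# Option 2: SeriesGroupBy.nlargest picks the top-N births per group; the last
# index level holds the original row labels, which select the full rows.
top_births = names.groupby(["year", "sex"])["births"].nlargest(N)
via_nlargest = names.loc[top_births.index.get_level_values(-1)]

print(via_head)
print(via_nlargest)
```

Both options keep the original row labels. Option 1 returns rows in overall births order rather than grouped, so a final `sort_values(["year", "sex"])` may be wanted if grouped output matters.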

