Pandas - create a data frame with counting and frequency of elements

Starting from the following data frame df:

df = pd.DataFrame({'node':[1,2,3,3,3,5,5],'lang':['it','en','ar','ar','es','uz','es']})

I am trying to build a structure:

    node     langs   lfreq
0      1      [it]     [1]
1      2      [en]     [1]
2      3  [ar, es]  [2, 1]
3      5  [uz, es]  [1, 1]

thus basically grouping the elements langand frequency per node into one line through lists. What i have done so far:

# Getting the unique langs / node
a = df.groupby('node')['lang'].unique().reset_index(name='langs')

# Getting the frequency of lang / node
b = df.groupby('node')['lang'].value_counts().reset_index(name='lfreq')
c = b.groupby('node')['lfreq'].unique().reset_index(name='lfreq')

and then merge into node:

d = pd.merge(a,c,on='node')

After these operations, I got the following:

    node     langs   lfreq
0      1      [it]     [1]
1      2      [en]     [1]
2      3  [ar, es]  [2, 1]
3      5  [uz, es]     [1]

As you can see, the last line has only one [1]occurrence of a frequency of two [uz, es]instead of a list [1,1], as expected. Is there a way to perform the analysis in a more concise way to get the desired result?

+4
source share
3 answers

, ( ) 40 , - .

df.groupby(['node','lang'])['lang'].count()

node  lang
1     it      1
2     en      1
3     ar      2
      es      1
5     es      1
      uz      1

, (zen python), , , pandas/numpy (ints floats), .

- pandas, groupby, -, , , , . , , , .

+2

agg tolist()

df = pd.DataFrame({'node':[1,2,3,3,3,5,5],'lang':['it','en','ar','ar','es','uz','es']})
# Getting the unique langs / node
a = df.groupby('node')['lang'].unique().reset_index(name='langs')

# Getting the frequency of lang / node
b = df.groupby('node')['lang'].value_counts().reset_index(name='lfreq')

c = b.groupby('node')['lfreq'].unique().reset_index(name='lfreq')

c = b.groupby('node').agg({'lfreq': lambda x: x.tolist()}).reset_index()

d = pd.merge(a,c,on='node')

:

   node     langs   lfreq
0     1      [it]     [1]
1     2      [en]     [1]
2     3  [ar, es]  [2, 1]
3     5  [uz, es]  [1, 1]
+3

apply np.unique return_counts=True:

df = pd.DataFrame({'node':[1,2,3,3,3,5,5],'lang':['it','en','ar','ar','es','uz','es']})
print df
  lang  node
0   it     1
1   en     2
2   ar     3
3   ar     3
4   es     3
5   uz     5
6   es     5

a = df.groupby('node')['lang'].apply(lambda x: np.unique(x, return_counts=True))
                              .reset_index(name='tup')

#split tuples
a[['langs','lfreq']] = a['tup'].apply(pd.Series)
#filter columns
print a[['node','langs','lfreq']]
   node     langs   lfreq
0     1      [it]     [1]
1     2      [en]     [1]
2     3  [ar, es]  [2, 1]
3     5  [es, uz]  [1, 1]
+2

All Articles