IIUC, you can do the following:
In [89]:
count = df['fruits'].str.split().apply(len).value_counts()
count.index = count.index.astype(str) + ' words:'
count.sort_index(inplace=True)
count

Out[89]:
1 words:    2
2 words:    2
3 words:    1
4 words:    1
Name: fruits, dtype: int64
Here we use the vectorised str.split to split each string on whitespace, then apply len to count the words in each row. Calling value_counts then tallies how many rows have each word count.
Finally, we rename the index and sort it to get the desired result.
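The original df is not shown here, so the following is only a minimal self-contained sketch of the same steps. The sample strings in the fruits column are made up for illustration, and sort_index is called without inplace=True purely as a style choice:

```python
import pandas as pd

# Hypothetical sample data; the real df from the question is not shown.
df = pd.DataFrame({'fruits': ['apple pie', 'banana', 'red apple tart',
                              'sweet ripe yellow mango', 'green grape', 'kiwi']})

# Split each string on whitespace, then count the words in each row.
words_per_row = df['fruits'].str.split().apply(len)

# Tally how many rows have each word count.
count = words_per_row.value_counts()

# Relabel the index and sort by it.
count.index = count.index.astype(str) + ' words:'
count = count.sort_index()
print(count)
```

Note that the relabelled index is sorted as strings, so this ordering is only numeric while every word count is a single digit.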
UPDATE
This can also be done using str.len rather than apply, which should scale better:

In [41]:
count = df['fruits'].str.split().str.len().value_counts()
count.index = count.index.astype(str) + ' words:'
count.sort_index(inplace=True)
count

Out[41]:
1 words:    2
2 words:    2
3 words:    1
4 words:    1
Name: fruits, dtype: int64
Timings
In [42]:
%timeit df['fruits'].str.split().apply(len).value_counts()
%timeit df['fruits'].str.split().str.len()

1000 loops, best of 3: 799 µs per loop
1000 loops, best of 3: 347 µs per loop
For a 6K-row df:

In [51]:
%timeit df['fruits'].str.split().apply(len).value_counts()
%timeit df['fruits'].str.split().str.len()

100 loops, best of 3: 6.3 ms per loop
100 loops, best of 3: 6 ms per loop
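To reproduce a comparison like this outside IPython, something along these lines could be used. It is a rough sketch with the standard timeit module. The 6,000-row frame and the helper names with_apply and with_str_len are assumptions for illustration, and both helpers call value_counts so that the two measure the same amount of work:

```python
import timeit
import pandas as pd

# Hypothetical 6,000-row frame built by repeating six made-up strings.
base = ['apple pie', 'banana', 'red apple tart',
        'sweet ripe yellow mango', 'green grape', 'kiwi']
big = pd.DataFrame({'fruits': base * 1000})

def with_apply():
    # Word count per row via a Python-level len call on each list.
    return big['fruits'].str.split().apply(len).value_counts()

def with_str_len():
    # Word count per row via the vectorised string accessor.
    return big['fruits'].str.split().str.len().value_counts()

# Check that both approaches produce the same tally before timing them.
same = with_apply().to_dict() == with_str_len().to_dict()
print('same result:', same)

for fn in (with_apply, with_str_len):
    # Best of 3 runs, 20 calls per run, reported per call.
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f'{fn.__name__}: {best * 1e3:.2f} ms per call')
```

Absolute numbers will vary with the machine and the pandas version, so only the relative difference is worth reading.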