I have a pandas.DataFrame containing numerous columns. I'm only interested in one of these columns ("names"), whose dtype is "object". I want to answer three questions about this column:
1. Which value(s) appear most often, excluding NaN values?
2. How many values meet this criterion (i.e. the number of values in answer #1)?
3. How many times does each of these values appear?
I started with a large data frame (df). The column that interests me is called "names". First, I used collections.Counter to count the entries for each unique value in the "names" column:
In [52]: cntr = collections.Counter(df['names'].dropna())
Out[52]: Counter({'Erk': 118,
'James': 120,
'John': 126,
'Michael': 122,
'Phil': 117,
'Ryan': 126})
Then I converted the counter back to a data frame:
In [53]: df1 = pd.DataFrame.from_dict(cntr, orient='index').reset_index()
In [54]: df1 = df1.rename(columns={'index':'names', 0:'cnt'})
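For comparison, the same kind of count table can probably be built without the Counter round trip. The sketch below uses `Series.value_counts()`, which skips NaN by default. The small `df` here is a made-up stand-in for the real data frame, not the actual data.

```python
import pandas as pd

# Hypothetical stand-in for the real df; only the 'names' column matters here.
df = pd.DataFrame({'names': ['John', 'Ryan', 'John', None, 'Erk', 'Ryan']})

# value_counts() ignores NaN/None by default (dropna=True).
counts = df['names'].value_counts()

# Turn the resulting Series into a two-column frame shaped like df1.
df1 = counts.rename_axis('names').reset_index(name='cnt')
print(df1)
```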
This gave me a pandas DataFrame containing:
In [55]: print (type(df1), df1)
Out[55]: <class 'pandas.core.frame.DataFrame'>
     names  cnt
0      Erk  118
1    James  120
2     Phil  117
3     John  126
4  Michael  122
5     Ryan  126
For this data, the answers to my three questions would be:
#1 = ['John', 'Ryan']
#2 = 2
#3 = 126
How can I get these three answers from the dataframe?
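One possible way to read the three answers off the count table is sketched below. It rebuilds df1 from the counts printed above so that it runs on its own; the variable names `top_count`, `top_names` and `n_top` are illustrative only.

```python
import pandas as pd

# Counts copied from the df1 output shown in the question.
df1 = pd.DataFrame({
    'names': ['Erk', 'James', 'Phil', 'John', 'Michael', 'Ryan'],
    'cnt': [118, 120, 117, 126, 122, 126],
})

# Answer #3: the highest count in the table.
top_count = df1['cnt'].max()

# Answer #1: every name whose count equals that maximum (ties included).
top_names = df1.loc[df1['cnt'] == top_count, 'names'].tolist()

# Answer #2: how many names are tied for the maximum.
n_top = len(top_names)

print(top_names, n_top, top_count)  # ['John', 'Ryan'] 2 126
```

On the original column, `df['names'].mode()` should give the tied values directly, since it excludes NaN by default and returns every value that shares the highest frequency.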