Count the frequency with which a value occurs in a DataFrame column

I have a dataset

    category
    cat a
    cat b
    cat a

I would like to return something like this (showing unique values and frequency):

    category | freq
    cat a    | 2
    cat b    | 1
+205
python pandas
Mar 13 '14 at 21:34
20 answers

Use groupby and count:

    In [37]: df = pd.DataFrame({'a': list('abssbab')})
             df.groupby('a').count()
    Out[37]:
       a
    a
    a  2
    b  3
    s  2

    [3 rows x 1 columns]

See online docs: http://pandas.pydata.org/pandas-docs/stable/groupby.html

Also value_counts(), as @DSM commented; there are many ways to skin a cat here:

    In [38]: df['a'].value_counts()
    Out[38]:
    b    3
    a    2
    s    2
    dtype: int64

If you want to add the frequency back to the original DataFrame, use transform to return an aligned index:

    In [41]: df['freq'] = df.groupby('a')['a'].transform('count')
             df
    Out[41]:
       a  freq
    0  a     2
    1  b     3
    2  s     2
    3  s     2
    4  b     3
    5  a     2
    6  b     3

    [7 rows x 2 columns]
+283
Mar 13 '14 at 21:41

If you want to apply to all columns, you can use:

 df.apply(pd.value_counts) 

This will apply the column-based aggregation function (in this case value_counts) to each column.
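For instance, a minimal sketch of what this returns (the two-column frame here is hypothetical, just for illustration):

    import pandas as pd

    # hypothetical two-column frame for illustration
    df = pd.DataFrame({'a': list('abssbab'), 'b': list('xyxxyxy')})

    counts = df.apply(pd.value_counts)
    # one row per distinct value across the columns; NaN marks values
    # that never occur in a given column, e.g. 'x' never occurs in 'a'
    print(counts)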

+72
Apr 05 '16 at 18:30
 df.category.value_counts() 

This short line of code will give you the desired result.

If there are spaces in the column name, you can use

 df['category'].value_counts() 
+35
Jan 15 '18 at 17:52
 df.apply(pd.value_counts).fillna(0) 

value_counts - returns an object containing counts of unique values

apply - counts the frequency in every column. If you set axis=1, you get the frequencies in every row

fillna(0) - makes the output tidier by replacing NaN with 0 (see the sketch below)
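A minimal sketch of the whole chain, on a hypothetical two-column frame:

    import pandas as pd

    # hypothetical frame for illustration
    df = pd.DataFrame({'a': list('abssbab'), 'b': list('xyxxyxy')})

    # per-column counts, with missing combinations shown as 0 instead of NaN
    per_column = df.apply(pd.value_counts).fillna(0)

    # axis=1 counts the values within each row instead
    per_row = df.apply(pd.value_counts, axis=1).fillna(0)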

+16
Apr 09 '17 at 12:19

In 0.18.1, groupby together with count does not give the frequency of unique values:

    >>> df
       a
    0  a
    1  b
    2  s
    3  s
    4  b
    5  a
    6  b

    >>> df.groupby('a').count()
    Empty DataFrame
    Columns: []
    Index: [a, b, s]

However, the unique values and their frequencies are easily determined using size:

    >>> df.groupby('a').size()
    a
    a    2
    b    3
    s    2

With df.a.value_counts(), sorted values are returned by default (in descending order of frequency, largest first).
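If you want a different ordering, value_counts also takes sort and ascending arguments; a small sketch:

    import pandas as pd

    df = pd.DataFrame({'a': list('abssbab')})

    df['a'].value_counts()                # descending by frequency (default)
    df['a'].value_counts(ascending=True)  # ascending by frequency
    df['a'].value_counts(sort=False)      # leave the counts unsorted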

+14
Aug 01 '16 at 16:23

The code:

    df = pd.DataFrame({'a': list('tuhimerisabhain')})

    >>> df.a.value_counts()
    i    3
    h    2
    a    2
    n    1
    b    1
    m    1
    r    1
    t    1
    e    1
    u    1
    s    1
+13
09 Oct '15 at 14:02

Using a list comprehension and value_counts for multiple columns in a df:

 [my_series[c].value_counts() for c in list(my_series.select_dtypes(include=['O']).columns)] 


+5
Jan 28 '15 at 13:11

I would use this for pandas v0.19.2:

 df.category.value_counts() 
+5
Mar 27 '17 at 18:06

This should work:

 df.groupby('category').size() 
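If you want output shaped like the question's category | freq table, one sketch (assuming the column is named category) is to turn the grouped sizes back into a DataFrame:

    import pandas as pd

    # hypothetical frame matching the question's data
    df = pd.DataFrame({'category': ['cat a', 'cat b', 'cat a']})

    freq = df.groupby('category').size().reset_index(name='freq')
    print(freq)
    #   category  freq
    # 0    cat a     2
    # 1    cat b     1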
+5
Sep 05 '17 at 9:43

If your DataFrame has values of the same type, you can also set return_counts=True in numpy.unique().

    index, counts = np.unique(df.values, return_counts=True)

np.bincount() can be faster if your values are integers.
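A small sketch of both, on a hypothetical integer column:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': [1, 1, 2, 5, 2, 1]})

    # distinct values and how often each occurs
    values, counts = np.unique(df['a'].values, return_counts=True)
    # values -> array([1, 2, 5]), counts -> array([3, 2, 1])

    # for non-negative integers, bincount indexes the counts by value
    freq = np.bincount(df['a'].values)
    # freq[v] is the number of times v occurs: array([0, 3, 2, 0, 0, 1])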

+4
Oct 04 '17 at 22:06

Without any libraries, you can do this instead:

    def to_frequency_table(data):
        frequencytable = {}
        for key in data:
            if key in frequencytable:
                frequencytable[key] += 1
            else:
                frequencytable[key] = 1
        return frequencytable

Example:

    >>> to_frequency_table([1, 1, 1, 1, 2, 3, 4, 4])
    {1: 4, 2: 1, 3: 1, 4: 2}
+3
Mar 27 '17 at 22:05

You can also use:

    df = pd.DataFrame({'a': list('abssbab')})
    df['a'].value_counts()
+2
May 09 '17 at 5:10 am

Use the size() method:

    import pandas as pd

    # where df is your dataframe
    print(df.groupby('category').size())
+2
Feb 13 '18 at 11:39

You can also do this with pandas by first casting your columns as categories, i.e. dtype="category", for example:

    cats = ['client', 'hotel', 'currency', 'ota', 'user_country']
    df[cats] = df[cats].astype('category')

and then call describe:

 df[cats].describe() 

This will give you a nice table of value counts and a bit more :):

            client   hotel currency     ota user_country
    count   852845  852845   852845  852845       852845
    unique    2554   17477      132      14          219
    top       2198   13202      USD   Hades           US
    freq    102562    8847   516500  242734       340992
+1
May 18 '18 at 16:08
 n_values = data.income.value_counts() 

Count of the first unique value:

 n_at_most_50k = n_values[0] 

Count of the second unique value:

    n_greater_50k = n_values[1]
    n_values

Output of n_values:

    <=50K    34014
    >50K     11208
    Name: income, dtype: int64

Output of (n_greater_50k, n_at_most_50k):

    (11208, 34014)
0
Jun 28 '18 at 9:01

@metatoaster has already pointed this out: go with Counter. It is fast.

    import pandas as pd
    from collections import Counter
    import timeit
    import numpy as np

    df = pd.DataFrame(np.random.randint(1, 10000, (100, 2)), columns=["NumA", "NumB"])

Timings:

    %timeit -n 10000 df['NumA'].value_counts()
    # 10000 loops, best of 3: 715 µs per loop

    %timeit -n 10000 df['NumA'].value_counts().to_dict()
    # 10000 loops, best of 3: 796 µs per loop

    %timeit -n 10000 Counter(df['NumA'])
    # 10000 loops, best of 3: 74 µs per loop

    %timeit -n 10000 df.groupby(['NumA']).count()
    # 10000 loops, best of 3: 1.29 ms per loop

Hurrah!

0
Dec 25 '18 at 8:34

Use this code:

    import numpy as np
    np.unique(df['a'], return_counts=True)
0
Jan 15 '19 at 11:23
Your data:

    category
    cat a
    cat b
    cat a

Solution:

    df['freq'] = df.groupby('category')['category'].transform('count')
    df = df.drop_duplicates()
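A quick check of this approach against the question's sample data:

    import pandas as pd

    df = pd.DataFrame({'category': ['cat a', 'cat b', 'cat a']})
    df['freq'] = df.groupby('category')['category'].transform('count')
    df = df.drop_duplicates()
    #   category  freq
    # 0    cat a     2
    # 1    cat b     1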
0
Jan 27 '19 at 7:06

df.category.value_counts() is the easiest way to compute the counts.

0
Feb 15 '19 at 23:13

I believe this should work fine for any list of DataFrame columns.

    def column_list(x):
        column_list_df = []
        for col_name in x.columns:
            y = (col_name, len(x[col_name].unique()))
            column_list_df.append(y)
        column_list_df = pd.DataFrame(column_list_df)
        return column_list_df.rename(columns={0: "Feature", 1: "Value_count"})

The column_list function checks the column names and then counts the unique values of each column.
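A quick usage sketch, on a hypothetical frame:

    import pandas as pd

    # hypothetical frame for illustration
    df = pd.DataFrame({'category': ['cat a', 'cat b', 'cat a'], 'flag': [1, 0, 1]})

    print(column_list(df))
    #     Feature  Value_count
    # 0  category            2
    # 1      flag            2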

0
May 2 '19 at 12:26


