Count the frequency with which a value occurs in a DataFrame column

I have a dataset

    category
    cat a
    cat b
    cat a

I would like to return something like this (showing unique values and frequency):

    category | freq
    cat a    | 2
    cat b    | 1
+205
python pandas
Mar 13 '14 at 21:34
20 answers

Use groupby and count:

    In [37]: df = pd.DataFrame({'a': list('abssbab')})
             df.groupby('a').count()
    Out[37]:
       a
    a
    a  2
    b  3
    s  2

    [3 rows x 1 columns]

See online docs: http://pandas.pydata.org/pandas-docs/stable/groupby.html

Also value_counts(), as @DSM commented; there are many ways to skin a cat here:

    In [38]: df['a'].value_counts()
    Out[38]:
    b    3
    a    2
    s    2
    dtype: int64

If you want to add the frequency back to the original DataFrame, use transform to return an aligned index:

    In [41]: df['freq'] = df.groupby('a')['a'].transform('count')
             df
    Out[41]:
       a  freq
    0  a     2
    1  b     3
    2  s     2
    3  s     2
    4  b     3
    5  a     2
    6  b     3

    [7 rows x 2 columns]
+283
Mar 13 '14 at 21:41

If you want to apply to all columns, you can use:

 df.apply(pd.value_counts) 

This will apply the column-based aggregation function (in this case value_counts) to each column.
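For instance, a minimal sketch of what this returns (the two-column frame here is hypothetical, just for illustration):

    import pandas as pd

    # hypothetical two-column frame for illustration
    df = pd.DataFrame({'a': list('abssbab'), 'b': list('xyxxyxy')})

    counts = df.apply(pd.value_counts)
    # one row per distinct value across the columns; NaN marks values
    # that never occur in a given column, e.g. 'x' never occurs in 'a'
    print(counts)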

+72
Apr 05 '16 at 18:30
 df.category.value_counts() 

This short line of code will give you the desired result.

If there are spaces in the column name, you can use

 df['category'].value_counts() 
+35
Jan 15 '18 at 17:52
 df.apply(pd.value_counts).fillna(0) 

value_counts - returns an object containing counts of unique values

apply - counts the frequency in every column. If you set axis=1, you get the frequencies in every row

fillna(0) - makes the output tidier by replacing NaN with 0 (see the sketch below)
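A minimal sketch of the whole chain, on a hypothetical two-column frame:

    import pandas as pd

    # hypothetical frame for illustration
    df = pd.DataFrame({'a': list('abssbab'), 'b': list('xyxxyxy')})

    # per-column counts, with missing combinations shown as 0 instead of NaN
    per_column = df.apply(pd.value_counts).fillna(0)

    # axis=1 counts the values within each row instead
    per_row = df.apply(pd.value_counts, axis=1).fillna(0)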

+16
Apr 09 '17 at 12:19

In 0.18.1, groupby together with count does not give the frequency of unique values:

    >>> df
       a
    0  a
    1  b
    2  s
    3  s
    4  b
    5  a
    6  b

    >>> df.groupby('a').count()
    Empty DataFrame
    Columns: []
    Index: [a, b, s]

However, the unique values and their frequencies are easily determined using size:

    >>> df.groupby('a').size()
    a
    a    2
    b    3
    s    2

With df.a.value_counts(), sorted values are returned by default (in descending order of frequency, largest first).
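If you want a different ordering, value_counts also takes sort and ascending arguments; a small sketch:

    import pandas as pd

    df = pd.DataFrame({'a': list('abssbab')})

    df['a'].value_counts()                # descending by frequency (default)
    df['a'].value_counts(ascending=True)  # ascending by frequency
    df['a'].value_counts(sort=False)      # leave the counts unsorted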

+14
Aug 01 '16 at 16:23

The code:

    df = pd.DataFrame({'a': list('tuhimerisabhain')})

    >>> df.a.value_counts()
    i    3
    h    2
    a    2
    n    1
    b    1
    m    1
    r    1
    t    1
    e    1
    u    1
    s    1
+13
09 Oct '15 at 14:02

Using a list comprehension and value_counts for multiple columns in a df:

 [my_series[c].value_counts() for c in list(my_series.select_dtypes(include=['O']).columns)] 


+5
Jan 28 '15 at 13:11

I would use this for pandas v0.19.2:

 df.category.value_counts() 
+5
Mar 27 '17 at 18:06

This should work:

 df.groupby('category').size() 
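If you want output shaped like the question's category | freq table, one sketch (assuming the column is named category) is to turn the grouped sizes back into a DataFrame:

    import pandas as pd

    # hypothetical frame matching the question's data
    df = pd.DataFrame({'category': ['cat a', 'cat b', 'cat a']})

    freq = df.groupby('category').size().reset_index(name='freq')
    print(freq)
    #   category  freq
    # 0    cat a     2
    # 1    cat b     1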
+5
Sep 05 '17 at 9:43

If your DataFrame has values of the same type, you can also set return_counts=True in numpy.unique().

    index, counts = np.unique(df.values, return_counts=True)

np.bincount() can be faster if your values are integers.
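A small sketch of both, on a hypothetical integer column:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': [1, 1, 2, 5, 2, 1]})

    # distinct values and how often each occurs
    values, counts = np.unique(df['a'].values, return_counts=True)
    # values -> array([1, 2, 5]), counts -> array([3, 2, 1])

    # for non-negative integers, bincount indexes the counts by value
    freq = np.bincount(df['a'].values)
    # freq[v] is the number of times v occurs: array([0, 3, 2, 0, 0, 1])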

+4
Oct 04 '17 at 22:06

Without any libraries, you can do this instead:

    def to_frequency_table(data):
        frequencytable = {}
        for key in data:
            if key in frequencytable:
                frequencytable[key] += 1
            else:
                frequencytable[key] = 1
        return frequencytable

Example:

    >>> to_frequency_table([1, 1, 1, 1, 2, 3, 4, 4])
    {1: 4, 2: 1, 3: 1, 4: 2}
+3
Mar 27 '17 at 22:05

You can also use:

    df = pd.DataFrame({'a': list('abssbab')})
    df['a'].value_counts()
+2
May 09 '17 at 5:10 am

Use the size() method:

    import pandas as pd

    # where df is your dataframe
    print(df.groupby('category').size())
+2
Feb 13 '18 at 11:39

You can also do this with pandas by first casting your columns as categories, i.e. dtype="category", for example:

    cats = ['client', 'hotel', 'currency', 'ota', 'user_country']
    df[cats] = df[cats].astype('category')

and then call describe:

 df[cats].describe() 

This will give you a nice table of value counts and a bit more :):

            client   hotel currency     ota user_country
    count   852845  852845   852845  852845       852845
    unique    2554   17477      132      14          219
    top       2198   13202      USD   Hades           US
    freq    102562    8847   516500  242734       340992
+1
May 18 '18 at 16:08
 n_values = data.income.value_counts() 

Count of the first unique value:

 n_at_most_50k = n_values[0] 

Count of the second unique value:

    n_greater_50k = n_values[1]
    n_values

Output of n_values:

    <=50K    34014
    >50K     11208
    Name: income, dtype: int64

Output of (n_greater_50k, n_at_most_50k):

    (11208, 34014)
0
Jun 28 '18 at 9:01

@metatoaster has already pointed this out: go with Counter. It is fast.

    import pandas as pd
    from collections import Counter
    import timeit
    import numpy as np

    df = pd.DataFrame(np.random.randint(1, 10000, (100, 2)), columns=["NumA", "NumB"])

Timings:

    %timeit -n 10000 df['NumA'].value_counts()
    # 10000 loops, best of 3: 715 µs per loop

    %timeit -n 10000 df['NumA'].value_counts().to_dict()
    # 10000 loops, best of 3: 796 µs per loop

    %timeit -n 10000 Counter(df['NumA'])
    # 10000 loops, best of 3: 74 µs per loop

    %timeit -n 10000 df.groupby(['NumA']).count()
    # 10000 loops, best of 3: 1.29 ms per loop

Hurrah!

0
Dec 25 '18 at 8:34

Use this code:

    import numpy as np
    np.unique(df['a'], return_counts=True)
0
Jan 15 '19 at 11:23
Your data:

    category
    cat a
    cat b
    cat a

Solution:

    df['freq'] = df.groupby('category')['category'].transform('count')
    df = df.drop_duplicates()
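A quick check of this approach against the question's sample data:

    import pandas as pd

    df = pd.DataFrame({'category': ['cat a', 'cat b', 'cat a']})
    df['freq'] = df.groupby('category')['category'].transform('count')
    df = df.drop_duplicates()
    #   category  freq
    # 0    cat a     2
    # 1    cat b     1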
0
Jan 27 '19 at 7:06

df.category.value_counts() is the easiest way to compute the counts.

0
Feb 15 '19 at 23:13

I believe this should work fine for any list of DataFrame columns.

    def column_list(x):
        column_list_df = []
        for col_name in x.columns:
            y = (col_name, len(x[col_name].unique()))
            column_list_df.append(y)
        column_list_df = pd.DataFrame(column_list_df)
        return column_list_df.rename(columns={0: "Feature", 1: "Value_count"})

The column_list function checks the column names and then counts the unique values of each column.
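A quick usage sketch, on a hypothetical frame:

    import pandas as pd

    # hypothetical frame for illustration
    df = pd.DataFrame({'category': ['cat a', 'cat b', 'cat a'], 'flag': [1, 0, 1]})

    print(column_list(df))
    #     Feature  Value_count
    # 0  category            2
    # 1      flag            2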

0
May 2 '19 at 12:26


