Pythonic way to calculate the length of lists in pandas data column

Question

I have a dataframe like this:

                                                      CreationDate
 2013-12-22 15:25:02                  [ubuntu, mac-osx, syslinux]
 2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]
 2013-12-22 15:42:00               [ubuntu, nat, squid, mikrotik]
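To reproduce this frame, here is a minimal sketch. It assumes the timestamps are the index and that each `CreationDate` cell holds a Python list of tag strings; the question does not show how the frame was built.

```python
import pandas as pd

# Rebuild the example frame (assumed layout): the timestamps form the index
# and the 'CreationDate' column holds one Python list of tags per row.
df = pd.DataFrame(
    {"CreationDate": [
        ["ubuntu", "mac-osx", "syslinux"],
        ["ubuntu", "mod-rewrite", "laconica", "apache-2.2"],
        ["ubuntu", "nat", "squid", "mikrotik"],
    ]},
    index=pd.to_datetime([
        "2013-12-22 15:25:02",
        "2009-12-14 14:29:32",
        "2013-12-22 15:42:00",
    ]),
)
print(df)
```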

I compute the length of the lists in the CreationDate column and create a new Length column as follows:

 df['Length'] = df.CreationDate.apply(lambda x: len(x))

This gives me:

                                                      CreationDate  Length
 2013-12-22 15:25:02                  [ubuntu, mac-osx, syslinux]       3
 2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]       4
 2013-12-22 15:42:00               [ubuntu, nat, squid, mikrotik]       4

Is there a more pythonic way to do this?

python python-2.7 pandas

MYGz Dec 27 '16 at 6:48

2 answers

Here is another option using apply with a lambda function:

 df['Length'] = df["CreationDate"].apply(lambda l: len(l))
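Since `len` is itself a callable, the lambda wrapper in the line above is not strictly needed. A minimal sketch, using a two-row frame made up for illustration:

```python
import pandas as pd

# Small illustrative frame: each cell of 'CreationDate' is a list of tags.
df = pd.DataFrame({"CreationDate": [
    ["ubuntu", "mac-osx", "syslinux"],
    ["ubuntu", "nat", "squid", "mikrotik"],
]})

# Pass the built-in len directly instead of wrapping it in a lambda.
df["Length"] = df["CreationDate"].apply(len)
print(df["Length"].tolist())  # [3, 4]
```

The behaviour is the same as the lambda version; it only drops one layer of indirection.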

Zstack Oct 18 '19 at 15:46

ayhan · Accepted Answer · 2016-12-27T07:03:41+0000

You can use the str accessor for some list operations. In this example,

 df['CreationDate'].str.len()

returns the length of each list. See the docs for str.len.

 df['Length'] = df['CreationDate'].str.len()
 df
 Out:
                                                      CreationDate  Length
 2013-12-22 15:25:02                  [ubuntu, mac-osx, syslinux]       3
 2009-12-14 14:29:32  [ubuntu, mod-rewrite, laconica, apache-2.2]       4
 2013-12-22 15:42:00               [ubuntu, nat, squid, mikrotik]       4
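A self-contained version of the str accessor approach might look like the sketch below. It assumes a plain default index rather than the timestamps shown above, to keep the example short.

```python
import pandas as pd

# Illustrative frame with a default RangeIndex (an assumption for brevity).
df = pd.DataFrame({"CreationDate": [
    ["ubuntu", "mac-osx", "syslinux"],
    ["ubuntu", "mod-rewrite", "laconica", "apache-2.2"],
    ["ubuntu", "nat", "squid", "mikrotik"],
]})

# .str.len() calls len() on every element of an object column,
# so it works on lists as well as on strings.
df["Length"] = df["CreationDate"].str.len()
print(df["Length"].tolist())  # [3, 4, 4]
```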

For these operations, vanilla Python is generally faster, although pandas handles NaNs for you. Here are the timings:
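To illustrate the NaN point, here is a small sketch with a made-up Series that has one missing entry. `str.len()` leaves the missing value as NaN (which makes the result float), while a plain comprehension calls `len()` on the float NaN and raises a `TypeError`.

```python
import numpy as np
import pandas as pd

# Illustrative Series: two lists and one missing value in the middle.
ser = pd.Series([["a", "b"], np.nan, ["c"]])

# pandas skips the missing entry and keeps it as NaN in the result.
lengths = ser.str.len()
print(lengths.tolist())

# Vanilla Python tries len(nan) and fails.
try:
    [len(x) for x in ser]
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)
```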

 import random
 import string
 import pandas as pd

 ser = pd.Series([random.sample(string.ascii_letters, random.randint(1, 20)) for _ in range(10**6)])

 %timeit ser.apply(lambda x: len(x))
 1 loop, best of 3: 425 ms per loop

 %timeit ser.str.len()
 1 loop, best of 3: 248 ms per loop

 %timeit [len(x) for x in ser]
 10 loops, best of 3: 84 ms per loop

 %timeit pd.Series([len(x) for x in ser], index=ser.index)
 1 loop, best of 3: 236 ms per loop
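The `%timeit` lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard `timeit` module. This sketch uses 10**4 rows instead of 10**6 so it runs quickly, and it first checks that the four approaches agree; the absolute timings it prints will vary with the machine and pandas version and are not meant to reproduce the numbers above.

```python
import random
import string
import timeit

import pandas as pd

# Same kind of data as above, but smaller (assumption: 10**4 rows).
ser = pd.Series([random.sample(string.ascii_letters, random.randint(1, 20))
                 for _ in range(10**4)])

candidates = {
    "apply(lambda)": lambda: ser.apply(lambda x: len(x)),
    "str.len()": lambda: ser.str.len(),
    "list comprehension": lambda: [len(x) for x in ser],
    "Series(list comprehension)": lambda: pd.Series([len(x) for x in ser], index=ser.index),
}

# All four approaches should yield the same lengths.
reference = [len(x) for x in ser]
for name, fn in candidates.items():
    assert list(fn()) == reference, name

# Average time per call over 10 runs for each approach.
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=10) / 10
    print("{:28s} {:.2f} ms per call".format(name, seconds * 1000))
```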
