Convert a list column in a pandas DataFrame to a string

I have a pandas DataFrame. One of the columns contains lists. I want each value in this column to be a single string.

For example, my list ['one', 'two', 'three'] should just be 'one, two, three'

df['col'] = df['col'].astype(str).apply(lambda x: ', '.join(df['col'].astype(str))) 

gives me ['one', 'two', 'three'], ['four', 'five', 'six'], where the second list comes from the next row. Of course, with millions of rows, this concatenation is not only wrong, it also kills my memory.

+16
python pandas
3 answers

You should not cast the column to str before joining the lists. Try:

 df['col'].apply(', '.join) 

Also note that apply applies the function to the elements of the series, so using df['col'] in a lambda function is probably not what you need.
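
To see why the original attempt put the whole column into every row, here is a minimal sketch on a made-up two-row frame. The data and the names `wrong` and `right` are illustrative and not taken from the question.

```python
import pandas as pd

# Hypothetical two-row frame mirroring the lists in the question.
df = pd.DataFrame({'col': [['one', 'two', 'three'], ['four', 'five', 'six']]})

# The lambda ignores its argument x and joins the entire column on every call,
# so each row ends up holding the str() of every list in the column.
wrong = df['col'].astype(str).apply(lambda x: ', '.join(df['col'].astype(str)))

# Passing the bound method ', '.join lets apply call it once per element,
# with that row's own list as the argument.
right = df['col'].apply(', '.join)

print(wrong[0])
print(right.tolist())
```

Because the lambda rebuilds the full column string for each row, the work and memory grow roughly with the square of the number of rows, which would be consistent with the memory problem described in the question.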


Edit: thanks to Yakim for pointing out that there is no need for a lambda function.

+25

When you cast col to str with astype, you get the string representation of the Python list, brackets and quotes included. You do not need to do this; just apply join directly:

 import pandas as pd

 df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
 # Out[8]:
 #            A
 # 0  [a, b, c]
 # 1  [A, B, C]

 df['Joined'] = df.A.apply(', '.join)
 #            A   Joined
 # 0  [a, b, c]  a, b, c
 # 1  [A, B, C]  A, B, C
+10

You can convert the lists to str with astype(str) and then remove the ', [ and ] characters. Using @Yakim's example:

 In [114]: df
 Out[114]:
            A
 0  [a, b, c]
 1  [A, B, C]

 In [115]: df.A.astype(str).str.replace(r"\[|\]|'", '', regex=True)
 Out[115]:
 0    a, b, c
 1    A, B, C
 Name: A, dtype: object
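
One caveat with stripping characters from the string representation: it may also remove brackets or quotes that belong to the elements themselves. The sketch below uses made-up data (the values "it's" and 'b[0]' are assumptions chosen to trigger the problem) and compares the result with a plain join.

```python
import pandas as pd

# Hypothetical data: the first list has elements containing a quote and brackets.
df = pd.DataFrame({'A': [["it's", 'b[0]'], ['A', 'B']]})

# Cast to str, then strip every [, ] and ' with a regular expression.
stripped = df['A'].astype(str).str.replace(r"\[|\]|'", '', regex=True)

# Join the list elements directly, leaving their contents untouched.
joined = df['A'].str.join(', ')

print(stripped.tolist())
print(joined.tolist())
```

On the second row both approaches agree, but on the first row the stripped version loses characters from the elements, so the join-based approach is likely the safer default when the element contents are not known in advance.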

Timing

 import pandas as pd

 df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
 df = pd.concat([df]*1000)

 In [2]: timeit df['A'].apply(', '.join)
 292 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

 In [3]: timeit df['A'].str.join(', ')
 368 µs ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

 In [4]: timeit df['A'].apply(lambda x: ', '.join(x))
 505 µs ± 5.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

 In [5]: timeit df['A'].str.replace('\[|\]|\'', '')
 2.43 ms ± 62.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
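
The comparison above can be reproduced outside IPython with the standard-library timeit module. This is a rough sketch, not the original benchmark: the `candidates` dictionary, the `ignore_index=True` argument and the loop count of 100 are assumptions, and the last candidate casts with astype(str) first, as in the In [115] example. Absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
df = pd.concat([df] * 1000, ignore_index=True)  # 2000 rows

# Each candidate is a zero-argument callable returning the joined Series.
candidates = {
    "apply(', '.join)": lambda: df['A'].apply(', '.join),
    "str.join(', ')": lambda: df['A'].str.join(', '),
    "apply(lambda)": lambda: df['A'].apply(lambda x: ', '.join(x)),
    "astype(str) + replace": lambda: df['A'].astype(str).str.replace(
        r"\[|\]|'", '', regex=True
    ),
}

# Check that all candidates produce the same strings before timing them.
results = {name: fn() for name, fn in candidates.items()}
reference = results["apply(', '.join)"].tolist()
all_equal = all(r.tolist() == reference for r in results.values())
print("all candidates agree:", all_equal)

# Average wall-clock time per call over 100 runs.
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=100) / 100
    print(f"{name:25s} {seconds * 1e6:10.1f} us per call")
```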
+6
