"unstack" pandas column containing multiple row lists

Question

Let's say I have the following pandas DataFrame:

    df = pd.DataFrame({"a" : [1,2,3], "b" : [[1,2],[2,3,4],[5]]})

       a          b
    0  1     [1, 2]
    1  2  [2, 3, 4]
    2  3        [5]

How would I “unzip” the lists in column “b” so that each list element gets its own row, giving this DataFrame:

       a  b
    0  1  1
    1  1  2
    2  2  2
    3  2  3
    4  2  4
    5  3  5
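For reference, the transformation being asked for can be spelled out as a plain Python loop: every element of each list in `b` becomes its own row, paired with that row's value of `a`. This is only a slow, illustrative baseline and is not from the original thread; the helper name `unzip_rows` is hypothetical.

```python
import pandas as pd


def unzip_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: emit one (a, element) row per list element in column 'b'."""
    rows = [
        (a, item)
        for a, items in zip(df["a"], df["b"])  # walk both columns in parallel
        for item in items                      # one output row per list element
    ]
    return pd.DataFrame(rows, columns=["a", "b"])


df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]]})
print(unzip_rows(df))
```

It is easy to read but loops in Python per element, which is what the vectorized answers below avoid.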

Alex Feb 02 '17 at 20:57

1 answer

Maxu · Accepted Answer · 2017-02-02T21:04:26+0000

UPDATE: a general vectorized approach that works for DataFrames with any number of other columns:

Assuming we have the following DF:

    In [159]: df
    Out[159]:
       a          b  c
    0  1     [1, 2]  5
    1  2  [2, 3, 4]  6
    2  3        [5]  7

Solution:

    In [160]: lst_col = 'b'

    In [161]: pd.DataFrame({
         ...:     col: np.repeat(df[col].values, df[lst_col].str.len())
         ...:     for col in df.columns.difference([lst_col])
         ...: }).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns.tolist()]
    Out[161]:
       a  b  c
    0  1  1  5
    1  1  2  5
    2  2  2  6
    3  2  3  6
    4  2  4  6
    5  3  5  7

Setup:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "a" : [1,2,3],
        "b" : [[1,2],[2,3,4],[5]],
        "c" : [5,6,7]
    })
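The general approach can be wrapped in a small reusable function. This is a sketch, not code from the answer: the name `unnest` is hypothetical, and it uses `.map(len)` and `.to_numpy()` in place of `.str.len()` and `.values`, which produce the same lengths and arrays for this data.

```python
import numpy as np
import pandas as pd


def unnest(df: pd.DataFrame, lst_col: str) -> pd.DataFrame:
    """Hypothetical wrapper around the np.repeat / np.concatenate approach."""
    # Number of elements in each list (same counts as .str.len())
    lens = df[lst_col].map(len).to_numpy()
    # Repeat every value of each non-list column once per list element
    out = pd.DataFrame({
        col: np.repeat(df[col].to_numpy(), lens)
        for col in df.columns.difference([lst_col])
    })
    # Flatten all lists into one array; it lines up with the repeated rows
    out = out.assign(**{lst_col: np.concatenate(df[lst_col].to_numpy())})
    # difference() returns sorted labels, so restore the original column order
    return out[df.columns.tolist()]


df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]], "c": [5, 6, 7]})
print(unnest(df, "b"))
```

A row whose list is empty gets a repeat count of 0, so it silently disappears from the output instead of producing a missing value.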

NumPy vectorized approach:

    In [124]: pd.DataFrame({'a': np.repeat(df.a.values, df.b.str.len()),
         ...:               'b': np.concatenate(df.b.values)})
    Out[124]:
       a  b
    0  1  1
    1  1  2
    2  2  2
    3  2  3
    4  2  4
    5  3  5
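To see why the two arrays line up, it can help to look at each intermediate on its own. This small sketch just evaluates the pieces of the expression above separately for the question's DataFrame; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]]})

lengths = df.b.str.len().to_numpy()               # length of each list
repeated_a = np.repeat(df.a.to_numpy(), lengths)  # each 'a' repeated len(list) times
flat_b = np.concatenate(df.b.to_numpy())          # all list elements, in row order

print(lengths.tolist())     # [2, 3, 1]
print(repeated_a.tolist())  # [1, 1, 2, 2, 2, 3]
print(flat_b.tolist())      # [1, 2, 2, 3, 4, 5]
```

Both `repeated_a` and `flat_b` have length `lengths.sum()`, and position *i* of one corresponds to position *i* of the other, so they can be placed side by side as columns.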

OLD answer:

Try the following:

    In [89]: df.set_index('a', append=True).b.apply(pd.Series).stack().reset_index(level=[0, 2], drop=True).reset_index()
    Out[89]:
       a    0
    0  1  1.0
    1  1  2.0
    2  2  2.0
    3  2  3.0
    4  2  4.0
    5  3  5.0
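The floats in `Out[89]` come from the intermediate wide frame: `apply(pd.Series)` turns each list into a row, and shorter lists are padded with NaN, which forces a float dtype. The sketch below shows that intermediate step. It adds an explicit `.dropna()` after `.stack()` as an assumption of mine, so the NaN padding is discarded regardless of how a given pandas version's `stack()` treats missing values.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]]})

# Each list becomes a row with columns 0, 1, 2; shorter lists are padded with NaN
wide = df.set_index("a").b.apply(pd.Series)
print(wide)

# stack() produces one long Series indexed by (a, list position);
# dropna() removes the padding explicitly
long = wide.stack().dropna()
print(long)
```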

Or a more convenient solution provided by @Boud:

    In [110]: df.set_index('a').b.apply(pd.Series).stack().reset_index(level=-1, drop=True).astype(int).reset_index()
    Out[110]:
       a  0
    0  1  1
    1  1  2
    2  2  2
    3  2  3
    4  2  4
    5  3  5
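On newer pandas there is a built-in for this: `DataFrame.explode` (added in pandas 0.25, after this answer was written) repeats the other columns once per list element. This is a swapped-in alternative rather than part of the original answer; the sketch assumes pandas 1.1 or later for the `ignore_index` argument.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [[1, 2], [2, 3, 4], [5]], "c": [5, 6, 7]})

# One row per list element; 'a' and 'c' are repeated, index renumbered 0..n-1
out = df.explode("b", ignore_index=True)
# The exploded column keeps object dtype, so cast it back to int
out["b"] = out["b"].astype(int)
print(out)
```

Unlike the `np.repeat` approach, `explode` keeps a row for an empty list and puts NaN in it, so the `astype(int)` cast would fail on such data.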
