Expand a pandas DataFrame column into multiple rows

If I have a DataFrame such that:

```python
pd.DataFrame( {"name" : "John", "days" : [[1, 3, 5, 7]] })
```

gives the following structure:

```
           days  name
0  [1, 3, 5, 7]  John
```

How can I expand it into the following?

```
   days  name
0     1  John
1     3  John
2     5  John
3     7  John
```
python pandas
6 answers

You can use `df.itertuples` to iterate over each row and a list comprehension to reshape the data into the desired form:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"], "days": [[1, 3, 5, 7], [2, 4]]})
result = pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])
print(result)
```

yields

```
   0     1
0  1  John
1  3  John
2  5  John
3  7  John
4  2  Eric
5  4  Eric
```
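A small variant of the same comprehension (a sketch, not part of the original answer) passes `columns=` so the result gets readable labels instead of `0` and `1`, and uses `itertuples(index=False)` because the index field is never used:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"], "days": [[1, 3, 5, 7], [2, 4]]})

# One output tuple per (day, row) pair; the column names are given up front.
result = pd.DataFrame(
    [(day, row.name) for row in df.itertuples(index=False) for day in row.days],
    columns=["days", "name"],
)
print(result)
```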

Divakar's solution, `using_repeat`, is the fastest:

```
In [48]: %timeit using_repeat(df)
1000 loops, best of 3: 834 µs per loop

In [5]: %timeit using_itertuples(df)
100 loops, best of 3: 3.43 ms per loop

In [7]: %timeit using_apply(df)
1 loop, best of 3: 379 ms per loop

In [8]: %timeit using_append(df)
1 loop, best of 3: 3.59 s per loop
```

Here is the setup used for the benchmark above:

```python
import numpy as np
import pandas as pd

N = 10**3
df = pd.DataFrame({"name": np.random.choice(list('ABCD'), size=N),
                   "days": [np.random.randint(10, size=np.random.randint(5))
                            for i in range(N)]})

def using_itertuples(df):
    return pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])

def using_repeat(df):
    lens = [len(item) for item in df['days']]
    return pd.DataFrame({"name": np.repeat(df['name'].values, lens),
                         "days": np.concatenate(df['days'].values)})

def using_apply(df):
    return (df.apply(lambda x: pd.Series(x.days), axis=1)
              .stack()
              .reset_index(level=1, drop=True)
              .to_frame('day')
              .join(df['name']))

def using_append(df):
    df2 = pd.DataFrame(columns=df.columns)
    for i, r in df.iterrows():
        for e in r.days:
            new_r = r.copy()
            new_r['days'] = e
            # The timings above used df2.append(new_r); DataFrame.append was removed
            # in pandas 2.0, so pd.concat grows df2 one row at a time instead.
            df2 = pd.concat([df2, new_r.to_frame().T])
    return df2
```
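Outside IPython there is no `%timeit`; the standard-library `timeit` module can stand in for it. The sketch below (an illustration, not the benchmark that produced the numbers above) rebuilds the same random data, first checks that the two fastest approaches produce identical rows, then times them. Absolute timings will differ by machine and library version:

```python
import timeit

import numpy as np
import pandas as pd

# Same random setup as above: N rows, each holding 0 to 4 random day values.
N = 10**3
df = pd.DataFrame({
    "name": np.random.choice(list("ABCD"), size=N),
    "days": [np.random.randint(10, size=np.random.randint(5)) for _ in range(N)],
})

def using_itertuples(df):
    return pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])

def using_repeat(df):
    lens = [len(item) for item in df["days"]]
    return pd.DataFrame({"name": np.repeat(df["name"].values, lens),
                         "days": np.concatenate(df["days"].values)})

# Both approaches should yield the same (day, name) pairs in the same order.
a = using_itertuples(df)
b = using_repeat(df)
same = a[0].tolist() == b["days"].tolist() and a[1].tolist() == b["name"].tolist()
print("results match:", same)

# Average seconds per call over 20 runs, reported in milliseconds.
for func in (using_repeat, using_itertuples):
    seconds = timeit.timeit(lambda: func(df), number=20) / 20
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```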

Here's one approach with NumPy:

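```python
import numpy as np
import pandas as pd

# How many output rows each input row expands into
lens = [len(item) for item in df['days']]
df_out = pd.DataFrame({"name": np.repeat(df['name'].values, lens),
                       "days": np.hstack(df['days'])})
```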

As pointed out in @unutbu's answer, `np.concatenate(df['days'].values)` is faster than `np.hstack(df['days'])`.

It uses a list comprehension to get the length of each element of `'days'`, which should add minimal overhead.
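To make the alignment concrete, here is a sketch that keeps the intermediate values of the NumPy approach for the two-row example: `np.repeat` produces one name per list element, and flattening the lists in the same row order makes position `i` of both arrays refer to the same original row. It also checks that `np.hstack` and `np.concatenate` give the same values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"], "days": [[1, 3, 5, 7], [2, 4]]})

# Step 1: number of output rows per input row.
lens = [len(item) for item in df["days"]]          # [4, 2]

# Step 2: repeat each name once per element of its list.
names = np.repeat(df["name"].values, lens)         # John x4, then Eric x2

# Step 3: flatten the lists in the same row order.
days = np.concatenate(df["days"].values)           # [1, 3, 5, 7, 2, 4]
days_hstack = np.hstack(df["days"])                # same values via np.hstack

df_out = pd.DataFrame({"name": names, "days": days})
print(df_out)
```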

Sample run:

```
>>> df
           days  name
0  [1, 3, 5, 7]  John
1        [2, 4]  Eric
>>> lens = [len(item) for item in df['days']]
>>> pd.DataFrame( {"name" : np.repeat(df['name'].values,lens),
...                "days" : np.hstack(df['days'])
...               })
   days  name
0     1  John
1     3  John
2     5  John
3     7  John
4     2  Eric
5     4  Eric
```

A 'native' pandas solution: we unstack the column into a long series, then join it back on the index:

```python
import pandas as pd

# x is the DataFrame from the question.
# Expand each list into its own columns, then unstack into one long Series, x2
x2 = x.days.apply(lambda lst: pd.Series(lst)).unstack()
# Drop the days column and join x2 back on the original row index
x.drop('days', axis=1).join(pd.DataFrame(x2.reset_index(level=0, drop=True)))
```
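A self-contained version of the same unstack-and-join idea, assuming `x` is the question's DataFrame extended with a second, shorter list. Two changes from the one-liner are assumptions of this sketch: the wide frame is built with `pd.DataFrame(x["days"].tolist(), index=x.index)` instead of `apply(pd.Series)` (both pad shorter lists with NaN), and because `unstack()` keeps that NaN padding, a `dropna()` is added before the join:

```python
import pandas as pd

# Stand-in for the answer's `x`: two rows with lists of unequal length.
x = pd.DataFrame({"name": ["John", "Eric"], "days": [[1, 3, 5, 7], [2, 4]]})

# Wide frame: one column per list position, NaN where a list is shorter.
wide = pd.DataFrame(x["days"].tolist(), index=x.index)

# unstack() gives a long Series indexed by (position, original row).
x2 = wide.unstack()

# Drop the position level, discard the padding, restore ints, and name the
# Series so the joined column gets a readable label.
x2 = x2.reset_index(level=0, drop=True).dropna().astype(int).rename("days")

result = x.drop("days", axis=1).join(x2)
print(result)
```

The rows of `result` keep the original index labels (0 for John, 1 for Eric); chain `.reset_index(drop=True)` if a fresh 0..n-1 index is preferred.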

Another solution:

```
In [139]: (df.apply(lambda x: pd.Series(x.days), axis=1)
   .....:    .stack()
   .....:    .reset_index(level=1, drop=1)
   .....:    .to_frame('day')
   .....:    .join(df['name'])
   .....: )
Out[139]:
   day  name
0    1  John
0    3  John
0    5  John
0    7  John
```
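The same `apply`/`stack` chain, sketched as a standalone script with a second, shorter list. With unequal lengths the wide intermediate is padded with NaN; older pandas `stack()` dropped those values by default, but recent releases reworked `stack()` and the new behaviour keeps them, so this sketch drops them explicitly (an addition not in the answer above):

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"], "days": [[1, 3, 5, 7], [2, 4]]})

# Wide intermediate: one column per list position, NaN where a list is shorter.
wide = df.apply(lambda row: pd.Series(row["days"]), axis=1)

long = (
    wide.stack()                         # long Series indexed by (row, position)
    .dropna()                            # drop the padding explicitly
    .astype(int)                         # the NaN padding had made values float
    .reset_index(level=1, drop=True)     # keep only the original row label
    .to_frame("day")
    .join(df["name"])                    # bring the name back via that label
)
print(long)
```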

Perhaps this way:

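```python
df2 = pd.DataFrame(columns=df.columns)
for i, r in df.iterrows():
    for e in r.days:
        new_r = r.copy()     # copy the row...
        new_r['days'] = e    # ...and put a single element in place of the list
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job here
        df2 = pd.concat([df2, new_r.to_frame().T])
df2
```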
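The loop above rebuilds `df2` on every iteration, copying all rows collected so far, which is why it is the slowest entry in the benchmark. A sketch of the same row-by-row idea that instead collects rows in a plain Python list and builds the DataFrame once (a swapped-in pattern, not the answerer's code):

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"], "days": [[1, 3, 5, 7], [2, 4]]})

rows = []
for _, r in df.iterrows():
    for e in r["days"]:
        new_r = r.copy()
        new_r["days"] = e
        rows.append(new_r)          # appending to a list is cheap

# Build the DataFrame once from all collected rows.
df2 = pd.DataFrame(rows, columns=df.columns).reset_index(drop=True)
print(df2)
```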

Building on Divakar's solution, I wrote this wrapper function to flatten a column, handling `np.nan` and DataFrames with multiple columns:

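```python
import numpy as np
import pandas as pd

def flatten_column(df, column_name):
    # A NaN cell still produces one output row; `is np.nan` is fragile, so test the value
    repeat_lens = [1 if isinstance(item, float) and np.isnan(item) else len(item)
                   for item in df[column_name]]
    df_columns = list(df.columns)
    df_columns.remove(column_name)
    # Repeat the other columns' values once per element of the list
    expanded_df = pd.DataFrame(np.repeat(df.drop(column_name, axis=1).values,
                                         repeat_lens, axis=0),
                               columns=df_columns)
    # Flatten the lists (and lone NaNs) in the same row order
    flat_column_values = np.hstack(df[column_name].values)
    expanded_df[column_name] = flat_column_values
    # With string values, np.hstack can turn NaN into the string 'nan'; restore it.
    # Assigning back avoids a chained inplace replace, which may not modify expanded_df.
    expanded_df[column_name] = expanded_df[column_name].replace('nan', np.nan)
    return expanded_df
```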
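Not part of the answers above: pandas 0.25 added `DataFrame.explode`, which covers the same cases `flatten_column` handles (NaN cells, extra columns carried along). A minimal sketch, assuming pandas 0.25 or later; the `team` column and the `Ann` row are made-up sample data:

```python
import numpy as np
import pandas as pd

# A list column with a missing value, plus an extra column to carry along.
df = pd.DataFrame({
    "name": ["John", "Eric", "Ann"],
    "team": ["a", "b", "c"],
    "days": [[1, 3, 5, 7], [2, 4], np.nan],
})

# explode emits one row per list element and repeats the other columns;
# a scalar NaN cell is kept as a single NaN row.
out = df.explode("days").reset_index(drop=True)
print(out)
```

`explode` repeats the original index labels, so `reset_index(drop=True)` is used here to get a fresh 0..n-1 index like the NumPy-based answers produce.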
