How to efficiently combine similar data in pandas into one giant data frame

I have 7000 data frames with columns

 Date, X_1
 Date, X_2
 ...

Each data file has about 2500 rows.

The dates sometimes overlap, but overlap is not guaranteed.

I would like to combine them into a single data frame of the form

 Date X_1 X_2 etc. 

I tried applying combine_first 7000 times, but it was very slow, since it had to create 7000 new objects, each of which is slightly larger than the last.

Is there a more efficient way to combine multiple data frames?
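For concreteness, the slow approach looks roughly like this (a sketch with three toy frames standing in for the 7000; the data values and dates are made up for illustration):

```python
import functools
import pandas as pd

# three small stand-in frames; the real case has 7000 of them
frames = [
    pd.DataFrame({'X_1': [1.0, 2.0]}, index=pd.to_datetime(['2020-01-01', '2020-01-02'])),
    pd.DataFrame({'X_2': [3.0, 4.0]}, index=pd.to_datetime(['2020-01-02', '2020-01-03'])),
    pd.DataFrame({'X_3': [5.0, 6.0]}, index=pd.to_datetime(['2020-01-01', '2020-01-03'])),
]

# repeated combine_first: each step allocates a new, slightly larger frame
result = functools.reduce(lambda acc, df: acc.combine_first(df), frames)
```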

2 answers

Assuming Date is the index rather than a column, you can do an "outer" join:

 df1.join([df2, df3, ..., df7000], how='outer') 

Note: it may be more efficient to pass a generator of DataFrames rather than a list.

For instance:

 df1 = pd.DataFrame([[1, 2]], columns=['a', 'b'])
 df2 = pd.DataFrame([[3, 4]], index=[1], columns=['c', 'd'])
 df3 = pd.DataFrame([[5, 6], [7, 8]], columns=['e', 'f'])

 In [4]: df1.join([df2, df3], how='outer')
 Out[4]:
      a    b    c    d  e  f
 0    1    2  NaN  NaN  5  6
 1  NaN  NaN    3    4  7  8


If 'Date' is a column, you can use set_index first:

 df1.set_index('Date', inplace=True) 
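Putting the pieces together, a minimal end-to-end sketch (again with three toy frames standing in for the 7000; the data values are made up for illustration):

```python
import pandas as pd

# each frame has a Date column plus its own value column
df1 = pd.DataFrame({'Date': ['2020-01-01', '2020-01-02'], 'X_1': [1.0, 2.0]})
df2 = pd.DataFrame({'Date': ['2020-01-02', '2020-01-03'], 'X_2': [3.0, 4.0]})
df3 = pd.DataFrame({'Date': ['2020-01-01', '2020-01-03'], 'X_3': [5.0, 6.0]})

# move Date into the index so join aligns on it
frames = [df.set_index('Date') for df in (df1, df2, df3)]

# one multi-way outer join instead of thousands of pairwise combines
combined = frames[0].join(frames[1:], how='outer')
```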

How about this:

 import os
 import pandas as pd

 # os.listdir gives file names, not frames, so read each file first (assuming CSV)
 list_of_dfs = [pd.read_csv(os.path.join(dir_with_data, fname))
                for fname in os.listdir(dir_with_data)]
 df = pd.concat(list_of_dfs)
 df = df.set_index('Date')  # set_index returns a new frame, so assign it back
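For what it's worth, pd.concat along axis=1 performs the same multi-way outer alignment in a single call (a sketch under the assumption that each frame is already indexed by date; the values are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'X_1': [1.0, 2.0]}, index=['2020-01-01', '2020-01-02'])
df2 = pd.DataFrame({'X_2': [3.0, 4.0]}, index=['2020-01-02', '2020-01-03'])

# axis=1 with join='outer' aligns all frames on the index (the dates) in one pass
combined = pd.concat([df1, df2], axis=1, join='outer')
```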
