Pandas concat failing

I am trying to concatenate the following dataframes, loaded from 2 CSV files:

df_a: https://www.dropbox.com/s/slcu7o7yyottujl/df_current.csv?dl=0

df_b: https://www.dropbox.com/s/laveuldraurdpu1/df_climatology.csv?dl=0

Both have the same number of columns and the same column names. However, when I do this:

pandas.concat([df_a, df_b]) 

I get an error message:

 AssertionError: Number of manager items must equal union of block items # manager items: 20, # tot_items: 21 

How can I fix it?

python pandas
3 answers

I believe this error occurs if both of the following conditions are true:

  • The data frames have different columns, i.e. (df1.columns == df2.columns) is False.
  • The columns contain duplicate names.

Basically, if you concat data frames with columns [A, B, C] and [B, C, D], pandas can work out that it needs one series for each distinct column name. But if I then try to concat a third data frame with columns [B, B, C], it does not know which column to append to, and it ends up with fewer distinct columns than it thinks it needs.

If your data frames are such that df1.columns == df2.columns, then it will work anyway. So you can concat [B, B, C] to [B, B, C], but not to [C, B, B]. When the columns are identical, it probably just uses integer indices or something like that.
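Both conditions can be checked before calling concat. The following is only a minimal sketch, not part of the original answer: describe_column_mismatch is a hypothetical helper name, and the two small frames are made-up stand-ins for df_a and df_b.

```python
import pandas as pd


def describe_column_mismatch(df1, df2):
    """Report duplicate column names and whether the column labels match."""
    return {
        # Names that occur more than once within a single frame.
        "duplicates_in_df1": sorted(set(df1.columns[df1.columns.duplicated()])),
        "duplicates_in_df2": sorted(set(df2.columns[df2.columns.duplicated()])),
        # True only if both frames have the same labels in the same order.
        "same_columns": df1.columns.equals(df2.columns),
    }


# Made-up stand-ins for df_a and df_b (assumption, not the original data).
df1 = pd.DataFrame([[1, 2, 3]], columns=["B", "B", "C"])
df2 = pd.DataFrame([[4, 5, 6]], columns=["C", "B", "B"])

report = describe_column_mismatch(df1, df2)
print(report)
```

If the report shows duplicates and same_columns is false, the frames match the failing case described above.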


You can work around this problem with a "manual" concatenation. Say your list of data frames is:

 list_of_dfs = [df_a, df_b] 

Then, instead of running

 giant_concat_df = pd.concat(list_of_dfs, axis=0)

you can convert each data frame into a list of dictionaries (one per row), and then build a new data frame from those lists, joined together with chain:

 from itertools import chain

 list_of_dicts = [cur_df.T.to_dict().values() for cur_df in list_of_dfs]
 giant_concat_df = pd.DataFrame(list(chain(*list_of_dicts)))

Unfortunately, the source files are no longer available, so I cannot verify my solution against your case. In my case, the error occurred when:

  • A data frame has two columns with the same name (I had two ID columns whose names differed only in letter case; they were then converted to lowercase, so they became the same)
  • The types of the values in the columns with the same name differ

Here is an example that gives me an error:

 df1 = pd.DataFrame(
     data=[['a', 'b', 'id', 1], ['a', 'b', 'id', 2]],
     columns=['A', 'B', 'id', 'id'])
 df2 = pd.DataFrame(
     data=[['b', 'c', 'id', 1], ['b', 'c', 'id', 2]],
     columns=['B', 'C', 'id', 'id'])
 pd.concat([df1, df2])

 >>> AssertionError: Number of manager items must equal union of block items # manager items: 4, # tot_items: 5

Deleting or renaming one of the duplicate columns makes this code work.
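One way to do the renaming is to give repeated names a numeric suffix before concatenating. This is only a sketch under assumptions: make_unique is a hypothetical helper, the "_1" suffix scheme is my own choice, and it would collide if a column named id_1 already existed.

```python
import pandas as pd


def make_unique(columns):
    """Append a numeric suffix to repeated names: id, id -> id, id_1."""
    seen = {}
    unique = []
    for name in columns:
        count = seen.get(name, 0)
        # The first occurrence keeps its name; later ones get _1, _2, ...
        unique.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    return unique


df1 = pd.DataFrame(
    data=[["a", "b", "id", 1], ["a", "b", "id", 2]],
    columns=["A", "B", "id", "id"],
)
df2 = pd.DataFrame(
    data=[["b", "c", "id", 1], ["b", "c", "id", 2]],
    columns=["B", "C", "id", "id"],
)

# Rename the duplicates in each frame so every label is unique.
df1.columns = make_unique(df1.columns)
df2.columns = make_unique(df2.columns)

result = pd.concat([df1, df2], ignore_index=True)
print(result.columns.tolist())
```

After the rename, every label in each frame is unique, so concat can align the columns by name.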
