Preserve data type of DataFrame column after outer merge

When you combine two DataFrames with an "outer" merge on an index that only partially overlaps, pandas automatically inserts NaN into the fields that have no match. That is normal behavior, but it changes the dtype of the affected columns. This is a problem, because now I have to re-specify what data types the columns should have.
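A minimal sketch of the problem (the column names and index values here are just illustrative): an int64 column that picks up a NaN during an outer join is silently promoted to float64.

```python
import pandas as pd

left = pd.DataFrame({'x': [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({'y': [10, 20]}, index=[1, 2])

# index 0 has no match in `right`, so 'y' gets a NaN there
joined = left.join(right, how='outer')
print(joined.dtypes)  # 'x' is still int64, but 'y' is now float64
```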

fillna() or dropna() does not seem to preserve data types immediately after the merge. Do I need to keep a table schema around?

Normally I would run np.where(field.isnull(), ...), but that means doing it for every column.
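For a single column, that per-column fix looks roughly like this (the -1 sentinel and the example data are illustrative, not from the original post):

```python
import numpy as np
import pandas as pd

# an int column that was promoted to float by the merge
col = pd.Series([1.0, np.nan, 3.0])

# swap NaNs for a sentinel, then cast back to int
fixed = pd.Series(np.where(col.isnull(), -1, col)).astype('int64')
print(fixed.tolist())  # [1, -1, 3]
```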

What is the workaround for this?

python pandas
1 answer

I don’t think there is a really elegant/efficient way to do this. You can do it by tracking the original dtypes and then casting the columns back after the merge, for example:

import pandas as pd

# all columns are originally ints
df = pd.DataFrame({'a': [1] * 10, 'b': [1, 2] * 5, 'c': range(10)})
df2 = pd.DataFrame({'e': [1, 1], 'd': [1, 2]})

# track the original dtypes
orig = df.dtypes.to_dict()
orig.update(df2.dtypes.to_dict())

# outer-join the dataframes
joined = df.join(df2, how='outer')

# columns that picked up NaNs are now float dtype
print(joined.dtypes)

# replace NaNs with a suitable int sentinel
joined.fillna(-1, inplace=True)

# re-cast each column to its original dtype
joined_orig_types = joined.apply(lambda x: x.astype(orig[x.name]))
print(joined_orig_types.dtypes)
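As a sketch of an alternative (not part of the original answer): pandas 0.24+ has a nullable integer dtype, 'Int64', which can hold missing values without falling back to float. Whether it fits depends on your pandas version and what downstream code expects.

```python
import pandas as pd

df = pd.DataFrame({'a': [1] * 10, 'b': [1, 2] * 5, 'c': range(10)})
df2 = pd.DataFrame({'e': [1, 1], 'd': [1, 2]})

# cast to nullable Int64 before joining; unmatched rows become <NA>, not NaN
joined = df.astype('Int64').join(df2.astype('Int64'), how='outer')
print(joined.dtypes)  # every column is still Int64
```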
