How to replace NaN with the row sum in a Pandas DataFrame

Question

I am trying to replace the NaN in specific columns with the row sum in a Pandas DataFrame. The following are sample data:

Items | Estimate1 | Estimate2 | Estimate3
Item1 | NaN       | NaN       | 8
Item2 | NaN       | NaN       | 5.5

I would like Estimate1 and Estimate2 to become 8 for Item1 and 5.5 for Item2, respectively.

So far I have tried df.fillna(df.sum(), inplace=True), but the DataFrame does not change. Can someone help me fix my code or recommend the right way to do this?

+3

python python-3.x pandas dataframe

Avagut Apr 6 '15 at 19:57

2 answers

Alternatively, you can also use apply with the lambda expression as follows:

 df.apply(lambda row: row.fillna(row.sum()), axis=1)

gives the desired result

        Estimate1  Estimate2  Estimate3
 Item1       11.3        3.3        8.0
 Item2        5.5        5.5        5.5

I am not sure how efficient this is.
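For anyone who wants to run the apply approach on its own, here is a minimal self-contained sketch. It assumes the slightly modified sample frame from the accepted answer (with 3.3 in Estimate2 for Item1); the variable name filled is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data borrowed from the accepted answer's modified example.
df = pd.DataFrame(
    [[np.nan, 3.3, 8], [np.nan, np.nan, 5.5]],
    index=["Item1", "Item2"],
    columns=["Estimate1", "Estimate2", "Estimate3"],
)

# With axis=1 each row is passed to the lambda as a Series.
# row.sum() skips NaN by default, so it is the sum of the known values,
# and row.fillna(...) puts that sum into the missing cells of the same row.
filled = df.apply(lambda row: row.fillna(row.sum()), axis=1)

print(filled)
```

Note that apply returns a new DataFrame and leaves df untouched, so the result has to be assigned to a name to be kept.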

+1

Cleb Dec 12 '17 at 16:13

joris · Accepted Answer · 2015-04-06T20:24:23+0000

Passing axis=1 to fillna does not seem to work, since filling with a Series only works column by column, not row by row.
A workaround is to "broadcast" the sum of each row into a DataFrame that has the same index and columns as the original. With a slightly modified example:

 In [57]: df = pd.DataFrame([[np.nan, 3.3, 8], [np.nan, np.nan, 5.5]], index=['Item1', 'Item2'], columns=['Estimate1', 'Estimate2', 'Estimate3'])

 In [58]: df
 Out[58]:
        Estimate1  Estimate2  Estimate3
 Item1        NaN        3.3        8.0
 Item2        NaN        NaN        5.5

 In [59]: fill_value = pd.DataFrame({col: df.sum(axis=1) for col in df.columns})

 In [60]: fill_value
 Out[60]:
        Estimate1  Estimate2  Estimate3
 Item1       11.3       11.3       11.3
 Item2        5.5        5.5        5.5

 In [61]: df.fillna(fill_value)
 Out[61]:
        Estimate1  Estimate2  Estimate3
 Item1       11.3        3.3        8.0
 Item2        5.5        5.5        5.5

There is an open enhancement issue for this: https://github.com/pydata/pandas/issues/4514
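Since the question asks about specific columns only, a variation on the same idea may be useful. The sketch below is not part of the answer above: it transposes the selected columns so that each item becomes a column, which lets the column-wise fillna match the row sums by label, and then transposes back. The names cols, row_sum and result, and the choice of Estimate1 and Estimate2 as the columns to fill, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Same modified sample data as in the session above.
df = pd.DataFrame(
    [[np.nan, 3.3, 8], [np.nan, np.nan, 5.5]],
    index=["Item1", "Item2"],
    columns=["Estimate1", "Estimate2", "Estimate3"],
)

# Assumption for this sketch: only these columns should have NaN replaced.
cols = ["Estimate1", "Estimate2"]

# Row sums over all columns; NaN values are skipped by default.
row_sum = df.sum(axis=1)

result = df.copy()
# After transposing, the items are the column labels, so fillna can align
# the Series of row sums with them; the second .T restores the layout.
result[cols] = df[cols].T.fillna(row_sum).T

print(result)
```

Because the fill values are computed once with df.sum(axis=1), this avoids calling a Python function per row, though whether that matters depends on the size of the data.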
