Pandas: remove null values when calling to_json

I have a pandas DataFrame and I want to save it in JSON format. The pandas docs say:

Note: NaN, NaT and None will be converted to null, and datetime objects will be converted based on the date_format and date_unit parameters.

Then, using the orient="records" option, I get something like this:

 [{"A":1,"B":4,"C":7},{"A":null,"B":5,"C":null},{"A":3,"B":null,"C":null}] 

Is it possible to get this instead:

 [{"A":1,"B":4,"C":7},{"B":5},{"A":3}]

thanks
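For reference, a minimal sketch that reproduces the situation described in the question. The sample DataFrame is an assumption reconstructed from the JSON shown above, since the original frame was not posted.

```python
import json
import pandas as pd

# Assumed sample data, reconstructed from the JSON in the question.
# None in a numeric column becomes NaN, so all three columns are float.
df = pd.DataFrame({
    "A": [1, None, 3],
    "B": [4, 5, None],
    "C": [7, None, None],
})

# With orient="records", every row keeps all keys and NaN is written as null.
records_json = df.to_json(orient="records")

# Parse it back to inspect the structure without depending on number formatting.
records = json.loads(records_json)
print(records)
```

Every record still carries the keys A, B and C, with None where the value was missing, which is exactly what the question wants to avoid.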

json python pandas
3 answers

The following comes close to what you want. Essentially, we create a list holding the non-NaN values of each row and then call to_json on the result:

 In [136]: df.apply(lambda x: [x.dropna()], axis=1).to_json()
 Out[136]: '{"0":[{"a":1.0,"b":4.0,"c":7.0}],"1":[{"b":5.0}],"2":[{"a":3.0}]}'

You need to create a list here; otherwise pandas will try to align the result with the shape of your original df, and this brings back the NaN values that you want to avoid:

 In [138]: df.apply(lambda x: pd.Series(x.dropna()), axis=1).to_json()
 Out[138]: '{"a":{"0":1.0,"1":null,"2":3.0},"b":{"0":4.0,"1":5.0,"2":null},"c":{"0":7.0,"1":null,"2":null}}'

Calling list on the result of dropna does not help either: the result is broadcast back to the original shape, filling in the missing positions:

 In [137]: df.apply(lambda x: list(x.dropna()), axis=1).to_json()
 Out[137]: '{"a":{"0":1.0,"1":5.0,"2":3.0},"b":{"0":4.0,"1":5.0,"2":3.0},"c":{"0":7.0,"1":5.0,"2":3.0}}'
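How apply treats list-returning functions may differ between pandas versions, so the outputs above are worth re-checking on your own install. As a hedged alternative that still relies on to_json, the sketch below serializes each row separately after dropping its NaN values and joins the pieces into one JSON array. The sample frame and the variable names are assumptions for illustration.

```python
import json
import pandas as pd

# Assumed sample data matching the question.
df = pd.DataFrame({
    "A": [1, None, 3],
    "B": [4, 5, None],
    "C": [7, None, None],
})

# Series.to_json() on a row yields one JSON object keyed by column label.
# Dropping NaN first means the missing keys never reach the serializer.
row_objects = [row.dropna().to_json() for _, row in df.iterrows()]

# Join the per-row objects into a single JSON array string.
records_json = "[" + ",".join(row_objects) + "]"
print(records_json)
```

The values come out as floats (for example 1.0) because the columns contain NaN and are therefore stored as float.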

I had the same problem, and my solution is to use the json module instead of pd.DataFrame.to_json().

My solution:

  • drop the NaN values when converting the DataFrame to a dict, and then
  • convert the dict to JSON using json.dumps()

Here is the code:

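 import json
 import pandas as pd

 def to_dict_dropna(df):
     # Go column by column: drop NaN, cast what is left to int, convert to a dict.
     # pandas.compat.iteritems is gone from current pandas; df.items() replaces it.
     # int(k) assumes the column labels are integer-like.
     return {int(k): v.dropna().astype(int).to_dict() for k, v in df.items()}

 json.dumps(to_dict_dropna(df))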
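To make the shape of this result concrete, here is a small sketch that applies the same idea to a frame like the one in the question. `columns_dropna` is a hypothetical variant of the helper that keeps the column labels as they are instead of casting them to int, and the sample data is assumed.

```python
import json
import pandas as pd

# Assumed sample data matching the question.
df = pd.DataFrame({
    "A": [1, None, 3],
    "B": [4, 5, None],
    "C": [7, None, None],
})

def columns_dropna(df):
    """Map each column label to a {row index: value} dict without NaN entries."""
    return {
        label: column.dropna().astype(int).to_dict()
        for label, column in df.items()
    }

# json.dumps turns the integer row indices into string keys.
column_json = json.dumps(columns_dropna(df))
print(column_json)
```

The output is keyed by column first and by row index second, so it drops the nulls but is not the list of per-row records that the question asks for.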

The solution above does not actually produce output in the "records" format. This solution also uses the json package, but gives the record structure asked for in the original question (values stay floats, e.g. 1.0, wherever a column contained NaN):

 import json
 import pandas as pd

 json.dumps([row.dropna().to_dict() for index, row in df.iterrows()])
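The same result can probably be reached without iterrows. The sketch below is a swapped-in alternative, not the answer's own method: it builds the records with DataFrame.to_dict(orient="records") and filters out missing values with pd.notna. The helper name `records_dropna` and the sample frame are assumptions.

```python
import json
import pandas as pd

def records_dropna(df):
    """Return one dict per row, omitting keys whose value is missing."""
    return [
        {key: value for key, value in record.items() if pd.notna(value)}
        for record in df.to_dict(orient="records")
    ]

# Assumed sample data matching the question.
df = pd.DataFrame({
    "A": [1, None, 3],
    "B": [4, 5, None],
    "C": [7, None, None],
})

result = json.dumps(records_dropna(df))
print(result)
```

This version assumes every cell holds a scalar; pd.notna returns an array for list-like cells, which would make the `if` test ambiguous.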

Alternatively, if you want to include the index (and you are on Python 3.5+), you can do:

 json.dumps([{'index':index, **row.dropna().to_dict()} for index,row in df.iterrows()]) 