Pandas Density issue with fillna

I am trying to create a sparse DataFrame in pandas. I build the source DataFrame with

import pandas as pd

df = pd.read_sql(sql=sql, con=db_eng, index_col=index)
idx = pd.MultiIndex.from_product([df.index.levels[0], df.index.levels[1]], names=df.index.names)
my_df = df.reindex(idx)

and then make it sparse in three different ways:

s1 = my_df.to_sparse()
s2 = my_df.to_sparse(fill_value=0)
s3 = my_df.to_sparse().fillna(value=0)

When I check the density of s1, s2 and s3, I get different values:

>>> s1.density
0.054158277796754875
>>> s2.density
1.0
>>> s3.density
0.054158277796754875

Why does the second method give me a density of 1?

1 answer

I do not have access to your data, but it looks like your "empty" values are NaN. Making the frame sparse with respect to 0 (i.e. s2) therefore does not produce a sparse DataFrame at all, because there are no 0 values to leave out.

This will return what you expect:

s2 = my_df.fillna(0).to_sparse(fill_value=0)

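This first fills the "empty" values with 0, so that to_sparse with fill_value=0 has something to leave out.

To explain further: by default, to_sparse uses NaN as its fill_value.

If you call to_sparse with fill_value=0 on data that contains no zeros, only NaN, then every value is stored (density = 1.0), including the NaN entries.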
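
The effect can be reproduced without pandas. The sketch below is only a pure-Python model of the idea, not pandas' actual implementation: it assumes density means the fraction of entries that differ from the fill value, and the helper names (is_fill, density) and the sample column are made up for illustration.

```python
import math

def is_fill(value, fill_value):
    """True if value counts as the fill value; NaN is treated as equal to NaN."""
    if isinstance(fill_value, float) and math.isnan(fill_value):
        return isinstance(value, float) and math.isnan(value)
    return value == fill_value

def density(values, fill_value=math.nan):
    """Fraction of entries that differ from fill_value and so must be stored."""
    stored = sum(1 for v in values if not is_fill(v, fill_value))
    return stored / len(values)

nan = math.nan
column = [nan, nan, 3.5, nan]  # hypothetical data: one real value, three gaps

# Fill value NaN (the default described above): only 3.5 has to be stored.
d_nan_fill = density(column)

# Fill value 0 on data with no zeros: nothing can be left out.
d_zero_fill = density(column, fill_value=0)

# Replace the gaps with 0 first, then use 0 as the fill value.
filled = [0 if math.isnan(v) else v for v in column]
d_fixed = density(filled, fill_value=0)

print(d_nan_fill, d_zero_fill, d_fixed)  # prints: 0.25 1.0 0.25
```

The middle case corresponds to s2 in the question, and the last case to the fillna(0).to_sparse(fill_value=0) fix.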

More information: http://pandas.pydata.org/pandas-docs/stable/sparse.html
