I am not sure how to handle NA values in DataFrames.
For example, with the following DataFrame:
> import DataFrames
> a = DataFrames.@data ([1, 2, 3, 4, 5]);
> b = DataFrames.@data ([3, 4, 5, 6, NA]);
> ndf = DataFrames.DataFrame(a=a, b=b)
I can successfully perform the following operation on column :a:
> ndf[ndf[:a] .== 4, :]
But if I try to perform the same operation on :b, I get the error NAException("cannot index an array with a DataArray containing NA values"):
> ndf[ndf[:b] .== 4, :]
NAException("cannot index an array with a DataArray containing NA values")
while loading In[108], in expression starting on line 1
 in to_index at /Users/abisen/.julia/v0.3/DataArrays/src/indexing.jl:85
 in getindex at /Users/abisen/.julia/v0.3/DataArrays/src/indexing.jl:210
 in getindex at /Users/abisen/.julia/v0.3/DataFrames/src/dataframe/dataframe.jl:268
This is due to the presence of NA.
My question is: how should DataFrames containing NA be handled? I can understand that > or < against NA would be undefined, but == should work (no?).