How to handle null entries in SparkR

I have a SparkSQL DataFrame.

Some entries in this data are empty, but they do not behave like NULL or NA. How can I delete them? Any ideas?

In R, I can easily remove them, but in SparkR I get an error saying there is a problem with the S4 system / methods.

Thanks.

2 answers

SparkR Column provides a long list of useful methods, including isNull and isNotNull:

    > library(magrittr)  # provides the %>% pipe used below
    > people_local <- data.frame(Id=1:4, Age=c(21, 18, 30, NA))
    > people <- createDataFrame(sqlContext, people_local)

    > head(people)
      Id Age
    1  1  21
    2  2  18
    3  3  30
    4  4  NA

    > filter(people, isNotNull(people$Age)) %>% head()
      Id Age
    1  1  21
    2  2  18
    3  3  30

    > filter(people, isNull(people$Age)) %>% head()
      Id Age
    1  4  NA

Please keep in mind that SparkR makes no distinction between NA and NaN.

If you prefer operations on an entire data frame, there is a set of NA functions, including fillna and dropna:

    > fillna(people, 99) %>% head()
      Id Age
    1  1  21
    2  2  18
    3  3  30
    4  4  99

    > dropna(people) %>% head()
      Id Age
    1  1  21
    2  2  18
    3  3  30

Both can be restricted to a subset of columns (cols), and dropna has some additional useful parameters. For example, you can specify the minimum number of non-null columns a row must have in order to be kept:

    > people_with_names_local <- data.frame(
        Id=1:4, Age=c(21, 18, 30, NA), Name=c("Alice", NA, "Bob", NA))
    > people_with_names <- createDataFrame(sqlContext, people_with_names_local)

    > people_with_names %>% head()
      Id Age  Name
    1  1  21 Alice
    2  2  18  <NA>
    3  3  30   Bob
    4  4  NA  <NA>

    > dropna(people_with_names, minNonNulls=2) %>% head()
      Id Age  Name
    1  1  21 Alice
    2  2  18  <NA>
    3  3  30   Bob
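The cols argument mentioned above is not exercised in the examples, so here is a minimal sketch of it for both dropna and fillna. It assumes a SparkR 1.x setup (sparkR.init and sparkRSQL.init, consistent with the createDataFrame(sqlContext, ...) calls in this answer) and a local Spark installation that SparkR can find. The "unknown" fill value is only an illustrative choice.

```r
library(SparkR)
library(magrittr)  # provides %>%

# Assumption: SparkR 1.x-style initialization; adjust for your environment
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

people_with_names <- createDataFrame(sqlContext, data.frame(
  Id = 1:4,
  Age = c(21, 18, 30, NA),
  Name = c("Alice", NA, "Bob", NA),
  stringsAsFactors = FALSE))

# Drop a row only when Name is missing; a missing Age alone does not remove it
dropna(people_with_names, cols = "Name") %>% head()

# Fill missing values in the Name column only; Age is left untouched
fillna(people_with_names, "unknown", cols = "Name") %>% head()

sparkR.stop()
```

Restricting the check to specific columns is handy when only some fields are required and nulls elsewhere are acceptable.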

This is not a pretty workaround, but if you cast the values to strings, the missing ones are stored as "NaN" and you can then filter them out. A brief example:

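    testFrame <- createDataFrame(sqlContext, data.frame(a=c(1,2,3), b=c(1,NA,3)))
    # Cast b to a string column so that missing values become the literal "NaN"
    testFrame$c <- cast(testFrame$b, "string")
    # Keep only rows whose string value is not "NaN", then collect into a local data.frame
    resultFrame <- collect(filter(testFrame, testFrame$c != "NaN"))
    # Drop the helper column
    resultFrame$c <- NULL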

This drops every row in which column b has no value.
