This may be a bit confusing, but it's pretty frank, to be honest. Here is a small example:
scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").load("na_test.csv") // df: org.apache.spark.sql.DataFrame = [name: string, age: int] scala> df.show() // +-----+----+ // | name| age| // +-----+----+ // |alice| 35| // | bob|null| // | | 24| // +-----+----+ scala> df.na.fill(10.0,Seq("age")) // res4: org.apache.spark.sql.DataFrame = [name: string, age: int] // scala> df.na.fill(10.0,Seq("age")).show // +-----+---+ // | name|age| // +-----+---+ // |alice| 35| // | bob| 10| // | | 24| // +-----+---+ scala> df.na.replace("age", Map(35 -> 61,24 -> 12))).show() // +-----+----+ // | name| age| // +-----+----+ // |alice| 61| // | bob|null| // | | 12| // +-----+----+
To access org.apache.spark.sql.DataFrameNaFunctions you can call .na.
eliasah
source share