How to use the functions provided by the DataFrameNaFunctions class in Spark on a Dataframe?

Question

I have a DataFrame, and I want to use one of the replace() functions of org.apache.spark.sql.DataFrameNaFunctions on it.

Problem: these methods do not show up in IntelliSense (code-completion suggestions) on a DataFrame instance, even though I have imported the class explicitly.

I cannot find any material that demonstrates how to use these functions, or how to cast a DataFrame to the DataFrameNaFunctions type.

I tried casting it with asInstanceOf[], but that throws an exception.

scala apache-spark

Parth vishvajit Apr 08 '16 at 12:47

1 answer

eliasah · Accepted Answer · 2016-04-08T13:16:06+0000

This may be a bit confusing, but it's actually quite straightforward. Here is a small example:

    scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").load("na_test.csv")
    // df: org.apache.spark.sql.DataFrame = [name: string, age: int]

    scala> df.show()
    // +-----+----+
    // | name| age|
    // +-----+----+
    // |alice|  35|
    // |  bob|null|
    // |     |  24|
    // +-----+----+

    scala> df.na.fill(10.0, Seq("age"))
    // res4: org.apache.spark.sql.DataFrame = [name: string, age: int]

    scala> df.na.fill(10.0, Seq("age")).show
    // +-----+---+
    // | name|age|
    // +-----+---+
    // |alice| 35|
    // |  bob| 10|
    // |     | 24|
    // +-----+---+

    scala> df.na.replace("age", Map(35 -> 61, 24 -> 12)).show()
    // +-----+----+
    // | name| age|
    // +-----+----+
    // |alice|  61|
    // |  bob|null|
    // |     |  12|
    // +-----+----+

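To access org.apache.spark.sql.DataFrameNaFunctions, call .na on the DataFrame. It returns a DataFrameNaFunctions instance wrapping that DataFrame, so no import or cast is needed.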
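If it helps to see the types spelled out, here is a minimal standalone sketch of the same idea. It rests on assumptions that are not in the example above: Spark 2.x or later (so SparkSession instead of sqlContext), a local master, and an in-memory copy of the data rather than na_test.csv. The object name NaFunctionsSketch is made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, DataFrameNaFunctions, SparkSession}

// Hypothetical object name, used only for this sketch.
object NaFunctionsSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x or later, running locally.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("na-functions-sketch")
      .getOrCreate()
    import spark.implicits._

    // In-memory stand-in for na_test.csv: bob has a null age, one name is empty.
    val df: DataFrame = Seq[(String, Option[Int])](
      ("alice", Some(35)),
      ("bob", None),
      ("", Some(24))
    ).toDF("name", "age")

    // df.na hands back the DataFrameNaFunctions wrapper; asInstanceOf is not involved.
    val naFns: DataFrameNaFunctions = df.na

    // Drop rows that have a null in the listed columns.
    naFns.drop(Seq("age")).show()

    // Fill nulls using a per-column map of default values.
    naFns.fill(Map("age" -> 0)).show()

    // Replace specific values in one column; keys and values share a type.
    naFns.replace("name", Map("" -> "unknown")).show()

    spark.stop()
  }
}
```

Each call returns a new DataFrame and leaves df untouched, so the calls can be chained, for example df.na.fill(Map("age" -> 0)).na.replace("name", Map("" -> "unknown")).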