Spark SQL DataFrame Java function not working

In Spark SQL, when I try to use the map function on a DataFrame, I get the error below.

The method map(Function1, ClassTag) in the type DataFrame is not applicable for the arguments (new Function(){})

I also followed the documentation for Spark 1.3: https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection Is there any solution?

Here is my test code.

    // SQL can be run over RDDs that have been registered as tables.
    DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");
    List<String> teenagerNames = teenagers.map(
        new Function<Row, String>() {
            public String call(Row row) {
                return "Name: " + row.getString(0);
            }
        }).collect();
6 answers

Change this to:

Java 6 and 7

    List<String> teenagerNames = teenagers.javaRDD().map(
        new Function<Row, String>() {
            public String call(Row row) {
                return "Name: " + row.getString(0);
            }
        }).collect();

Java 8

    List<String> t2 = teenagers.javaRDD().map(
        row -> "Name: " + row.getString(0)
    ).collect();

Once you call javaRDD(), it works just like any other RDD map function.

This works with Spark 1.3.0 and higher.
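Aside from the javaRDD() call, the difference between the Java 6/7 and Java 8 versions above is purely syntactic: an anonymous class implementing a functional interface versus a lambda. Here is a stdlib-only sketch of that same pattern (no Spark required; the hypothetical `Fn` interface and `map` helper stand in for Spark's `Function` and `RDD.map`):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapStyles {
    // Stand-in for org.apache.spark.api.java.function.Function<A, B>.
    interface Fn<A, B> {
        B call(A a);
    }

    // Stand-in for RDD.map: applies f to every element of the input list.
    static <A, B> List<B> map(List<A> in, Fn<A, B> f) {
        List<B> out = new ArrayList<B>();
        for (A a : in) {
            out.add(f.call(a));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Michael", "Andy");

        // Java 6/7 style: anonymous class.
        List<String> v1 = map(names, new Fn<String, String>() {
            public String call(String name) {
                return "Name: " + name;
            }
        });

        // Java 8 style: lambda, same result.
        List<String> v2 = map(names, name -> "Name: " + name);

        System.out.println(v1.equals(v2)); // prints "true"
    }
}
```

Both calls produce ["Name: Michael", "Name: Andy"]; whichever style you use, the fix for the original error is calling javaRDD() first so that the Java-friendly map overload is available.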


Make sure you have the correct dependency in your pom. Add it and try:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>

Try the following:

    // SQL can be run over RDDs that have been registered as tables.
    DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");
    List<String> teenagerNames = teenagers.toJavaRDD().map(
        new Function<Row, String>() {
            public String call(Row row) {
                return "Name: " + row.getString(0);
            }
        }).collect();

You need to convert your DataFrame to a JavaRDD.


Check that you are using the correct import for Row (import org.apache.spark.sql.Row) and delete any other imports related to Row. Otherwise your syntax is correct.


Please check that the data in your input file actually matches your SQL query. I ran into the same thing: when I looked back at my data, it did not match my query. That may be the same problem you are facing. Both toJavaRDD() and javaRDD() work.


There is no need to convert to an RDD; map can be called on the Dataset directly (with an Encoder), as shown below.

    public static void mapMethod() {
        // Read the data from a file on the classpath.
        Dataset<Row> df = sparkSession.read().json("file1.json");

        // Prior to Java 1.8: anonymous MapFunction with an explicit Encoder.
        Encoder<String> encoder = Encoders.STRING();
        List<String> rowsList = df.map(new MapFunction<Row, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public String call(Row row) throws Exception {
                return "string:>" + row.getString(0) + "<";
            }
        }, encoder).collectAsList();

        // From Java 1.8 onwards: lambda, same Encoder.
        List<String> rowsList1 = df.map(row -> "string >" + row.getString(0) + "<", encoder).collectAsList();

        System.out.println(">>> " + rowsList);
        System.out.println(">>> " + rowsList1);
    }

