Spark SQL DataFrame Java function not working

In Spark SQL, when I try to use the map function on a DataFrame, I get the error below.

The method map(Function1, ClassTag) in the type DataFrame is not applicable for the arguments (new Function(){})

I also followed the documentation for Spark 1.3: https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection Is there any solution?

Here is my test code.

    // SQL can be run over RDDs that have been registered as tables.
    DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");
    List<String> teenagerNames = teenagers.map(
        new Function<Row, String>() {
            public String call(Row row) {
                return "Name: " + row.getString(0);
            }
        }).collect();
6 answers

Change this to:

Java 6 and 7

    List<String> teenagerNames = teenagers.javaRDD().map(
        new Function<Row, String>() {
            public String call(Row row) {
                return "Name: " + row.getString(0);
            }
        }).collect();

Java 8

    List<String> t2 = teenagers.javaRDD().map(
        row -> "Name: " + row.getString(0)
    ).collect();

Once you call javaRDD(), it works just like any other RDD map function.

This works with Spark 1.3.0 and higher.
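Aside from the javaRDD() call, the difference between the Java 6/7 and Java 8 versions above is purely syntactic: an anonymous class implementing a functional interface versus a lambda. Here is a stdlib-only sketch of that same pattern (no Spark required; the hypothetical `Fn` interface and `map` helper stand in for Spark's `Function` and `RDD.map`):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapStyles {
    // Stand-in for org.apache.spark.api.java.function.Function<A, B>.
    interface Fn<A, B> {
        B call(A a);
    }

    // Stand-in for RDD.map: applies f to every element of the input list.
    static <A, B> List<B> map(List<A> in, Fn<A, B> f) {
        List<B> out = new ArrayList<B>();
        for (A a : in) {
            out.add(f.call(a));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Michael", "Andy");

        // Java 6/7 style: anonymous class.
        List<String> v1 = map(names, new Fn<String, String>() {
            public String call(String name) {
                return "Name: " + name;
            }
        });

        // Java 8 style: lambda, same result.
        List<String> v2 = map(names, name -> "Name: " + name);

        System.out.println(v1.equals(v2)); // prints "true"
    }
}
```

Both calls produce ["Name: Michael", "Name: Andy"]; whichever style you use, the fix for the original error is calling javaRDD() first so that the Java-friendly map overload is available.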


Make sure you have the correct dependency in your pom. Add it and try:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>

Try the following:

    // SQL can be run over RDDs that have been registered as tables.
    DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");
    List<String> teenagerNames = teenagers.toJavaRDD().map(
        new Function<Row, String>() {
            public String call(Row row) {
                return "Name: " + row.getString(0);
            }
        }).collect();

You need to convert your DataFrame to a JavaRDD.


Check that you are using the correct import for Row (import org.apache.spark.sql.Row) and delete any other imports related to Row. Otherwise your syntax is correct.


Please check that the data in your input file actually matches your SQL query. I ran into the same thing: when I looked back at my data, it did not match my query. That may be the same problem you are facing. Both toJavaRDD() and javaRDD() work.


There is no need to convert to an RDD; map can be called on the Dataset directly (with an Encoder), as shown below.

    public static void mapMethod() {
        // Read the data from a file on the classpath.
        Dataset<Row> df = sparkSession.read().json("file1.json");

        // Prior to Java 1.8: anonymous MapFunction with an explicit Encoder.
        Encoder<String> encoder = Encoders.STRING();
        List<String> rowsList = df.map(new MapFunction<Row, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public String call(Row row) throws Exception {
                return "string:>" + row.getString(0) + "<";
            }
        }, encoder).collectAsList();

        // From Java 1.8 onwards: lambda, same Encoder.
        List<String> rowsList1 = df.map(row -> "string >" + row.getString(0) + "<", encoder).collectAsList();

        System.out.println(">>> " + rowsList);
        System.out.println(">>> " + rowsList1);
    }

