How to build a ClassTag for Spark SQL DataFrame Mapping?

Spark SQL 1.2.0 queries return a JavaRDD. Spark SQL 1.3.0 queries return a DataFrame. Converting a DataFrame to a JavaRDD with DataFrame.toJavaRDD() seems to take quite a while, so I tried calling DataFrame.map() directly and ran into a puzzling problem:

    DataFrame df = sqlSC.sql(sql);
    RDD<String> rdd = df.map(new AbstractFunction1<Row, String>() {
        @Override
        public String apply(Row t1) {
            return t1.getString(0);
        }
    }, ?);

"?" should be a scala.reflect.ClassTag. I tried ClassManifestFactory.fromClass(String.class) and it did not work. What should I put in place of "?"?

By the way, the Java example in the "Interoperating with RDDs" section of http://spark.apache.org/docs/1.3.0/sql-programming-guide.html looks wrong: it uses "map(new Function() {", but "Function" is not accepted here. It must be "Function1".

3 answers

Try the following:

    RDD<String> rdd = df.map(new AbstractFunction1<Row, String>() {
        @Override
        public String apply(Row t1) {
            return t1.getString(0);
        }
    }, scala.reflect.ClassManifestFactory.fromClass(String.class));
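Some background on why the second argument exists at all: in Scala, map takes an implicit ClassTag that the Scala compiler fills in automatically, so from Java it has to be passed by hand. The tag carries the runtime class of the result type, which is otherwise lost to generic type erasure. The sketch below shows the same idea in plain Java, without Spark. The class ClassTokenDemo and the helper mapToArray are hypothetical names made up for this illustration, and a Class<B> token stands in for the ClassTag.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class ClassTokenDemo {

    // Hypothetical helper, not part of Spark. Because of erasure, "new B[n]"
    // is not allowed, so the caller must hand over the runtime class of B.
    // A Scala ClassTag plays the same role as resultClass does here.
    @SuppressWarnings("unchecked")
    static <A, B> B[] mapToArray(List<A> input, Function<A, B> f, Class<B> resultClass) {
        // Create an array whose runtime component type really is B.
        B[] out = (B[]) Array.newInstance(resultClass, input.size());
        for (int i = 0; i < input.size(); i++) {
            out[i] = f.apply(input.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] names = mapToArray(Arrays.asList(1, 2, 3), n -> "row-" + n, String.class);
        System.out.println(Arrays.toString(names));                         // [row-1, row-2, row-3]
        System.out.println(names.getClass().getComponentType().getName());  // java.lang.String
    }
}
```

In the Spark call above, the ClassTag built from String.class serves the same purpose as resultClass in this sketch.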

Try the following (worked for me):

    // JavaRDD.map returns a JavaRDD, so the result type is JavaRDD<String>
    JavaRDD<String> rdd = df.toJavaRDD().map(new Function<Row, String>() {
        @Override
        public String call(Row t1) {
            return t1.getString(0);
        }
    });

Try the following:

    RDD<String> rdd = df.map(new AbstractFunction1<Row, String>() {
        @Override
        public String apply(Row t1) {
            return t1.getString(0);
        }
    }, ClassManifestFactory$.MODULE$.fromClass(String.class));

Source: https://habr.com/ru/post/1215416/

