Spark - scala: not a member of org.apache.spark.sql.Row

I am trying to convert a DataFrame to an RDD and then perform an operation like the one below to return tuples:

df.rdd.map { t =>
  (t._2 + "_" + t._3, t)
}.take(5)

Then I got the error below. Does anyone have any ideas? Thanks!

<console>:37: error: value _2 is not a member of org.apache.spark.sql.Row
               (t._2 + "_" + t._3 , t)
                  ^
2 answers

When you convert a DataFrame to an RDD, you get an RDD[Row], so when you use map, your function receives a Row as its parameter. Therefore, you must use the Row methods to access its members (note that the index starts at 0):

df.rdd.map { 
  row: Row => (row.getString(1) + "_" + row.getString(2), row)
}.take(5)
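The off-by-one here trips people up: Scala tuple accessors (_1, _2, ...) are 1-based, while Row indexing is 0-based, so the question's t._2 and t._3 correspond to positions 1 and 2. A minimal Spark-free sketch of the same lookup, using productIterator on a tuple as a stand-in for a Row's values:

```scala
object IndexDemo extends App {
  // Tuple fields are 1-based; sequence-style indexing is 0-based.
  val t = (1, "foo", "bar")           // stand-in for one row's data
  val asSeq = t.productIterator.toSeq // stand-in for a Row's values

  // t._2 / t._3 in tuple terms are positions 1 and 2 when 0-indexed:
  val key = asSeq(1).toString + "_" + asSeq(2).toString
  println(key) // prints "foo_bar"
}
```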

See the Row Spark scaladoc.

Alternatively, if the columns are Strings, you can build the concatenated column directly in the DataFrame:

import org.apache.spark.sql.functions._
val newDF = df.withColumn("concat", concat(df("col2"), lit("_"), df("col3")))

A Row, like a List or an Array, can be accessed by position, either with apply (i.e. row(index)) or with the get methods.

For example:

df.rdd.map { t =>
  (t(1).toString + "_" + t(2).toString, t)
}.take(5)
