Spark - scala: not a member of org.apache.spark.sql.Row

I am trying to convert a DataFrame to an RDD and then perform an operation like the one below to return tuples:

df.rdd.map { t =>
  (t._2 + "_" + t._3, t)
}.take(5)

Then I got the error below. Does anyone have any ideas? Thanks!

<console>:37: error: value _2 is not a member of org.apache.spark.sql.Row
               (t._2 + "_" + t._3 , t)
                  ^
2 answers

When you convert a DataFrame to an RDD, you get an RDD[Row], so when you use map, your function receives a Row as its parameter. Therefore, you must use the Row methods to access its members (note that the index starts at 0):

df.rdd.map { 
  row: Row => (row.getString(1) + "_" + row.getString(2), row)
}.take(5)
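The off-by-one here trips people up: Scala tuple accessors (_1, _2, ...) are 1-based, while Row indexing is 0-based, so the question's t._2 and t._3 correspond to positions 1 and 2. A minimal Spark-free sketch of the same lookup, using productIterator on a tuple as a stand-in for a Row's values:

```scala
object IndexDemo extends App {
  // Tuple fields are 1-based; sequence-style indexing is 0-based.
  val t = (1, "foo", "bar")           // stand-in for one row's data
  val asSeq = t.productIterator.toSeq // stand-in for a Row's values

  // t._2 / t._3 in tuple terms are positions 1 and 2 when 0-indexed:
  val key = asSeq(1).toString + "_" + asSeq(2).toString
  println(key) // prints "foo_bar"
}
```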

See the Row Spark scaladoc.

Alternatively, if the columns are Strings, you can build the concatenated column directly in the DataFrame:

import org.apache.spark.sql.functions._
val newDF = df.withColumn("concat", concat(df("col2"), lit("_"), df("col3")))

A Row, like a List or an Array, can be accessed by position, either with apply (i.e. row(index)) or with the get methods.

For example:

df.rdd.map { t =>
  (t(1).toString + "_" + t(2).toString, t)
}.take(5)
