I have a Spark DataFrame whose rows contain a Seq[(String, String, String)]. I'm trying to do something like a flatMap over it, but everything I try ends up throwing
java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to scala.Tuple3
I can take a single row or several rows from the DataFrame just fine:
df.map{ r => r.getSeq[Feature](1)}.first
returns
Seq[(String, String, String)] = WrappedArray([ancient,jj,o], [olympia_greece,nn,location] .....
and the data type of the RDD seems correct:
org.apache.spark.rdd.RDD[Seq[(String, String, String)]]
DataFrame schema:
root
 |-- article_id: long (nullable = true)
 |-- content_processed: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- lemma: string (nullable = true)
 |    |    |-- pos_tag: string (nullable = true)
 |    |    |-- ne_tag: string (nullable = true)
I know this problem is caused by Spark SQL treating the RDD rows as org.apache.spark.sql.Row, even though they idiotically claim to be a Seq[(String, String, String)]. There is a related question (link below), but its answer does not work for me. I am also not familiar enough with Spark to figure out how to turn it into a working solution.
Are the rows Row[Seq[(String, String, String)]], or Row[(String, String, String)], or Seq[Row[(String, String, String)]], or something even crazier?
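For illustration, here is a minimal sketch of how a value can claim one element type and actually hold another. It uses no Spark at all: FakeRow and getSeqUnchecked are hypothetical stand-ins, and the assumption is that an accessor like getSeq[T] amounts to an unchecked cast whose type parameter is erased at runtime. Under that assumption the cast itself succeeds, and the exception only appears once an element is used as a tuple.

```scala
import scala.util.Try

object ErasureSketch {
  // Hypothetical stand-in for Spark's Row: the only point is that it is not a Tuple3.
  final case class FakeRow(values: Seq[Any])

  // Assumed shape of an unchecked accessor: T is erased, so only "is it a Seq?" is checked.
  def getSeqUnchecked[T](raw: Any): Seq[T] = raw.asInstanceOf[Seq[T]]

  def main(args: Array[String]): Unit = {
    val raw: Any = Seq(
      FakeRow(Seq("ancient", "jj", "o")),
      FakeRow(Seq("olympia_greece", "nn", "location"))
    )

    // The static type now says "Seq of tuples", and nothing fails yet.
    val claimed: Seq[(String, String, String)] =
      getSeqUnchecked[(String, String, String)](raw)
    println(claimed.size) // prints 2

    // Using an element as a Tuple3 forces the real cast, which throws ClassCastException.
    val attempt = Try(claimed(1)._1)
    println(attempt.isFailure) // prints true
  }
}
```

This would be consistent with .first on the Seq succeeding while ._1 on an element fails.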
I'm trying to do something like
df.map{ r => r.getSeq[Feature](1)}.map(_(1)._1)
which appears to work, but actually doesn't:
df.map{ r => r.getSeq[Feature](1)}.map(_(1)._1).first
throws the above error. So, how should I (for example) get the first element of the second tuple in each row?
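A minimal sketch of one way this might be approached, assuming the array elements really are Row objects at runtime: read them with getSeq[Row] and access the struct fields by position. The object and helper names (RowStructSketch, secondLemma, toTuples) are hypothetical, the toy DataFrame is built from tuples so its struct fields are named _1, _2, _3 rather than lemma, pos_tag, ne_tag (positional access does not care), and SparkSession assumes Spark 2.x or later.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object RowStructSketch {
  // Column 1 (content_processed) is an array of structs, so each element is read as a Row.
  // Returns field 0 (the lemma) of the second element; assumes at least two elements.
  def secondLemma(r: Row): String = {
    val tokens = r.getSeq[Row](1)
    tokens(1).getString(0)
  }

  // Converts the structs to real tuples; assumes no field is null (a null would not match).
  def toTuples(r: Row): Seq[(String, String, String)] =
    r.getSeq[Row](1).map {
      case Row(lemma: String, pos: String, ne: String) => (lemma, pos, ne)
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("row-struct-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy data with the same shape as the schema in the question.
    val df = Seq(
      (1L, Seq(("ancient", "jj", "o"), ("olympia_greece", "nn", "location")))
    ).toDF("article_id", "content_processed")

    println(df.rdd.map(secondLemma).first())  // prints olympia_greece
    println(df.rdd.flatMap(toTuples).count()) // prints 2

    spark.stop()
  }
}
```

Going through df.rdd keeps the sketch independent of whether df.map returns an RDD or a Dataset in a given Spark version.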
Also, WHY was Spark designed this way? It seems idiotic to claim that something has one type when in fact it does not and cannot be converted to the declared type.
Related question: GenericRowWithSchema exception when passing ArrayBuffer to HashSet in DataFrame to RDD from Hive table
Related error report: http://search-hadoop.com/m/q3RTt2bvwy19Dxuq1&subj=ClassCastException+when+extracting+and+collecting+DF+array+column+type