This is my code:
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.VertexId

class FNNode(val name: String)
case class Ingredient(override val name: String, category: String) extends FNNode(name)

// Parse the ingredient list (tab-separated, comment lines start with "#")
val ingredients: RDD[(VertexId, FNNode)] =
  sc.textFile(PATH + "ingr_info.tsv").
    filter(!_.startsWith("#")).
    map(line => line.split('\t')).
    map(x => (x(0).toInt, Ingredient(x(1), x(2))))
These definitions compile without errors. However, when I try to execute:
ingredients.take(1)
I get
org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: java.io.InvalidClassException: $iwC$$iwC$Ingredient; no valid constructor
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
This seems to be a serialization issue, as per the answer here. However, I do not know how to solve it if it really is a serialization problem.
I am following the code in this book, so I would expect this to work.
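For reference, here is a minimal sketch of the change I suspect is needed, assuming the problem is that the base class FNNode is neither Serializable nor equipped with a no-arg constructor (this is a guess on my part, not something I have verified):

// Guess: make the base class Serializable so Java serialization can
// deserialize Ingredient instances; without this, the first
// non-serializable superclass would need a no-arg constructor.
class FNNode(val name: String) extends Serializable

case class Ingredient(override val name: String, category: String) extends FNNode(name)

Is this the right direction, or is the issue something else (e.g. defining the classes inside the REPL)?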