Spark: Task not serializable (case classes)

Spark throws "Task not serializable" when I use a case class, or a class/object that extends Serializable, inside a closure.

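import org.apache.spark.rdd.RDD
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HTable
import org.joda.time.format.{DateTimeFormat, DateTimeFormatter}

object WriteToHbase extends Serializable {
    def main(args: Array[String]) {
        val csvRows: RDD[Array[String]] = ...
        val dateFormatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
        // Build one UserTable per CSV row from the selected columns
        val usersRDD = csvRows.map(row => {
            new UserTable(row(0), row(1), row(2), row(9), row(10), row(11))
        })
        processUsers(sc, usersRDD, dateFormatter)
    }
}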

def processUsers(sc: SparkContext, usersRDD: RDD[UserTable], dateFormatter: DateTimeFormatter): Unit = {

    usersRDD.foreachPartition(part => {

        val conf = HBaseConfiguration.create()
        val table = new HTable(conf, tablename)

        part.foreach(userRow => {
            val id = userRow.id
            val date1 = dateFormatter.parseDateTime(userRow.date1)
        })
        table.flushCommits()
        table.close()
    })
}

My first attempt was to use the case class:

case class UserTable(id: String, name: String, address: String, ...) extends Serializable

My second attempt was to use the class instead of the case class:

class UserTable (val id: String, val name: String, val address: String, ...) extends Serializable {
}

My third attempt was to add a companion object to the class:

object UserTable extends Serializable {
    def apply(id: String, name: String, address: String, ...) = new UserTable(id, name, address, ...)
}
2 answers

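The culprit was the dateFormatter, not the UserTable class. It was created on the driver and captured by the foreachPartition closure, and Joda-Time's DateTimeFormatter is not serializable. I moved its creation inside the partition loop, and now it works.

usersRDD.foreachPartition(part => {
    // Create the formatter on the executor, once per partition,
    // so the closure no longer captures it from the driver
    val dateFormatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
    part.foreach(userRow => {
        val id = userRow.id
        val date1 = dateFormatter.parseDateTime(userRow.date1)
    })
})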
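
To find out which captured value is the problem, it can help to try plain Java serialization on each candidate, since that is what Spark uses for closures. The following is only a minimal sketch using the standard library: SerializationCheck, isSerializable, User and PlainFormatter are hypothetical names invented for this illustration, with PlainFormatter standing in for a non-serializable helper such as the date formatter and User standing in for a trimmed-down UserTable.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical stand-in for a helper that does not implement Serializable
  class PlainFormatter(val pattern: String)

  // Hypothetical stand-in for the UserTable case class, trimmed to two fields
  case class User(id: String, date1: String)

  // Returns true if Java serialization accepts the value,
  // false if it throws NotSerializableException
  def isSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Case classes are serializable by default
    println(isSerializable(User("1", "2015-01-01 00:00:00")))  // prints true
    // A plain class without Serializable is rejected
    println(isSerializable(new PlainFormatter("yyyy-MM-dd")))  // prints false
  }
}
```

Running the same check against each value a closure refers to (here that would be the formatter and a sample UserTable) should point at the one that has to be created inside the closure instead of being captured.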

, "doSomething" , . doSomething - (, ).
