I am writing a Spark job in Scala to run on Spark 1.3.0. My RDD transformation functions use classes from a third-party library that are not serializable. To make closure serialization possible, I wrap these objects in com.twitter.chill.MeatLocker, which is itself java.io.Serializable but uses Kryo for the wrapped objects. Then I build an uber jar.
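For illustration, here is a minimal sketch of my setup (ThirdPartyParser is a placeholder standing in for the real non-serializable third-party classes):

```scala
import com.twitter.chill.MeatLocker
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder for a non-serializable third-party class.
class ThirdPartyParser {
  def parse(line: String): String = line.toUpperCase
}

object MyJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("meatlocker-example"))

    // MeatLocker itself is java.io.Serializable, so the closure below can be
    // serialized; the wrapped object is handled by Kryo.
    val parserLocker = MeatLocker(new ThirdPartyParser)

    val count = sc.textFile(args(0))
      .map(line => parserLocker.get.parse(line)) // unwrap on the executor
      .count()

    println(count)
    sc.stop()
  }
}
```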
When I run the job, the executor tasks throw a ClassNotFoundException for the very classes I wrapped in MeatLocker. I know there is a related bug in Spark 1.2.x, but from what I can find it was fixed in 1.3.0: https://issues.apache.org/jira/browse/SPARK-6069
I tried the configuration property spark.executor.userClassPathFirst=true, but it had no effect. I passed the property to the spark-submit script as follows:
spark-submit --class <My-Class> <My-Jar> --conf spark.executor.userClassPathFirst=true
Other than that, the only option I see is to take the source of the third-party library, include it in my project, and make those classes implement java.io.Serializable. That would eliminate the need for Kryo serialization entirely and so avoid this problem, but I am hoping for a better way.
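That workaround would amount to something like this (again with the placeholder class name), after copying the library source into my project, so the closure can capture the object directly with plain Java serialization and no MeatLocker:

```scala
// With the library source included in my project, the change is simply:
class ThirdPartyParser extends Serializable {
  def parse(line: String): String = line.toUpperCase
}
```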