Actually, this is not specific to scala.beans.BeanProperty or even to Spark. You can get the same behavior in the standard Scala REPL by running it with the -Yrepl-class-based option:
scala -Yrepl-class-based
Now try defining a simple empty class:
scala> class Foo()
defined class Foo

scala> classOf[Foo].getConstructors
res0: Array[java.lang.reflect.Constructor[_]] = Array(public Foo($iw))

scala> classOf[Foo].getFields
res1: Array[java.lang.reflect.Field] = Array(public final $iw Foo.$outer)
As you can see, the REPL modified your class on the fly, adding an extra field and an extra constructor parameter. Why?
Whenever you define a val or var in the Scala REPL, the REPL wraps it in a special wrapper, because in Scala there is no such thing as a "global variable". See this answer.
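To make this concrete, here is a rough sketch of what the ordinary, object-based wrapping amounts to. The real wrapper is a compiler-generated object with an internal name (something like $iw); a plain identifier is used below because $-prefixed names are reserved for the compiler, so treat this as an illustration rather than the exact generated code.

// Hedged sketch: roughly what the plain (object-based) REPL does with `val x = 42`.
// ReplLineWrapper stands in for the REPL-internal wrapper object (e.g. $iw).
object ReplLineWrapper {
  val x = 42  // your "top-level" val actually lives inside this singleton object
}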
Normally this wrapper is an object, so it is globally available. With -Yrepl-class-based, however, the REPL uses class instances instead of a single global object. This feature was introduced by the Spark developers because Spark requires classes to be serializable so that they can be sent to remote workers (see this pull request).
Because of this, any class you define in the REPL must receive an instance of $iw; otherwise it would not be able to access the vals and vars you defined in the REPL. In addition, the generated class automatically extends Serializable.
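The transcript above follows from ordinary inner-class compilation. Below is a hedged sketch of the class-based wrapping; again, the real wrapper is a REPL-internal class named something like $iw, so a plain identifier stands in for it.

// Hedged sketch of class-based wrapping (-Yrepl-class-based); simplified.
// ReplLineWrapper stands in for the REPL-internal wrapper class (e.g. $iw).
class ReplLineWrapper extends Serializable {
  val x = 42  // vals/vars you define become instance fields of the wrapper

  // Classes you define end up as inner classes of the wrapper, so the compiler
  // adds an outer reference (the $outer field) and an extra constructor
  // parameter -- exactly what getConstructors and getFields showed above.
  class Foo() {
    def useX: Int = x  // accessing x only works through the enclosing instance
  }
}

In plain (object-based) mode the wrapper is a singleton, so a class like Foo needs no outer reference, which is why the extra constructor parameter only shows up with -Yrepl-class-based.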
I'm afraid there is nothing you can do to prevent this. spark-shell enables -Yrepl-class-based by default, and even if you could disable this behavior, you would run into many other problems, because your classes would no longer be serializable, yet Spark needs to serialize them.
Paweł Bartkiewicz