Why does scala.beans.BeanProperty work differently in Spark?

In the Scala REPL, the following code

    import scala.beans.BeanProperty

    class EmailAccount {
      @scala.beans.BeanProperty var accountName: String = null

      override def toString: String = s"acct ($accountName)"
    }

    classOf[EmailAccount].getDeclaredConstructor()

leads to

 res0: java.lang.reflect.Constructor[EmailAccount] = public EmailAccount() 

However, in the Spark REPL I get

    java.lang.NoSuchMethodException: EmailAccount.<init>()
      at java.lang.Class.getConstructor0(Class.java:2810)
      at java.lang.Class.getDeclaredConstructor(Class.java:2053)
      ... 48 elided

What causes this mismatch? How can I get the plain Scala REPL to match the behavior of the Spark shell?

I started the REPLs as follows:

 /home/placey/Downloads/spark-2.0.0-bin-hadoop2.7/bin/spark-shell --master local --jars /home/placey/snakeyaml-1.17.jar 

and

    scala -classpath "/home/placey/snakeyaml-1.17.jar"

Spark's Scala version:

 Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55) 

Plain Scala:

 Welcome to Scala version 2.11.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55). 
scala javabeans apache-spark
1 answer

Actually, this is not specific to scala.beans.BeanProperty, or even to Spark. You can reproduce the same behavior in the standard Scala REPL by running it with the -Yrepl-class-based option:

 scala -Yrepl-class-based 

Now try defining a simple empty class:

    scala> class Foo()
    defined class Foo

    scala> classOf[Foo].getConstructors
    res0: Array[java.lang.reflect.Constructor[_]] = Array(public Foo($iw))

    scala> classOf[Foo].getFields
    res1: Array[java.lang.reflect.Field] = Array(public final $iw Foo.$outer)

As you can see, the REPL modified your class on the fly, adding an extra field and an extra constructor parameter. Why?

Whenever you create a val or var in the Scala REPL, it gets wrapped in a special object, because there is no such thing as a "global variable" in Scala. See this answer.
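The same mechanism can be observed outside the REPL with an ordinary inner class. In the sketch below, a hypothetical Wrapper class stands in for the REPL's generated $iw wrapper:

```scala
// Sketch of what -Yrepl-class-based effectively does: Foo is nested
// inside a wrapper class, so its JVM constructor takes a hidden
// parameter referencing the outer instance (stored in the $outer field).
class Wrapper {
  class Foo
}

object Demo {
  def main(args: Array[String]): Unit = {
    // The only constructor takes a Wrapper, not zero arguments:
    val ctor = classOf[Wrapper#Foo].getDeclaredConstructors.head
    println(ctor.getParameterTypes.toList) // List(class Wrapper)

    // Asking for a no-arg constructor fails, just like in spark-shell:
    try classOf[Wrapper#Foo].getDeclaredConstructor()
    catch {
      case e: NoSuchMethodException => println("no such method: " + e.getMessage)
    }
  }
}
```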

Normally this wrapper is a singleton object, so it is globally accessible. However, with -Yrepl-class-based the REPL uses class instances instead of a single global object. This feature was introduced by the Spark developers because Spark needs classes to be serializable so that they can be shipped to a remote worker (see this pull request).

Because of this, any class you define in the REPL must receive a reference to an instance of $iw; otherwise you would not be able to access the global vals and vars you defined in the REPL. In addition, the generated class automatically extends Serializable.
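For illustration, such a class can still be instantiated reflectively if you supply the outer instance explicitly. The sketch below uses a made-up Wrapper2 class in place of $iw (in a real spark-shell session you would not normally have a handle on the $iw instance, so this is a demonstration of the mechanism rather than a fix):

```scala
import scala.beans.BeanProperty

// Wrapper2 plays the role of the REPL's $iw wrapper; the names
// Wrapper2 and Bean are hypothetical, invented for this sketch.
class Wrapper2 {
  class Bean {
    @BeanProperty var accountName: String = null
  }
}

object ReflectiveDemo {
  def main(args: Array[String]): Unit = {
    val outer = new Wrapper2
    // Ask for the constructor that takes the outer instance...
    val ctor = classOf[Wrapper2#Bean].getDeclaredConstructor(classOf[Wrapper2])
    // ...and pass it explicitly, which is what `new outer.Bean` does anyway.
    val bean = ctor.newInstance(outer)
    bean.setAccountName("test")
    println(bean.getAccountName) // prints "test"
  }
}
```

A framework that only ever calls the no-argument constructor (as SnakeYAML does by default) has no way to supply that extra argument, which is exactly why it fails on REPL-defined classes.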

I'm afraid there is nothing you can do to prevent this: spark-shell enables -Yrepl-class-based by default. Even if you could disable this behavior, you would run into many other problems, because your classes would no longer be serializable, yet Spark needs to serialize them.
