From Davis Liu (DataBricks):
"At present, PySpark cannot serialize a class object defined in the current script (__main__). As a workaround, the class implementation can be moved into a separate module, which is then shipped with "bin/spark-submit --py-files xxx.py" at deployment.
in xxx.py:
class test(object):
    def __init__(self, a, b):
        self.total = a + b
in job.py:
from xxx import test
a = sc.parallelize([(True, False), (False, False)])
a.map(lambda (x, y): test(x, y))
run it:
bin/spark-submit --py-files xxx.py job.py
"
I just want to point out that you can pass the same --py-files argument to the PySpark shell too.
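The workaround above can be sketched end-to-end without a cluster. The snippet below (a minimal illustration, not Spark itself) writes the class into a real module file, puts its directory on sys.path the way --py-files does for worker processes, and then shows that an instance round-trips through pickle, which is exactly what fails when the class lives only in the driver's __main__:

```python
import os
import pickle
import sys
import tempfile

# Write the class into a standalone module, as the workaround recommends.
moddir = tempfile.mkdtemp()
with open(os.path.join(moddir, "xxx.py"), "w") as f:
    f.write(
        "class test(object):\n"
        "    def __init__(self, a, b):\n"
        "        self.total = a + b\n"
    )

# --py-files effectively makes the module importable on every worker;
# here we simulate that by extending sys.path locally.
sys.path.insert(0, moddir)
from xxx import test

obj = test(True, False)
print(obj.total)  # True + False == 1

# Pickle stores only a reference (module name, class name); because "xxx"
# is now an importable module rather than __main__, deserialization works.
restored = pickle.loads(pickle.dumps(obj))
print(restored.total)
```

This mirrors what Spark does internally: tasks are pickled on the driver and unpickled on workers, so every class they reference must be importable there.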