Spark returns Pickle error: cannot lookup attribute

I am running into some attribute lookup problems when trying to instantiate a class in my RDD.

My workflow:

1- Start with an RDD.

2- Take each RDD element and instantiate an object from it.

3- Reduce (I will write a method that determines the reduction operation later).

Here is step 2:

    class test(object):
        def __init__(self, a, b):
            self.total = a + b

    a = sc.parallelize([(True, False), (False, False)])
    a.map(lambda (x, y): test(x, y))

Here is the error I get:

    PicklingError: Can't pickle <class '__main__.test'>: attribute lookup __main__.test failed

I would like to know if there is a way around this. Please respond with a working example that achieves the intended result (i.e., creating an RDD of objects of class "test").

1 answer

From Davies Liu (Databricks):

"At present, PySpark cannot support sorting a class object in the current script (' main ), a class implementation can be implemented into a separate module in a workaround, then use" bin / spark-submit --py-files xxx.py "in the deployment.

in xxx.py:

    class test(object):
        def __init__(self, a, b):
            self.total = a + b

in job.py:

    from pyspark import SparkContext  # needed when run via spark-submit
    from xxx import test

    sc = SparkContext()
    a = sc.parallelize([(True, False), (False, False)])
    a.map(lambda (x, y): test(x, y))

run it:

    bin/spark-submit --py-files xxx.py job.py

"

I just want to point out that you can pass the same argument (--py-files) to the PySpark shell too.
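For example, assuming xxx.py sits in the current directory, the shell can be started with:

    bin/pyspark --py-files xxx.py

Alternatively, an already-running SparkContext can ship the module to the executors with sc.addPyFile("xxx.py").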


Source: https://habr.com/ru/post/1213555/

