How do PySpark broadcast variables work?

I know that it uses pickle, ships things across nodes, keeps them in memory, and so on. What confuses me is why the syntax for using it in PySpark works.

def main():
    sc = SparkContext()
    someValue = rand()
    V = sc.broadcast(someValue)
    A = sc.parallelize().map(worker)

def worker(element):
    element *= V.value

Why doesn't the code above complain that "V" is undefined? I searched the broadcast-related source code in PySpark but found no hint.

2 answers

I believe your problem is simply a Python scoping issue. If you try the following non-Spark Python code, it fails in the same way, with an error that "V" is not defined:

def runner(func):
    func()

def main():
    V = 22
    A = runner(worker)

def worker():
    # V is local to main(), so this lookup raises NameError
    print(V)

if __name__ == '__main__':
    main()

One fix: you can move worker() inside main() (or, alternatively, make V a global variable):

def main():
    sc = SparkContext()
    someValue = rand()
    V = sc.broadcast(someValue)
    def worker(element):
        # V is captured from main()'s scope; return the product so map() keeps it
        return element * V.value
    A = sc.parallelize().map(worker)
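
Both fixes can be checked without a cluster. The following is a minimal sketch in plain Python, assuming a hypothetical `runner` helper that stands in for `rdd.map(...)` and a plain integer in place of the broadcast value; none of these names come from Spark.

```python
def runner(func, items):
    # Hypothetical stand-in for rdd.map(func): apply func to each item locally.
    return [func(x) for x in items]


def main_closure():
    V = 22  # plays the role of the broadcast value

    def worker(element):
        # V is resolved in the enclosing scope of main_closure().
        return element * V

    return runner(worker, [1, 2, 3])


# Alternative fix: keep worker at module level and make the value global.
V_GLOBAL = 22


def global_worker(element):
    # V_GLOBAL is resolved in the module's global scope.
    return element * V_GLOBAL


def main_global():
    return runner(global_worker, [1, 2, 3])


closure_result = main_closure()
global_result = main_global()
```

In both variants the name is visible from where worker is defined, which is what Python's scoping rules require; where worker is called from does not matter.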

From the Spark research paper "Spark: Cluster Computing with Working Sets", Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. HotCloud 2010.

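The paper describes how shared variables are implemented: when a broadcast variable b is created with a value v, v is saved to a file in a shared file system, and the serialized form of b is a path to this file. When b's value is queried on a worker node, Spark first checks whether v is in a local cache, and reads it from the file system if it is not. HDFS was used initially to broadcast variables.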
