If I run pyspark and then run these commands:

    import my_script
    spark = my_script.Sparker(sc)
    spark.collapse('./data/')
All is A-OK. If, however, I try to do the same through the command line with spark-submit, I get an error:
Command: /usr/local/spark/bin/spark-submit my_script.py collapse ./data/

      File "/usr/local/spark/python/pyspark/rdd.py", line 352, in func
        return f(iterator)
      File "/usr/local/spark/python/pyspark/rdd.py", line 1576, in combineLocally
        merger.mergeValues(iterator)
      File "/usr/local/spark/python/pyspark/shuffle.py", line 245, in mergeValues
        for k, v in iterator:
      File "/.../my_script.py", line 173, in _json_args_to_arr
        js = cls._json(line)
    RuntimeError: uninitialized staticmethod object
my_script:
    ...

    if __name__ == "__main__":
        args = sys.argv[1:]
        if args[0] == 'collapse':
            directory = args[1]
            from pyspark import SparkContext
            sc = SparkContext(appName="Collapse")
            spark = Sparker(sc)
            spark.collapse(directory)
            sc.stop()
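To make the structure clearer, the part of Sparker that the traceback touches has roughly this shape (a simplified sketch reconstructed from the stack trace, with the real method bodies omitted; only the names _json, _json_args_to_arr, and collapse appear in the trace and command above):

    class Sparker(object):
        def __init__(self, sc):
            self.sc = sc

        @staticmethod
        def _json(line):
            # parses a single JSON line (body omitted)
            ...

        @classmethod
        def _json_args_to_arr(cls, line):
            # line 173 in the traceback: the staticmethod is called through cls,
            # and this call raises the error under spark-submit
            js = cls._json(line)
            return js

        def collapse(self, directory):
            # maps _json_args_to_arr over the input files, then combines by key,
            # which is the shuffle step where mergeValues runs
            ...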
Why is this happening? What is the difference between running pyspark and running spark-submit that might cause this discrepancy? And how can I make this work in spark-submit?
EDIT: I tried running this from the bash shell with pyspark my_script.py collapse ./data/ and got the same error. The only time everything works is when I am in the python shell and import the script.