What is the difference between spark-submit and pyspark?

If I run pyspark and then run this command:

    import my_script
    spark = my_script.Sparker(sc)
    spark.collapse('./data/')

Everything works fine. If, however, I try to do the same from the command line with spark-submit, I get an error message:

    Command: /usr/local/spark/bin/spark-submit my_script.py collapse ./data/

      File "/usr/local/spark/python/pyspark/rdd.py", line 352, in func
        return f(iterator)
      File "/usr/local/spark/python/pyspark/rdd.py", line 1576, in combineLocally
        merger.mergeValues(iterator)
      File "/usr/local/spark/python/pyspark/shuffle.py", line 245, in mergeValues
        for k, v in iterator:
      File "/.../my_script.py", line 173, in _json_args_to_arr
        js = cls._json(line)
    RuntimeError: uninitialized staticmethod object

my_script:

    ...
    if __name__ == "__main__":
        args = sys.argv[1:]
        if args[0] == 'collapse':
            directory = args[1]
            from pyspark import SparkContext
            sc = SparkContext(appName="Collapse")
            spark = Sparker(sc)
            spark.collapse(directory)
            sc.stop()

Why is this happening? What is the difference between running pyspark and running spark-submit that could cause this discrepancy? And how can I make this work with spark-submit?

EDIT: I tried running this from the bash shell with pyspark my_script.py collapse ./data/ and got the same error. The only time everything works is when I am in the Python shell and import the script.

+7
python apache-spark pyspark
2 answers
  • If you have created a Spark application, you need to use spark-submit to run it (a sketch of such a script follows this list)

    • The code can be written in either Python or Scala

    • The mode can be local or cluster

  • If you just want to test or run a few individual commands, you can use the shells provided by Spark

    • pyspark (for Spark in Python)
    • spark-shell (for Spark in Scala)
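
As an illustration of the first point, here is a minimal sketch of a script meant to be launched with spark-submit. The Sparker class and collapse method names come from the question; their bodies and the argument handling are assumptions for the example. The key difference from the pyspark shell is that the shell creates sc for you, while a submitted script must create and stop its own SparkContext:

    import sys
    from pyspark import SparkContext

    class Sparker(object):
        # Stand-in for the Sparker class from the question.
        def __init__(self, sc):
            self.sc = sc

        def collapse(self, directory):
            # Assumed body, for illustration only.
            print(self.sc.textFile(directory).count())

    if __name__ == "__main__":
        args = sys.argv[1:]
        if args and args[0] == 'collapse':
            # spark-submit runs this file as an ordinary Python program,
            # so the script has to create (and stop) its own SparkContext.
            sc = SparkContext(appName="Collapse")
            try:
                Sparker(sc).collapse(args[1])
            finally:
                sc.stop()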
+7

spark-submit submits your code to the workers in the cluster for execution.

See: http://spark.apache.org/docs/latest/submitting-applications.html
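
For reference, a typical launch looks something like the following; the --master value is a placeholder for local mode, and the script path and arguments are taken from the question:

    # Run the application locally on all available cores (placeholder master):
    /usr/local/spark/bin/spark-submit --master local[*] my_script.py collapse ./data/

    # For interactive exploration, start the shell instead; it creates sc for you:
    /usr/local/spark/bin/pyspark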

+2
