What is the difference between spark-submit and pyspark?

If I run pyspark and then run this command:

    import my_script
    spark = my_script.Sparker(sc)
    spark.collapse('./data/')

Everything works fine. If, however, I try to do the same from the command line with spark-submit, I get an error message:

    Command: /usr/local/spark/bin/spark-submit my_script.py collapse ./data/

      File "/usr/local/spark/python/pyspark/rdd.py", line 352, in func
        return f(iterator)
      File "/usr/local/spark/python/pyspark/rdd.py", line 1576, in combineLocally
        merger.mergeValues(iterator)
      File "/usr/local/spark/python/pyspark/shuffle.py", line 245, in mergeValues
        for k, v in iterator:
      File "/.../my_script.py", line 173, in _json_args_to_arr
        js = cls._json(line)
    RuntimeError: uninitialized staticmethod object

my_script:

    ...
    if __name__ == "__main__":
        args = sys.argv[1:]
        if args[0] == 'collapse':
            directory = args[1]
            from pyspark import SparkContext
            sc = SparkContext(appName="Collapse")
            spark = Sparker(sc)
            spark.collapse(directory)
            sc.stop()

Why is this happening? What is the difference between running pyspark and running spark-submit that could cause this discrepancy? And how can I make this work with spark-submit?

EDIT: I tried running this from the bash shell with pyspark my_script.py collapse ./data/ and got the same error. The only time everything works is when I am in the Python shell and import the script.

+7
python apache-spark pyspark
2 answers
  • If you have created a Spark application, you need to use spark-submit to run it (a sketch of such a script follows this list)

    • The code can be written in either Python or Scala

    • The mode can be local or cluster

  • If you just want to test or run a few individual commands, you can use the shells provided by Spark

    • pyspark (for Spark in Python)
    • spark-shell (for Spark in Scala)
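
As an illustration of the first point, here is a minimal sketch of a script meant to be launched with spark-submit. The Sparker class and collapse method names come from the question; their bodies and the argument handling are assumptions for the example. The key difference from the pyspark shell is that the shell creates sc for you, while a submitted script must create and stop its own SparkContext:

    import sys
    from pyspark import SparkContext

    class Sparker(object):
        # Stand-in for the Sparker class from the question.
        def __init__(self, sc):
            self.sc = sc

        def collapse(self, directory):
            # Assumed body, for illustration only.
            print(self.sc.textFile(directory).count())

    if __name__ == "__main__":
        args = sys.argv[1:]
        if args and args[0] == 'collapse':
            # spark-submit runs this file as an ordinary Python program,
            # so the script has to create (and stop) its own SparkContext.
            sc = SparkContext(appName="Collapse")
            try:
                Sparker(sc).collapse(args[1])
            finally:
                sc.stop()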
+7

spark-submit submits your code to the workers in the cluster for execution.

See: http://spark.apache.org/docs/latest/submitting-applications.html
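
For reference, a typical launch looks something like the following; the --master value is a placeholder for local mode, and the script path and arguments are taken from the question:

    # Run the application locally on all available cores (placeholder master):
    /usr/local/spark/bin/spark-submit --master local[*] my_script.py collapse ./data/

    # For interactive exploration, start the shell instead; it creates sc for you:
    /usr/local/spark/bin/pyspark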

+2
