Failed to start pyspark

I installed Spark on Windows and I cannot start pyspark. When I type c:\Spark\bin\pyspark, I get the following error:

Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "c:\Spark\bin\..\python\pyspark\shell.py", line 30, in <module>
    import pyspark
  File "c:\Spark\python\pyspark\__init__.py", line 44, in <module>
    from pyspark.context import SparkContext
  File "c:\Spark\python\pyspark\context.py", line 36, in <module>
    from pyspark.java_gateway import launch_gateway
  File "c:\Spark\python\pyspark\java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
  File "c:\Spark\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 18, in <module>
  File "C:\Users\Eigenaar\Anaconda3\lib\pydoc.py", line 62, in <module>
    import pkgutil
  File "C:\Users\Eigenaar\Anaconda3\lib\pkgutil.py", line 22, in <module>
    ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg')
  File "c:\Spark\python\pyspark\serializers.py", line 393, in namedtuple
    cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'

What am I doing wrong here?

5 answers

Spark 2.1.0 does not support Python 3.6.0. To solve this problem, change the Python version in your Anaconda environment. Run the following commands in your Anaconda env:

conda create -n py35 python=3.5 anaconda
activate py35
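To confirm that your shell now resolves to the 3.5 interpreter after activating the environment, a quick check (a minimal sketch, not part of the original answer) is:

# Run inside the activated py35 environment, e.g. in a python session.
import sys
print(sys.executable)        # should point into the py35 environment
print(sys.version_info[:2])  # expected: (3, 5)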

Spark <= 2.1.0 is not compatible with Python 3.6. See this question, which also claims that this will be fixed in an upcoming Spark release.
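For reference, the incompatibility stems from the namedtuple signature change in Python 3.6, which you can inspect yourself (a small sketch of my own, not from the linked question):

# Spark <= 2.1.0 wraps collections.namedtuple in a way that breaks
# once verbose/rename become keyword-only and module is added.
import collections
import inspect

print(inspect.signature(collections.namedtuple))
# Python 3.5: (typename, field_names, verbose=False, rename=False)
# Python 3.6: (typename, field_names, *, verbose=False, rename=False, module=None)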


I solved this problem with one change in the Python script.

In the Python script serializers.py, located at c:\your-installation-dir\spark-2.0.2-bin-hadoop-2.7\python\pyspark\, replace the line at line number 381 with the snippet below.

 cls = _old_namedtuple(*args, **kwargs, verbose=False, rename=False, module=None) 

Then run pyspark from your command line; it should work.


I wanted to follow up on Indrajeet's answer, since he mentioned line numbers rather than the exact location of the code. This is in addition to his answer, for further clarification.

cls = _old_namedtuple(*args, **kwargs)

This is the line that was changed in the answer; here it is in context:

def _hijack_namedtuple():
    """ Hack namedtuple() to make it picklable """
    # hijack only one time
    if hasattr(collections.namedtuple, "__hijack"):
        return

    global _old_namedtuple  # or it will put in closure

    def _copy_func(f):
        return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                                  f.__defaults__, f.__closure__)

    _old_namedtuple = _copy_func(collections.namedtuple)

    def namedtuple(*args, **kwargs):
        # cls = _old_namedtuple(*args, **kwargs)
        cls = _old_namedtuple(*args, **kwargs, verbose=False, rename=False, module=None)
        return _hack_namedtuple(cls)
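For context, here is a minimal sketch (my own, not part of the original answer) of why this copy breaks on Python 3.6: types.FunctionType carries over __defaults__ but not __kwdefaults__, and in 3.6 the defaults for verbose, rename and module are keyword-only, so the copied function suddenly requires them.

import collections
import types

def _copy_func(f):
    # Same copy trick as in serializers.py: __defaults__ is preserved,
    # but __kwdefaults__ (keyword-only defaults) is silently dropped.
    return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                              f.__defaults__, f.__closure__)

copied = _copy_func(collections.namedtuple)
print(collections.namedtuple.__kwdefaults__)  # 3.6: {'verbose': False, 'rename': False, 'module': None}
print(copied.__kwdefaults__)                  # None -> those parameters become mandatory

# On Python 3.6 the next call raises:
# TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
copied("Point", "x y")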

!!! EDIT March 6, 2017: This does fix the original error, but I don't think it makes Spark 2.1 compatible with Python 3.6; it may just cause more collisions. In the end I used conda to create a Python 3.5 virtual environment, and it worked like a charm.

(On Windows, if you have the environment variables set up)

>conda create -n py35 python=3.5
>activate py35
>pyspark

Possible problems when launching Spark on Windows are not providing the correct path or using Python 3.x to start Spark.

So,

  • Check that the path to Spark (e.g. /usr/local/spark) is correct.
  • Set your Python path to Python 2.x (remove Python 3.x); a quick check is sketched below.
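A quick sanity check along those lines, run from the same shell you launch Spark from (my own sketch; SPARK_HOME and the expected values are assumptions about a typical setup):

import os
import sys

print(os.environ.get("SPARK_HOME"))  # should point at your Spark install, e.g. c:\Spark
print(sys.version_info[:2])          # (2, 7) if you switched the path to Python 2.x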
