Here are seven steps to install Apache Spark on Windows 10 and run it from Python:
Step 1: Download the spark-2.2.0-bin-hadoop2.7 tar gz (tape archive) file to any folder F from this link - https://spark.apache.org/downloads.html. Unzip it and copy the unpacked folder to the desired folder A. Rename the folder spark-2.2.0-bin-hadoop2.7 to spark.
Let the path to the spark folder be C:\Users\Desktop\A\spark
Step 2: Download the hadoop 2.7.3 tar gz file to the same folder F from this link - https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz. Unzip it and copy the unzipped folder to the same folder A. Rename the folder from hadoop-2.7.3.tar to hadoop. Let the path to the hadoop folder be C:\Users\Desktop\A\hadoop
Step 3: Create a new text file in Notepad. Save this blank Notepad file as winutils.exe (with Save as type: All Files). Copy this 0 KB winutils.exe file to the bin folder in spark - C:\Users\Desktop\A\spark\bin
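If you prefer, the blank-file step above can be scripted from Python instead of Notepad. This is a sketch assuming the example path used in this guide; change bin_dir to your own spark folder:

```python
from pathlib import Path

# Example path from this guide; adjust to your own spark\bin folder.
bin_dir = Path(r"C:\Users\Desktop\A\spark\bin")
bin_dir.mkdir(parents=True, exist_ok=True)

# touch() creates an empty (0 KB) file, the same as saving a blank Notepad file.
winutils = bin_dir / "winutils.exe"
winutils.touch()
print(winutils.stat().st_size)  # 0
```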
Step 4: Now we need to add these folders to the system environment variables.
4a: Create a system variable (not a user variable, since the user variable inherits all the properties of the system variable):
Variable name: SPARK_HOME
Variable value: C:\Users\Desktop\A\spark
Find the Path system variable and click Edit. You will see several paths; do not delete any of them. Append this value: ;C:\Users\Desktop\A\spark\bin
4b: Create a system variable:
Variable name: HADOOP_HOME
Variable value: C:\Users\Desktop\A\hadoop
Find the Path system variable and click Edit. Append this value: ;C:\Users\Desktop\A\hadoop\bin
4c: Create a system variable:
Variable name: JAVA_HOME
Search for Java in Windows. Right-click it and click Open file location. You may have to right-click again on one of the Java files and click Open file location. You will use the path to this folder. Alternatively, you can look in C:\Program Files\Java. My version of Java installed on the system is jre1.8.0_131.
Variable value: C:\Program Files\Java\jre1.8.0_131
Find the Path system variable and click Edit. Append this value: ;C:\Program Files\Java\jre1.8.0_131\bin
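After Step 4, you can confirm the three variables are visible with a short Python check (open a new command prompt first, since windows opened earlier keep the old environment). The variable names below are exactly the ones created in steps 4a-4c:

```python
import os

# The three system variables created in steps 4a-4c.
for name in ("SPARK_HOME", "HADOOP_HOME", "JAVA_HOME"):
    value = os.environ.get(name)
    print(f"{name} -> {value if value else 'NOT SET (re-open the command prompt)'}")
```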
Step 5: Open a command prompt and go to the spark bin folder (type cd C:\Users\Desktop\A\spark\bin). Type spark-shell:
C:\Users\Desktop\A\spark\bin>spark-shell
This may take some time and print some warnings. Finally, it will show: Welcome to Spark version 2.2.0
Step 6: Type exit() or restart the command prompt, and go to the spark bin folder again. Type pyspark:
C:\Users\Desktop\A\spark\bin>pyspark
It will show some warnings and errors, but ignore them. It works.
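If the Path edits from Step 4 took effect, both launchers should be findable from any folder, not just spark\bin. A small sketch to verify that (shutil.which searches the same Path the command prompt uses):

```python
import shutil

# which() returns the full path of the launcher if it is on Path, else None.
for tool in ("spark-shell", "pyspark"):
    location = shutil.which(tool)
    print(f"{tool} -> {location or 'not on Path; re-check Step 4'}")
```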
Step 7: Your installation is complete. If you want to launch Spark directly from the Python shell, go to Scripts in your Python folder and type
pip install findspark
on the command line.
In the Python shell:
import findspark
findspark.init()
Import the necessary modules:
from pyspark import SparkContext
from pyspark import SparkConf
If you want to skip the steps of importing findspark and initializing it, follow the procedure for importing pyspark directly in the Python shell.