How to upgrade Spark to a newer version?

I have a virtual machine with Spark 1.3, but I want to upgrade it to Spark 1.5, primarily because of certain functions that are not supported in 1.3. Is it possible to upgrade from 1.3 to 1.5, and if so, how can I do it?

+9
apache-spark
2 answers

Pre-built Spark distributions, such as the one I believe you are using based on another question, are pretty easy to “upgrade”, because Spark is not actually “installed”. In fact, all you have to do is:

  • Download the corresponding Spark distribution (pre-built for Hadoop 2.6 and later, in your case)
  • Untar the file in the appropriate directory (i.e. where your spark-1.3.1-bin-hadoop2.6 folder already exists)
  • Update SPARK_HOME (and maybe some other environment variables depending on your installation) accordingly

Here is what I did myself to go from 1.3.1 to 1.5.2, in a setup similar to yours (a Vagrant VM running Ubuntu):

1) Download the tar file into the appropriate directory

 vagrant@sparkvm2:~$ cd $SPARK_HOME
 vagrant@sparkvm2:/usr/local/bin/spark-1.3.1-bin-hadoop2.6$ cd ..
 vagrant@sparkvm2:/usr/local/bin$ ls
 ipcluster     ipcontroller2  iptest   ipython2    spark-1.3.1-bin-hadoop2.6
 ipcluster2    ipengine       iptest2  jsonschema
 ipcontroller  ipengine2      ipython  pygmentize
 vagrant@sparkvm2:/usr/local/bin$ sudo wget http://apache.tsl.gr/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz
 [...]
 vagrant@sparkvm2:/usr/local/bin$ ls
 ipcluster     ipcontroller2  iptest   ipython2    spark-1.3.1-bin-hadoop2.6
 ipcluster2    ipengine       iptest2  jsonschema  spark-1.5.2-bin-hadoop2.6.tgz
 ipcontroller  ipengine2      ipython  pygmentize

Please note that the exact mirror you should use with wget will probably differ from mine, depending on your location; you will get the right link by clicking the “Download Spark” link on the download page, after you have selected the package type to download.
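Mirrors also rotate as releases age, so the link above may eventually go dead. As a fallback, every Spark release remains available on the Apache archive; the URL pattern below follows the standard archive layout (my assumption, not part of the original answer):

 # Fallback if the mirror no longer carries 1.5.2: the Apache archive keeps all releases
 sudo wget https://archive.apache.org/dist/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz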

2) Extract the tgz file with

 vagrant@sparkvm2:/usr/local/bin$ sudo tar -xzf spark-1.*.tgz
 vagrant@sparkvm2:/usr/local/bin$ ls
 ipcluster     ipcontroller2  iptest   ipython2    spark-1.3.1-bin-hadoop2.6
 ipcluster2    ipengine       iptest2  jsonschema  spark-1.5.2-bin-hadoop2.6
 ipcontroller  ipengine2      ipython  pygmentize  spark-1.5.2-bin-hadoop2.6.tgz

You can see that you now have a new folder, spark-1.5.2-bin-hadoop2.6.
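Before repointing anything, you can sanity-check the new build straight from its own directory. This is optional and not part of the original steps; spark-submit --version simply prints the version banner and exits:

 # Optional: confirm the new build runs before touching SPARK_HOME
 vagrant@sparkvm2:/usr/local/bin$ spark-1.5.2-bin-hadoop2.6/bin/spark-submit --version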

3) Update SPARK_HOME (and possibly other environment variables that you use) to point to this new directory, not the previous one.

And you should be done after restarting your machine.
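For example, if SPARK_HOME is exported from your shell startup file, the change is a couple of lines. The file name and paths below are assumptions matching the session above; adapt them to wherever your setup actually defines these variables:

 # e.g. in ~/.bashrc, or wherever your setup exports SPARK_HOME
 export SPARK_HOME=/usr/local/bin/spark-1.5.2-bin-hadoop2.6
 export PATH="$SPARK_HOME/bin:$PATH"

 # Pick up the change in the current shell without a full restart:
 source ~/.bashrc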

Note:

  • You do not need to remove the previous Spark distribution, as long as all the relevant environment variables point to the new one. That way, you can even quickly switch back and forth between the old and the new version if you want to test things (i.e. you just have to change the corresponding environment variables).
  • sudo was necessary in my case; it may be unnecessary for you, depending on your setup.
  • After verifying that everything works fine, it is a good idea to delete the downloaded tgz file.
  • You can use the exact same procedure to upgrade to future versions of Spark as they come out (which happens pretty fast). If you do, either make sure the previous tgz files have been deleted, or modify the tar command above to point to a specific file (i.e. no * wildcards, as shown above); the script sketched below rolls these steps into one place.
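For repeated upgrades, the manual steps above collapse into a short script. The following is only a sketch of that procedure; the version, mirror, and install directory are placeholders taken from this answer's session and should be adapted to your setup:

 #!/usr/bin/env bash
 # Sketch of the upgrade procedure above; adjust VERSION, HADOOP, MIRROR, INSTALL_DIR.
 set -e
 VERSION=1.5.2
 HADOOP=hadoop2.6
 MIRROR=http://apache.tsl.gr/spark        # pick a live mirror from the download page
 INSTALL_DIR=/usr/local/bin

 cd "$INSTALL_DIR"
 sudo wget "$MIRROR/spark-$VERSION/spark-$VERSION-bin-$HADOOP.tgz"
 sudo tar -xzf "spark-$VERSION-bin-$HADOOP.tgz"   # explicit file name, no * wildcard
 sudo rm "spark-$VERSION-bin-$HADOOP.tgz"         # tidy up the archive afterwards
 echo "Now point SPARK_HOME at $INSTALL_DIR/spark-$VERSION-bin-$HADOOP"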
+16
  1. Set SPARK_HOME to /opt/spark
  2. Download the latest pre-built binary, e.g. spark-2.2.1-bin-hadoop2.7.tgz - you can use wget
  3. Create a symlink to the latest download - ln -s /opt/spark-2.2.1 /opt/spark
  4. Edit the files in $SPARK_HOME/conf accordingly

For each new version you download, simply update the symbolic link to point to it (step 3):

  • ln -sfn /opt/spark-xxx /opt/spark (use -sfn rather than plain -s, so the existing link is replaced instead of failing with “File exists”)
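Spelled out as a full cycle, the symlink approach might look like the sketch below. The archive URL is an assumption (mirror links rotate), and the tarball extracts to the full spark-2.2.1-bin-hadoop2.7 name unless you rename it, so the link target here uses that name:

 # One-time: point SPARK_HOME at the stable symlink (e.g. in ~/.bashrc)
 export SPARK_HOME=/opt/spark

 # Per release: download, extract, and repoint the symlink
 cd /opt
 sudo wget https://archive.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
 sudo tar -xzf spark-2.2.1-bin-hadoop2.7.tgz
 sudo ln -sfn /opt/spark-2.2.1-bin-hadoop2.7 /opt/spark   # -sfn replaces the old link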
+2
