Pre-built Spark distributions, like the one I believe you are using based on another question, are rather straightforward to "upgrade", since Spark is not actually "installed". All you have to do is:
- Download the appropriate Spark distribution (pre-built for Hadoop 2.6 and later, in your case)
- Unpack the tar file into the appropriate directory (i.e. the one where the folder spark-1.3.1-bin-hadoop2.6 already is)
- Update SPARK_HOME (and possibly some other environment variables, depending on your setup) accordingly
Here is what I did myself to go from 1.3.1 to 1.5.2, in a setup similar to yours (Vagrant VM running Ubuntu):
1) Download the tar file into the appropriate directory
    vagrant@sparkvm2:~$ cd $SPARK_HOME
    vagrant@sparkvm2:/usr/local/bin/spark-1.3.1-bin-hadoop2.6$ cd ..
    vagrant@sparkvm2:/usr/local/bin$ ls
    ipcluster     ipcontroller2  iptest   ipython2    spark-1.3.1-bin-hadoop2.6
    ipcluster2    ipengine       iptest2  jsonschema
    ipcontroller  ipengine2      ipython  pygmentize
    vagrant@sparkvm2:/usr/local/bin$ sudo wget http://apache.tsl.gr/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz
    [...]
    vagrant@sparkvm2:/usr/local/bin$ ls
    ipcluster     ipcontroller2  iptest   ipython2    spark-1.3.1-bin-hadoop2.6
    ipcluster2    ipengine       iptest2  jsonschema  spark-1.5.2-bin-hadoop2.6.tgz
    ipcontroller  ipengine2      ipython  pygmentize
Please note that the exact mirror you should use with wget will probably differ from mine, depending on your location; you will get the right link by clicking "Download Spark" on the download page, after selecting the package type to download.
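As a side note, older releases eventually drop off the regular mirrors; the Apache archive keeps all past releases, so a command along these lines should also work (URL shown as an illustration of the archive layout):

    # all past Spark releases are kept on the Apache archive
    sudo wget https://archive.apache.org/dist/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz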
2) Unzip the tgz file with
    vagrant@sparkvm2:/usr/local/bin$ sudo tar -xzf spark-1.*.tgz
    vagrant@sparkvm2:/usr/local/bin$ ls
    ipcluster     ipcontroller2  iptest   ipython2    spark-1.3.1-bin-hadoop2.6
    ipcluster2    ipengine       iptest2  jsonschema  spark-1.5.2-bin-hadoop2.6
    ipcontroller  ipengine2      ipython  pygmentize  spark-1.5.2-bin-hadoop2.6.tgz
You can see that you now have a new folder, spark-1.5.2-bin-hadoop2.6.
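As a quick sanity check (assuming the standard layout of the pre-built distributions), you can ask the new binaries to report their version before switching anything over:

    # should print a banner including "version 1.5.2"
    vagrant@sparkvm2:/usr/local/bin$ spark-1.5.2-bin-hadoop2.6/bin/spark-submit --version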
3) Update SPARK_HOME (and possibly any other environment variables you use) to point to this new directory instead of the previous one.
Keep in mind that the change may only take effect in a new shell session, so restart your shell (or reboot the machine) before relying on it.
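As a rough sketch of what that update could look like, assuming SPARK_HOME is defined in ~/.bashrc and the paths from the session above (both are assumptions; adjust to your own setup):

    # in ~/.bashrc (or wherever SPARK_HOME is defined on your machine)
    export SPARK_HOME=/usr/local/bin/spark-1.5.2-bin-hadoop2.6
    export PATH=$SPARK_HOME/bin:$PATH

Afterwards, running source ~/.bashrc (or opening a new shell) makes the updated variables available without a full reboot.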
Note:
- You do not need to remove the previous Spark distribution, as long as all the relevant environment variables point to the new one. That way, you can even quickly move back and forth between the old and the new version, in case you want to test things; you just have to change the relevant environment variables (see the sketch after this list).
- sudo was necessary in my case; it may not be necessary for you, depending on your settings.
- After verifying that everything works fine, it is a good idea to delete the downloaded tgz file.
- You can use the exact same procedure to upgrade to future versions of Spark as they come out (rather fast). If you do, either make sure the previous tgz files have been deleted, or modify the tar command above to point to a specific file (i.e. no * wildcards, as above).
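To illustrate the back-and-forth mentioned in the first note, a minimal sketch, assuming both distributions sit under /usr/local/bin as in the session above:

    # point back to the old version for a quick test...
    export SPARK_HOME=/usr/local/bin/spark-1.3.1-bin-hadoop2.6
    # ...and to the new one again when done
    export SPARK_HOME=/usr/local/bin/spark-1.5.2-bin-hadoop2.6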