This problem was caused by the Guava library:
the version on the AMI is 11, while Spark requires version 14.
I edited the bootstrap script from AWS so that it installs Spark 1.0.2 and updates the Guava library during the bootstrap action. You can get it here:
https://gist.github.com/tnbredillet/867111b8e1e600fa588e
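To confirm which Guava version a node actually has before and after the bootstrap action, something like the following could help. It is only a sketch: `guava_versions` is a made-up helper name, it assumes the jars follow the usual `guava-<version>.jar` naming, and the directory to scan is whatever you pass in (I am not assuming any particular path on the AMI).

```python
import re
import sys
from pathlib import Path

# Matches Maven-style names such as guava-11.0.2.jar or guava-14.0.1.jar.
# Assumption: the jars on the node follow this naming scheme.
GUAVA_JAR = re.compile(r"^guava-(\d+(?:\.\d+)*)\.jar$")


def guava_versions(lib_dir):
    """Return (version, jar name) pairs for Guava jars found directly in lib_dir."""
    found = []
    for jar in sorted(Path(lib_dir).glob("guava-*.jar")):
        match = GUAVA_JAR.match(jar.name)
        if match:
            found.append((match.group(1), jar.name))
    return found


if __name__ == "__main__":
    # Directories to scan are given on the command line; none is assumed.
    for lib_dir in sys.argv[1:]:
        for version, name in guava_versions(lib_dir):
            print(lib_dir, version, name)
```

If more than one version shows up in the directories on the classpath, the older jar is the likely source of the conflict.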
Even after updating Guava, I still had a problem. When I tried to save data to S3, I got this exception:
lzo.GPLNativeCodeLoader - Could not load native gpl library java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
I solved that by adding the native Hadoop library directory to java.library.path. When I run the job, I add the parameter
-Djava.library.path=/home/hadoop/lib/native
or, if I run the job through spark-submit, I add the argument
--driver-library-path /home/hadoop/lib/native
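Before passing the path, it may be worth checking that the directory really contains the library the loader is asking for. A minimal sketch, assuming the Linux naming convention `lib<name>.so` (so `gplcompression` becomes `libgplcompression.so`); `find_native_library` is a hypothetical helper, not part of Spark or Hadoop:

```python
import os


def find_native_library(name, library_path):
    """Return the first lib<name>.so found in a java.library.path-style
    string (directories separated by os.pathsep), or None if absent."""
    for directory in library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, "lib" + name + ".so")
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Path taken from the answer above; adjust it if your layout differs.
    print(find_native_library("gplcompression", "/home/hadoop/lib/native"))
```

If this prints None, the UnsatisfiedLinkError will most likely persist, because the JVM searches the same directories.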
Eras