We want to use the HDFS NIO.2 file system provider in a Spark job. However, we ran into classpath issues with file system providers: they must be visible to the system class loader, which is the loader used by the Paths.get(URI) API. As a result, the provider was not found, even though its jar was supplied to spark-submit.
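For illustration, here is a minimal sketch of the failing pattern; the jimfs URI is hypothetical and stands in for whatever provider scheme the job uses:

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class FailingLookup {
    public static void main(String[] args) {
        // Paths.get(URI) consults FileSystemProvider.installedProviders(),
        // which only sees providers on the system class loader. When the
        // provider jar is only on the application (--jars) classpath, this
        // throws FileSystemNotFoundException: provider not installed.
        Path p = Paths.get(URI.create("jimfs://testfs/work/data.txt"));
        System.out.println(p);
    }
}
```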
Here's the spark-submit command:
```
spark-submit --master "local[*]" \
    --jars target/dependency/jimfs-1.1.jar,target/dependency/guava-16.0.1.jar \
    --class com.basistech.tc.SparkFsTc \
    target/spark-fs-tc-0.0.1-SNAPSHOT.jar
```
And here is the job class, which fails with "file system not found":
```java
public final class SparkFsTc {
    private SparkFsTc() {
        // utility class; no instances
    }
    // (rest of the class elided in the original)
}
```
Is there any mechanism to convince Spark to add the FS provider to the appropriate class path?
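One mechanism worth trying (my suggestion, not something confirmed in this post): --driver-class-path and spark.executor.extraClassPath prepend entries to the JVM's own classpath, which is what the system class loader sees, unlike --jars, which only feeds the application class loader. A sketch in local mode, where only the driver classpath matters:

```
spark-submit --master "local[*]" \
    --driver-class-path target/dependency/jimfs-1.1.jar:target/dependency/guava-16.0.1.jar \
    --jars target/dependency/jimfs-1.1.jar,target/dependency/guava-16.0.1.jar \
    --class com.basistech.tc.SparkFsTc \
    target/spark-fs-tc-0.0.1-SNAPSHOT.jar
```

On a real cluster you would also need spark.executor.extraClassPath, with paths that are valid on each worker node.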
Readers should be aware that file system providers are special. If you read the relevant code in the JRE, you will see:
```java
ServiceLoader<FileSystemProvider> sl = ServiceLoader
    .load(FileSystemProvider.class, ClassLoader.getSystemClassLoader());
```
Providers must be visible to the system class loader; they are not found through the application or thread context class loader.
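A quick way to see which providers are actually installed (a small diagnostic sketch; the class name is mine):

```java
import java.nio.file.spi.FileSystemProvider;

public final class ListInstalledProviders {
    public static void main(String[] args) {
        // installedProviders() caches the result of the ServiceLoader scan
        // shown above; any provider missing from this list cannot be
        // reached via Paths.get(URI).
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            System.out.println(provider.getScheme() + " -> " + provider.getClass().getName());
        }
    }
}
```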
The job works fine if I acquire a FileSystem object reference myself instead of going through Paths.get(URI).
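For example, with Jimfs (the provider shipped in the --jars above) one can bypass the provider lookup entirely; a minimal sketch, assuming Jimfs 1.1's public API:

```java
import com.google.common.jimfs.Configuration;
import com.google.common.jimfs.Jimfs;

import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.Files;
import java.nio.file.Path;

public final class DirectFileSystem {
    public static void main(String[] args) throws IOException {
        // Creating the FileSystem directly uses Jimfs' own classes, so the
        // application class loader (which did load the --jars) is enough;
        // no ServiceLoader lookup against the system class loader happens.
        try (FileSystem fs = Jimfs.newFileSystem(Configuration.unix())) {
            Path path = fs.getPath("/work/data.txt");
            Files.createDirectories(path.getParent());
            Files.write(path, "hello".getBytes());
            System.out.println(Files.readAllLines(path));
        }
    }
}
```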