"Invalid FS ... expected: file: ///" when trying to read a file from HDFS in Java

I cannot read a file with HDFS using Java:

String hdfsUrl = "hdfs://<ip>:<port>"; Configuration configuration = new Configuration(); configuration.set("fs.defaultFS", hdfsUrl); FileSystem fs = FileSystem.get(configuration); Path filePath = new Path(hdfsUrl + "/projects/harmonizome/data/achilles/attribute_list_entries.txt.gz"); FSDataInputStream fsDataInputStream = fs.open(filePath); SEVERE: Servlet.service() for servlet [edu.mssm.pharm.maayanlab.Harmonizome.api.DownloadAPI] in context with path [/Harmonizome] threw exception java.lang.IllegalArgumentException: Wrong FS: hdfs://146.203.54.165:8020/projects/harmonizome/data/achilles/attribute_list_entries.txt.gz, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356) at edu.mssm.pharm.maayanlab.Harmonizome.api.DownloadAPI.readLines(DownloadAPI.java:37) at edu.mssm.pharm.maayanlab.Harmonizome.api.DownloadAPI.doGet(DownloadAPI.java:27) at javax.servlet.http.HttpServlet.service(HttpServlet.java:622) ... 

I did not configure HDFS, so I don’t know what I don’t know. Any help is appreciated.

2 answers

Try the following:

    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://<ip>:<port>"), configuration);
    Path filePath = new Path("<path/to/file>");
    FSDataInputStream fsDataInputStream = fs.open(filePath);
    BufferedReader br = new BufferedReader(new InputStreamReader(fsDataInputStream));

Refer to http://techidiocy.com/java-lang-illegalargumentexception-wrong-fs-expected-file/

It covers a similar problem.
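
For completeness, here is a minimal self-contained sketch of this approach (not from the original answer). It assumes a NameNode reachable at the placeholder address hdfs://namenode-host:8020 and reuses the file path from the question; the GZIPInputStream wrapper is added only because that sample file is gzip-compressed. The key point is that the URI is passed to FileSystem.get(), so the returned FileSystem is bound to HDFS and checkPath() no longer expects file:///.

    // Minimal sketch: bind the FileSystem to the HDFS URI instead of relying
    // on the default (local) filesystem. "namenode-host:8020" is a placeholder.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URI;
    import java.util.zip.GZIPInputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            Configuration configuration = new Configuration();
            // Passing the hdfs:// URI makes FileSystem.get() return the HDFS
            // implementation rather than the local one that raised "Wrong FS".
            FileSystem fs = FileSystem.get(new URI("hdfs://namenode-host:8020"), configuration);

            // The path is resolved against the filesystem above, so no scheme prefix is needed.
            Path filePath = new Path("/projects/harmonizome/data/achilles/attribute_list_entries.txt.gz");

            try (FSDataInputStream in = fs.open(filePath);
                 BufferedReader br = new BufferedReader(
                         new InputStreamReader(new GZIPInputStream(in)))) {
                String line;
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }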


This is what I did to solve this problem when running a Spark job on EMR:

  val hdfs = FileSystem.get(new java.net.URI(s"s3a://${s3_bucket}"), sparkSession.sparkContext.hadoopConfiguration) 

Be sure to replace s3_bucket with the name of your bucket.
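
As a rough Java rendering of the same idea (the answer above is Scala), here is a sketch assuming Spark on EMR with the s3a connector available on the classpath; the bucket name, app name, and class name are placeholders.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.spark.sql.SparkSession;

    public class S3aFileSystemExample {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("s3a-filesystem-example")
                    .getOrCreate();

            // Placeholder: replace with your own bucket name.
            String s3Bucket = "my-bucket";

            // Reuse Spark's Hadoop configuration and bind the FileSystem to the
            // bucket's s3a:// URI rather than to the default filesystem.
            Configuration hadoopConf = spark.sparkContext().hadoopConfiguration();
            FileSystem fs = FileSystem.get(new URI("s3a://" + s3Bucket), hadoopConf);

            // Example use: list what is at the bucket root.
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }

            spark.stop();
        }
    }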

I hope this helps someone.

