Read text file from HDFS line by line in mapper

Does the following Mapper code correctly read a text file from HDFS? If so:

  • What happens if two mappers on different nodes try to open the file at almost the same time?
  • Do I need to close the InputStreamReader? If so, how do I do it without closing the file system?

My code is:

    Path pt = new Path("hdfs://pathTofile");
    FileSystem fs = FileSystem.get(context.getConfiguration());
    BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)));
    String line;
    line = br.readLine();
    while (line != null) {
        System.out.println(line);
java hadoop hdfs
1 answer

This will work with some corrections; I assume the code you pasted was simply truncated:

    Path pt = new Path("hdfs://pathTofile");
    FileSystem fs = FileSystem.get(context.getConfiguration());
    BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)));
    try {
        String line;
        line = br.readLine();
        while (line != null) {
            System.out.println(line);
            // be sure to read the next line, otherwise you'll get an infinite loop
            line = br.readLine();
        }
    } finally {
        // you should close the BufferedReader
        br.close();
    }
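
The same read loop can also be written with try-with-resources, which closes the reader automatically. The following is only a minimal sketch that uses plain java.io so it runs without a cluster: the class name LineReaderSketch, the helper readLines, the in-memory stream standing in for fs.open(pt), and the explicit UTF-8 charset are all assumptions for illustration, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class LineReaderSketch {

    // Reads every line from the stream. try-with-resources closes the
    // BufferedReader (and with it the wrapped stream) when the block exits,
    // even if readLine() throws.
    public static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // UTF-8 is an assumption; the original code uses the platform default.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            // Assigning inside the condition advances the reader on every pass.
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for fs.open(pt); in a mapper this would be the HDFS stream.
        InputStream in = new ByteArrayInputStream(
                "first\nsecond\nthird\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readLines(in)) {
            System.out.println(line);
        }
    }
}
```

In a mapper, the stream returned by fs.open(pt) is an InputStream, so it could be passed to a helper like this one. Closing the reader closes only that stream, not the FileSystem; since FileSystem.get normally hands back a cached, shared instance, the file system object itself is best left open.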

Several mappers can read the same file at the same time, but beyond a certain point it makes sense to use the distributed cache instead. That not only reduces the load on the data nodes that host the file's blocks, it is also more efficient when the job has more tasks than there are task nodes.
