There is a common mistake in your job configuration:
Configuration conf2 = new Configuration();
job = new Job(conf2);
job.setJobName("Join with Cache");
DistributedCache.addCacheFile(new URI("hdfs://server:port/FilePath/part-r-00000"), conf2);
After you create your Job object, you need to pull the Configuration object back out of it, because Job makes a copy of the configuration it is given. Setting values in conf2 after creating the job has no effect on the job. Try the following:
job = new Job(new Configuration());
Configuration conf2 = job.getConfiguration();
job.setJobName("Join with Cache");
DistributedCache.addCacheFile(new URI("hdfs://server:port/FilePath/part-r-00000"), conf2);
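To see why the order matters, here is a minimal sketch of the copy behavior described above. ToyConfiguration and ToyJob are hypothetical stand-ins written for this illustration, not Hadoop classes, and the "cache.files" key is made up; they only model a job that copies its configuration at construction time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object (NOT a Hadoop class).
class ToyConfiguration {
    private final Map<String, String> values = new HashMap<>();

    ToyConfiguration() {}

    // Copy constructor: takes a snapshot of the other configuration's values.
    ToyConfiguration(ToyConfiguration other) {
        values.putAll(other.values);
    }

    void set(String key, String value) { values.put(key, value); }

    String get(String key) { return values.get(key); }
}

// Hypothetical stand-in for a job that copies the configuration it is given.
class ToyJob {
    private final ToyConfiguration conf;

    ToyJob(ToyConfiguration conf) {
        this.conf = new ToyConfiguration(conf); // the job keeps its own copy
    }

    ToyConfiguration getConfiguration() { return conf; }
}

class CopySemanticsDemo {
    public static void main(String[] args) {
        ToyConfiguration original = new ToyConfiguration();
        ToyJob job = new ToyJob(original);

        // Written to the original AFTER the job copied it: the job never sees it.
        original.set("cache.files", "part-r-00000");
        System.out.println(job.getConfiguration().get("cache.files")); // prints "null"

        // Written to the job's own configuration: the job sees it.
        job.getConfiguration().set("cache.files", "part-r-00000");
        System.out.println(job.getConfiguration().get("cache.files")); // prints "part-r-00000"
    }
}
```

The same reasoning is why the corrected snippet fetches the configuration with job.getConfiguration() before adding the cache file.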
You should also check how many files are in the distributed cache. There is probably more than one, and you are opening an arbitrary one of them, which would explain the value you are seeing.
I suggest you use symlinking, which makes the file available in the task's local working directory under a known name:
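DistributedCache.createSymlink(conf2);
// The "#myfile" fragment is the symlink name created in the working directory
DistributedCache.addCacheFile(new URI("hdfs://server:port/FilePath/part-r-00000#myfile"), conf2);
// then in your mapper setup (BufferedReader needs a Reader, so use FileReader):
BufferedReader joinReader = new BufferedReader(new FileReader("myfile"));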
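Once the file is reachable as "myfile", the mapper setup can load it into memory for the join. The following is only a sketch in plain Java: the JoinTableLoader class and its methods are hypothetical names, and it assumes each line has the form key, tab, value, which may not match your file.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for loading the symlinked join file into a lookup table.
class JoinTableLoader {

    // Assumption: each line is "key<TAB>value"; lines without a tab are skipped.
    static Map<String, String> parseLines(Iterable<String> lines) {
        Map<String, String> table = new HashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // not in the assumed format
            }
            table.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return table;
    }

    // Reads the file by its local name (for example the symlink "myfile").
    static Map<String, String> load(String localName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return parseLines(lines);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the symlink name used in the snippet above.
        Map<String, String> table = load(args.length > 0 ? args[0] : "myfile");
        System.out.println("Loaded " + table.size() + " join keys");
    }
}
```

In a mapper you would call something like JoinTableLoader.load("myfile") once in setup and look keys up in the returned map inside map(). The try-with-resources block closes the reader even if reading fails.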
Chris white