A map-reduce style solution is easy to implement using some of the nice Java 6 concurrency features, especially Future, Callable, and ExecutorService.
I created a Callable that parses a file the way you specified:
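    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.List;
    import java.util.Scanner;
    import java.util.concurrent.Callable;

    public class FileAnalyser implements Callable<String> {

        private Scanner scanner;
        private List<String> termList;

        public FileAnalyser(String filename, List<String> termList) throws FileNotFoundException {
            this.termList = termList;
            scanner = new Scanner(new File(filename));
        }

        @Override
        public String call() throws Exception {
            StringBuilder buffer = new StringBuilder();
            try {
                while (scanner.hasNextLine()) {
                    String line = scanner.nextLine();
                    String[] tokens = line.split(" ");
                    // Keep the line only if its third token is one of the wanted terms
                    if ((tokens.length >= 3) && (inTermList(tokens[2]))) {
                        // Re-add the line break that nextLine() strips
                        buffer.append(line).append('\n');
                    }
                }
            } finally {
                scanner.close(); // release the file handle once the file is read
            }
            return buffer.toString();
        }

        private boolean inTermList(String term) {
            return termList.contains(term);
        }
    }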
We need to create a new callable for each file found and submit it to the executor service. The result of the submission is a Future, which we can use later to get the result of the file analysis.
    public class Analayser {

        private static final int THREAD_COUNT = 10;

        public static void main(String[] args) {
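The listing above stops right after the opening of main. As a rough sketch of how such a driver might continue (this is not the original code), the block below submits one callable per file to a fixed thread pool, keeps the futures in a list, and then collects the results. The names AnalyserSketch, LineFilter and analyse, and the placeholder terms, are made up for illustration; LineFilter is only a compact stand-in for the FileAnalyser callable shown earlier.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical driver; all names here are illustrative assumptions.
class AnalyserSketch {

    private static final int THREAD_COUNT = 10;

    // Stand-in for FileAnalyser: keeps every line whose third
    // space-separated token appears in termList.
    static class LineFilter implements Callable<String> {
        private final Scanner scanner;
        private final List<String> termList;

        LineFilter(String filename, List<String> termList) throws FileNotFoundException {
            this.termList = termList;
            this.scanner = new Scanner(new File(filename));
        }

        @Override
        public String call() {
            StringBuilder buffer = new StringBuilder();
            try {
                while (scanner.hasNextLine()) {
                    String line = scanner.nextLine();
                    String[] tokens = line.split(" ");
                    if (tokens.length >= 3 && termList.contains(tokens[2])) {
                        buffer.append(line).append('\n');
                    }
                }
            } finally {
                scanner.close();
            }
            return buffer.toString();
        }
    }

    // "Map": one callable per file. "Reduce": gather the results,
    // in the order the files were submitted.
    static List<String> analyse(List<String> filenames, List<String> termList)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(THREAD_COUNT);
        try {
            List<Future<String>> futureList = new ArrayList<Future<String>>();
            for (String filename : filenames) {
                try {
                    futureList.add(executor.submit(new LineFilter(filename, termList)));
                } catch (FileNotFoundException fnfe) {
                    // Files that cannot be opened are skipped in this sketch
                    System.err.println("Unable to create future for " + filename);
                }
            }
            List<String> results = new ArrayList<String>();
            for (Future<String> current : futureList) {
                results.add(current.get()); // blocks until this file is finished
            }
            return results;
        } finally {
            executor.shutdown(); // let the pool threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> termList = Arrays.asList("terma", "termb"); // placeholder terms
        for (String result : analyse(Arrays.asList(args), termList)) {
            System.out.print(result);
        }
    }
}
```

Run it with the file names as arguments, for example `java AnalyserSketch a.txt b.txt`. THREAD_COUNT is worth tuning, since the best value depends on how much of the work is disk-bound.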
My example here is far from complete and far from efficient. I did not consider the sample size. If it is really huge, you could keep looping over the future list and remove the elements that have completed, something similar to:
    while (futureList.size() > 0) {
        for (Future<String> current : futureList) {
            if (current.isDone()) {
                String result = current.get();
                //Do something with result
                futureList.remove(current);
                break; //We have modified the list during iteration, best break out of for-loop
            }
        }
    }
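A self-contained variation on the loop above, as a sketch rather than a drop-in replacement: it removes finished futures through an Iterator, so there is no need to break out and restart the scan after each removal, and it sleeps briefly between passes instead of spinning. The names FutureDrainSketch, drain and delayed, and the 10 ms pause, are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class; the names are illustrative only.
class FutureDrainSketch {

    // Polls the list until it is empty, collecting each result as soon as
    // its future reports isDone(). The list must support Iterator.remove().
    static List<String> drain(List<Future<String>> futureList)
            throws InterruptedException, ExecutionException {
        List<String> results = new ArrayList<String>();
        while (!futureList.isEmpty()) {
            Iterator<Future<String>> it = futureList.iterator();
            while (it.hasNext()) {
                Future<String> current = it.next();
                if (current.isDone()) {
                    results.add(current.get()); // returns at once, the task is done
                    it.remove();                // safe removal during iteration
                }
            }
            if (!futureList.isEmpty()) {
                Thread.sleep(10); // arbitrary pause so the loop does not spin
            }
        }
        return results;
    }

    // Small demo task: waits for the given time, then returns the value.
    static Callable<String> delayed(final String value, final long millis) {
        return new Callable<String>() {
            @Override
            public String call() throws InterruptedException {
                Thread.sleep(millis);
                return value;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        List<Future<String>> futureList = new ArrayList<Future<String>>();
        futureList.add(executor.submit(delayed("slow", 200)));
        futureList.add(executor.submit(delayed("fast", 0)));
        for (String result : drain(futureList)) {
            System.out.println(result);
        }
        executor.shutdown();
    }
}
```

Results come back roughly in completion order, not submission order, so the caller should not rely on their position in the returned list.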
Alternatively, you could implement a producer-consumer setup, where the producer submits callables to the executor service and produces a future, and the consumer takes the result of the future and then discards the future.
This may require the producer and the consumer to be threads themselves, as well as a synchronized list for adding and removing futures.
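One possible shape for that producer-consumer setup is sketched below. It swaps the synchronized list for a LinkedBlockingQueue, which handles the locking and lets the consumer block until a future is available. The names ProducerConsumerSketch, run, constant and the END marker are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; the names are illustrative only.
class ProducerConsumerSketch {

    // Marker future that tells the consumer nothing more is coming.
    private static final Future<String> END = new FutureTask<String>(constant(""));

    static List<String> run(final List<Callable<String>> tasks, int threads)
            throws InterruptedException {
        final ExecutorService executor = Executors.newFixedThreadPool(threads);
        final BlockingQueue<Future<String>> queue = new LinkedBlockingQueue<Future<String>>();
        final List<String> results = new ArrayList<String>();

        // Producer: submits each callable and hands its future to the queue.
        Thread producer = new Thread(new Runnable() {
            @Override
            public void run() {
                for (Callable<String> task : tasks) {
                    queue.add(executor.submit(task));
                }
                queue.add(END);
            }
        });

        // Consumer: takes futures in queue order, reads each result,
        // then drops the future.
        Thread consumer = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    while (true) {
                        Future<String> future = queue.take();
                        if (future == END) {
                            break;
                        }
                        try {
                            results.add(future.get());
                        } catch (ExecutionException e) {
                            System.err.println("Task failed: " + e.getCause());
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join(); // after this, results is safe to read here
        executor.shutdown();
        return results;
    }

    // Small demo task that just returns the given value.
    static Callable<String> constant(final String value) {
        return new Callable<String>() {
            @Override
            public String call() {
                return value;
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        List<Callable<String>> tasks = new ArrayList<Callable<String>>();
        tasks.add(constant("one"));
        tasks.add(constant("two"));
        System.out.println(run(tasks, 2));
    }
}
```

In this sketch the consumer handles futures in submission order, so a slow early file holds up results from later ones. If completion order matters more, java.util.concurrent.ExecutorCompletionService may be worth a look, since it queues futures as their tasks finish.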
Any questions, please ask.
Karl Walsh