Is a thread chain a bad solution for this Java application?

I have a program that downloads large files, parses them, and then writes the data extracted from each file to another file.

The files take a long time to download and parse, but the writing task only takes about a minute or so. The solution I chose was to have three fixed thread pools of three threads each.

    ExecutorService downloadExecutor = Executors.newFixedThreadPool(3);
    ExecutorService parseExecutor = Executors.newFixedThreadPool(3);
    ExecutorService writeExecutor = Executors.newFixedThreadPool(3);

A thread in the download pool downloads a file, then submits a new task to the parse pool with the file name as a parameter. This is done from inside the download task itself. The download thread then moves on to the next file in the list of file URLs.

Once a parse thread has extracted the data I want from the file, it submits a new task containing that data to the write pool, where it is written to a CSV file.

My question is whether there is a more elegant solution for this. I have not done much complicated threading. Since I have many files to download and parse, I do not want any of the threads to be idle at any time. The idea, again, is that since it can take a while to parse a file, I might as well have separate threads dedicated to downloading the files.

+5
3 answers

Why not use just one thread pool? Downloading, parsing, and saving have to wait for each other anyway, so the best division of work would be one task per file.
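A minimal sketch of that idea, in case it helps. The class name and the download, parse, and write methods are hypothetical stand-ins, not the asker's code; the only point is that each file goes through all three steps inside a single task on a single pool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SinglePoolPipeline {

    // Hypothetical stand-in for the real download step.
    static String download(String url) {
        return "raw:" + url;
    }

    // Hypothetical stand-in for the real parse step.
    static String parse(String raw) {
        return raw.replace("raw:", "parsed:");
    }

    // Hypothetical stand-in: a real version would append a row to the CSV file.
    static String write(String parsed) {
        return parsed;
    }

    // Submits one task per URL; each task runs download -> parse -> write in order.
    static List<String> processAll(List<String> urls, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> write(parse(download(url)))));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that file is finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processAll(Arrays.asList("a.dat", "b.dat", "c.dat"), 3));
    }
}
```

With three threads and more than three files, a thread picks up the next file as soon as it finishes the previous one, so no thread sits idle while work remains.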

+8

This is not bad practice; many developers write code this way. But there are a few things you need to keep in mind.

Number one. You cannot expect better performance just because you have more threads. There is an optimal number of threads, and it depends on the number of processors.

Number two. You need to decide how exceptions are handled.

Number three. You need to make sure that you can shut down all thread pools in case you need to stop the application.
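Points two and three could look roughly like this. It is only a sketch: PoolLifecycle, describe, and shutdownAll are names made up for illustration, and the timeout is arbitrary. An exception thrown inside a submitted task does not surface on its own; it comes back wrapped in an ExecutionException when you call Future.get().

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PoolLifecycle {

    // Waits for a task and reports its outcome instead of silently losing the exception.
    static String describe(Future<String> future) throws InterruptedException {
        try {
            return "ok: " + future.get();
        } catch (ExecutionException e) {
            // The exception thrown inside the task is the cause.
            return "failed: " + e.getCause().getMessage();
        }
    }

    // Orderly shutdown first; forced shutdown for pools that do not finish in time.
    static boolean shutdownAll(List<ExecutorService> pools, long timeoutSeconds)
            throws InterruptedException {
        for (ExecutorService pool : pools) {
            pool.shutdown(); // stop accepting new tasks, let queued ones finish
        }
        boolean allTerminated = true;
        for (ExecutorService pool : pools) {
            if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt tasks that are still running
                allTerminated = false;
            }
        }
        return allTerminated;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService downloadExecutor = Executors.newFixedThreadPool(3);
        ExecutorService parseExecutor = Executors.newFixedThreadPool(3);

        Callable<String> failing = () -> {
            throw new IllegalStateException("malformed record");
        };
        Future<String> good = downloadExecutor.submit(() -> "file-1");
        Future<String> bad = parseExecutor.submit(failing);

        System.out.println(describe(good));
        System.out.println(describe(bad));
        System.out.println(shutdownAll(Arrays.asList(downloadExecutor, parseExecutor), 10));
    }
}
```

When tasks submit follow-up tasks to another pool, as in the question, the order of shutdown matters: a pool that has already been shut down rejects new submissions.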

+2

So your problem has two aspects:

  • Compute bound
  • IO bound

Reading and writing files is IO bound. Asynchronous IO is the best fit for IO-bound tasks. Java has AsynchronousFileChannel, which lets you read and write files without worrying about thread pools; continuation is handled through completion handlers. For example:

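    import java.nio.ByteBuffer;
    import java.nio.channels.AsynchronousFileChannel;
    import java.nio.channels.CompletionHandler;

    AsynchronousFileChannel ch = AsynchronousFileChannel.open(path);
    final ByteBuffer buf = ByteBuffer.allocate(1024);
    // read(dst, position, attachment, handler): the attachment (0 here) is handed back to the handler
    ch.read(buf, 0, 0, new CompletionHandler<Integer, Integer>() {
        // result is the number of bytes read, or -1 at end of file
        public void completed(Integer result, Integer attachment) { .. }
        public void failed(Throwable exc, Integer attachment) { .. }
    });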

You do the same for writing; you just write to the channel:

 ch.write(... 

Then there is the file parsing, which is a compute-bound task. You want your processor cores kept busy for this, so you can give it a thread pool sized to the number of cores you have.

    ExecutorService executorService = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
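Putting the two halves together might look something like the sketch below. Several things here are assumptions for illustration: the class and method names are invented, the parse step just counts lines, the completion handler is bridged to a CompletableFuture to make chaining easier, and each file is handled with a single read and a single write of at most maxBytes, which is only reasonable for small files. A real version would loop until the whole file has been read or written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncParsePipeline {

    // Bridges the callback-style CompletionHandler to a CompletableFuture attachment.
    static final CompletionHandler<Integer, CompletableFuture<Integer>> BRIDGE =
            new CompletionHandler<Integer, CompletableFuture<Integer>>() {
                @Override
                public void completed(Integer result, CompletableFuture<Integer> future) {
                    future.complete(result);
                }

                @Override
                public void failed(Throwable exc, CompletableFuture<Integer> future) {
                    future.completeExceptionally(exc);
                }
            };

    static AsynchronousFileChannel open(Path path, StandardOpenOption... options) {
        try {
            return AsynchronousFileChannel.open(path, options);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void close(AsynchronousFileChannel ch) {
        try {
            ch.close();
        } catch (IOException ignored) {
            // Nothing useful to do in this sketch if close fails.
        }
    }

    // Single asynchronous read from position 0 (simplification: small files only).
    static CompletableFuture<String> readAsync(Path path, int maxBytes) {
        AsynchronousFileChannel ch = open(path, StandardOpenOption.READ);
        ByteBuffer buf = ByteBuffer.allocate(maxBytes);
        CompletableFuture<Integer> done = new CompletableFuture<>();
        ch.read(buf, 0, done, BRIDGE);
        return done.thenApply(bytesRead -> {
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }).whenComplete((result, error) -> close(ch));
    }

    // Single asynchronous write at position 0 (same simplification).
    static CompletableFuture<Integer> writeAsync(Path path, String text) {
        AsynchronousFileChannel ch = open(path, StandardOpenOption.WRITE,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
        CompletableFuture<Integer> done = new CompletableFuture<>();
        ch.write(ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8)), 0, done, BRIDGE);
        return done.whenComplete((result, error) -> close(ch));
    }

    // Hypothetical compute-bound step: counts the lines in the input.
    static String parse(String raw) {
        return "lines=" + raw.split("\n").length;
    }

    // Async read, then parse on the CPU-sized pool, then async write; waits for the result.
    static void process(Path in, Path out, ExecutorService cpuPool) throws Exception {
        readAsync(in, 1024)
                .thenApplyAsync(AsyncParsePipeline::parse, cpuPool)
                .thenCompose(parsed -> writeAsync(out, parsed))
                .get();
    }

    public static void main(String[] args) throws Exception {
        Path in = Files.createTempFile("input", ".csv");
        Path out = Files.createTempFile("output", ".txt");
        Files.write(in, "a,b\nc,d\n".getBytes(StandardCharsets.UTF_8));

        ExecutorService cpuPool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            process(in, out, cpuPool);
            System.out.println(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
        } finally {
            cpuPool.shutdown();
        }
    }
}
```

Only the parse step occupies a pool thread here; the reads and writes complete on the channel's own threads, so the CPU-sized pool is not tied up waiting on the disk.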

Keep in mind that you need to test your code, and testing concurrent code is hard. If you cannot verify that it is correct, do not do it.

+2