Is multithreading the best practice to make this loop faster?

I call an API service with an index in the URL; the last index is 420,555. I'm doing it like this:

 for (int i = 0; i <= 420555; i++) {
     URL url = new URL("https://someURL/" + i);
     try (BufferedReader reader = new BufferedReader(
             new InputStreamReader(url.openStream(), "UTF-8"))) {
         // read the JSON
         // create an object from the JSON
         // save the result to my DB
     }
 }

performance is very poor.

(Of course there are many records to save in my database, but this takes more than 6 hours and then crashes because the Java VM runs out of memory.)

Do you have an idea how I can do this faster?

If you need the full code, I can post it, but I think the for loop is the problem ...

My idea was to use multithreading, but I have never worked with it before, and I'm not sure if it is the best practice for this case.

If multithreading is the best practice, can you give an example for this case?

+7
java performance multithreading url api
6 answers

Your code does two things:

  • fetch the content from the URL as JSON
  • take some steps to save the result in the database

It does this sequentially.

So yes, of course, parallelizing those loop bodies should reduce the overall execution time. It will not help with the memory issue, though. As the comments note, that problem is most likely caused by bugs in your code (for example, not closing resources properly).

Of course, this creates new problems, for example dealing with connection pools for database access.

To add "multiple threads", a straightforward approach is to submit tasks to an ExecutorService.

Finally: the first real answer is to step back. It seems the task at hand is already hard enough for you to get correct! Adding more complexity can help with certain problems, but first make sure your code is completely correct and works in "sequential mode" before adding more than one thread. Otherwise you will run into new problems that are less deterministic and much harder to debug.

Second real answer: making 400K requests is never a good idea. Not sequentially, not in parallel. The real solution would be to step back and change that API to allow, for example, bulk reads. Do not load 400K objects with 400K requests. Make 100 requests that fetch 4K objects each, for example.
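To illustrate the arithmetic behind that suggestion, here is a minimal sketch. It assumes a hypothetical bulk endpoint that accepts a range (the question's API does not offer this; the URL shape is an assumption):

```java
// Sketch: paging arithmetic for a hypothetical bulk endpoint.
// The "bulk?from=..&to=.." URL shape is an assumption, not the real API.
public class PagingSketch {
    static int pageCount(int totalRecords, int pageSize) {
        // ceiling division: number of bulk requests needed
        return (totalRecords + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        int total = 420_556;   // indices 0..420555
        int pageSize = 4_000;
        int pages = pageCount(total, pageSize);
        for (int page = 0; page < pages; page++) {
            int from = page * pageSize;
            int to = Math.min(from + pageSize, total) - 1;
            // One request would fetch e.g. https://someURL/bulk?from=<from>&to=<to>
        }
        System.out.println(pages); // 106 requests instead of 420,556
    }
}
```

A hundred-odd requests instead of 400K changes the problem from "how do I parallelize this" to "this just works".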

In short: your real problem is the design of the API you are using. If you do not change that, you are not solving your problem, only fighting the symptoms.

+5

Yes, doing your work in parallel makes things faster. The following is an example of a multithreaded solution:

 // THREADS_COUNT controls the concurrency level
 int THREADS_COUNT = 8;

 // A shared repository for the URL indices
 ConcurrentLinkedQueue<Integer> indexRepository = new ConcurrentLinkedQueue<>();
 for (int i = 0; i <= 420555; i++)
     indexRepository.add(i);

 // An ExecutorService providing us multiple threads
 ExecutorService executor = Executors.newFixedThreadPool(THREADS_COUNT);

 // Create multiple tasks (the count is the same as our threads)
 for (int i = 0; i < THREADS_COUNT; i++) {
     executor.execute(new Runnable() {
         public void run() {
             Integer index;
             // poll() returns null when the queue is empty, which avoids the
             // race between isEmpty() and remove()
             while ((index = indexRepository.poll()) != null) {
                 try {
                     URL url = new URL("https://someURL/" + index);
                     // read the JSON with
                     // BufferedReader reader = new BufferedReader(
                     //         new InputStreamReader(url.openStream(), "UTF-8"))
                     // create an object from the JSON
                     // save the result to my DB
                 } catch (IOException e) {
                     e.printStackTrace();
                 }
             }
         }
     });
 }

 executor.shutdown();
 // Wait until all threads are finished
 while (!executor.isTerminated()) { }
 System.out.println("\nFinished all threads");

Please note that working with the database can also significantly affect performance. Using batch inserts and proper transactions can improve it.
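To make the batch-insert idea concrete, here is a minimal sketch. The batching logic runs as-is; the JDBC part is left in comments because it needs a real connection, and the table name `t`, column `id`, and the `conn` object are assumptions:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchInsertSketch {
    // Splits the fetched records into batches of at most batchSize.
    static <T> List<List<T>> partition(List<T> records, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < records.size(); i += batchSize) {
            batches.add(records.subList(i, Math.min(i + batchSize, records.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 1_250; i++) records.add(i);

        for (List<Integer> batch : partition(records, 500)) {
            // With a real JDBC Connection "conn" (assumed, not shown), one batch
            // becomes a single round trip and a single transaction:
            // try (PreparedStatement ps =
            //         conn.prepareStatement("INSERT INTO t(id) VALUES (?)")) {
            //     for (Integer id : batch) { ps.setInt(1, id); ps.addBatch(); }
            //     ps.executeBatch();
            //     conn.commit();
            // }
            System.out.println("would insert a batch of " + batch.size());
        }
    }
}
```

One `executeBatch()` per 500 rows replaces 500 individual round trips to the database.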

+2

Yes, you can do it faster using Executors.

Use the API below if you are using Java 8:

 public static ExecutorService newWorkStealingPool() 

It creates a work-stealing thread pool using all available processors as its target parallelism level.

If you are not using Java 8, use

 public static ExecutorService newFixedThreadPool(int nThreads) 

and set the number of threads to the number of available processors:

 nThreads = Runtime.getRuntime().availableProcessors() 
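A minimal sketch combining both suggestions; the square computation is a stand-in for "fetch `https://someURL/<i>` and parse the JSON", since the real fetch is network-bound and cannot run here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolSketch {
    // Submits n independent tasks to a pool sized to the machine's processors.
    static int parallelSquareSum(int n) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        // On Java 8+ you could use Executors.newWorkStealingPool() instead.
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int index = i;
            // Placeholder task: the real work would fetch and parse one URL.
            futures.add(executor.submit(() -> index * index));
        }
        int sum = 0;
        for (Future<Integer> f : futures) sum += f.get(); // blocks until each task is done
        executor.shutdown();
        return sum;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSquareSum(10)); // 285 = 0² + 1² + … + 9²
    }
}
```

Note that for I/O-bound work like HTTP requests, a pool larger than the processor count often helps, since threads spend most of their time waiting on the network.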
+2

Your question is a bit confusing, but looking at your code: first of all, close the resources on each iteration:

 String urlTemplate = "https://someURL/%d";
 for (int i = 0; i <= 420555; i++) {
     try (InputStreamReader fis = new InputStreamReader(
                  new URL(String.format(urlTemplate, i)).openStream(), "UTF-8");
          BufferedReader reader = new BufferedReader(fis)) {
         // do the job
     }
 }
+1

If you cannot get the required data from the external API in bulk, you can fetch it in parallel to improve performance.

You can divide the range into smaller ones (for example, [1-20], [21-40], ...), then create an ExecutorService with some pool size and process each chunk in parallel.

This will increase the performance of your program, but not by much. It also depends on your machine's processor.

GhostCat's answer is correct, but I am suggesting an alternative: if you cannot get the data in fewer than 400K requests, this is just one way to improve the performance of fetching it.

0

Another bottleneck that I see is saving to the DB. If you save records one by one, performance will be poor, as each save involves I/O operations. A better approach is to separate the reader from the writer.

Reader: downloads the data in chunks, for example with a batch size of 500.

Writer: saves to the DB with a batch size of 500.

If you make this separation, it will be easy to scale as required: you can increase the number of reader threads and writer threads. One thread will read/write one chunk, i.e. 500 records.
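The reader/writer split above can be sketched with a `BlockingQueue` between the two threads. This is a minimal sketch under stated assumptions: the download and the batch insert are placeholders, and the poison-pill shutdown is one common convention, not the only one:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class ReaderWriterSketch {
    // Runs one reader thread that "downloads" [from, to) chunks and one writer
    // thread that "saves" them; returns how many records were written.
    static int run(int total, int chunkSize) throws InterruptedException {
        BlockingQueue<int[]> queue = new LinkedBlockingQueue<>();
        final int[] POISON = new int[0]; // empty chunk signals "no more data"
        AtomicInteger written = new AtomicInteger();

        Thread reader = new Thread(() -> {
            try {
                for (int from = 0; from < total; from += chunkSize) {
                    int to = Math.min(from + chunkSize, total);
                    // Placeholder: the real reader would fetch records [from, to) here
                    queue.put(new int[]{from, to});
                }
                queue.put(POISON);
            } catch (InterruptedException ignored) { }
        });

        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    int[] chunk = queue.take();
                    if (chunk.length == 0) return; // poison pill: reader is done
                    // Placeholder: the real writer would batch-insert the chunk here
                    written.addAndGet(chunk[1] - chunk[0]);
                }
            } catch (InterruptedException ignored) { }
        });

        reader.start();
        writer.start();
        reader.join();
        writer.join();
        return written.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(1_250, 500)); // prints 1250
    }
}
```

To scale, you would start several reader and writer threads over the same queue, with one poison pill per writer.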

0
