Your code does two things:
- fetch content from a URL as JSON
- follow some steps to store the result in the database

It does this sequentially, one URL at a time.
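For reference, a minimal sketch of that sequential shape (the `fetchJson` and `saveToDatabase` names are invented here and stand in for your actual code):

```java
// Sequential version: each iteration does the full fetch + store round trip.
for (String url : urls) {
    String json = fetchJson(url);  // placeholder for your HTTP call
    saveToDatabase(json);          // placeholder for your DB writes
}
```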
So yes, of course, parallelizing these loop bodies should reduce the overall execution time. It will not help with your memory issues, though. As the comments note, that problem is most likely caused by bugs in your code (for example, not closing resources properly).
Of course, parallelism creates new problems of its own; for example, you now have to work with connection pools to access the database.
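As an illustration of the connection-pool side, here is a sketch using HikariCP (the JDBC URL, table, and column names are invented; any JDBC pool would look similar). Each task borrows a connection and returns it automatically via try-with-resources:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class PooledWriter {
    private final HikariDataSource dataSource;

    PooledWriter() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb"); // invented URL
        config.setMaximumPoolSize(10); // roughly match your worker-thread count
        dataSource = new HikariDataSource(config);
    }

    // Borrow a connection, use it, return it to the pool when the block exits.
    void save(String json) throws SQLException {
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO results(json) VALUES (?)")) { // invented table/column
            ps.setString(1, json);
            ps.executeUpdate();
        }
    }
}
```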
To add "multiple threads", a straightforward approach is to submit tasks to an ExecutorService; see, for example, here.
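A minimal sketch of that approach, again with invented `fetchJson`/`saveToDatabase` placeholders in place of your real code:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParallelFetch {
    // Placeholders for your actual HTTP and DB code (names are invented here).
    static String fetchJson(String url) { /* HTTP GET ... */ return "{}"; }
    static void saveToDatabase(String json) { /* inserts/updates ... */ }

    static void fetchAll(List<String> urls) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8); // bounded pool, not 400K threads
        for (String url : urls) {
            pool.submit(() -> saveToDatabase(fetchJson(url))); // each URL becomes one task
        }
        pool.shutdown();                          // stop accepting new tasks
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for in-flight work to finish
    }
}
```

Note that the pool is deliberately bounded: a small fixed number of worker threads drains the queue of 400K tasks, instead of one thread per request.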
Finally, the first real answer is: step back. It seems the task at hand is already hard enough that you are struggling with correctness! Adding more complexity might help with certain problems, but first you need to make sure your code is fully correct and working in "sequential mode" before adding more than one thread. Otherwise you will quickly run into problems that are less deterministic and much harder to debug.
The second real answer: making 400K queries is never a good idea. Not sequentially, not in parallel. The real solution would be to step back and change that API to allow, for example, bulk reads. Do not load 400K objects with 400K requests. Make 100 requests that fetch 4K objects each, for example.
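Sketched in code, assuming the API offered a paged bulk endpoint (the offset/limit parameters, `baseUrl`, `fetchJson`, and the `saveBatchToDatabase` helper are all invented for illustration):

```java
// Hypothetical paged bulk endpoint: 100 requests of 4,000 objects each
// instead of 400,000 single-object requests.
int pageSize = 4_000;
int pages = 100;
for (int page = 0; page < pages; page++) {
    String url = baseUrl + "/objects?offset=" + (page * pageSize) + "&limit=" + pageSize;
    String json = fetchJson(url);   // one request returns a whole batch
    saveBatchToDatabase(json);      // one bulk write per batch
}
```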
In short: your real problem is the design of the API you are using. If you do not change that, you will not solve your problem; you will only be fighting the symptoms.
Ghostcat