I am writing a parser for a website that has many pages (I call them IndexPages). Each IndexPage has many links (300 to 400 per IndexPage). I use a Java ExecutorService to run 12 Callables concurrently within the same IndexPage. Each Callable simply sends an HTTP request to one link, then parses the response and stores the result in the database. When the first IndexPage is finished, the program moves on to the next IndexPage, and so on until no next IndexPage is found.
While it runs, everything looks fine, and I can keep track of the work and the workflow well. Parsing and storing each link takes 1 to 2 seconds.
, Callable (/) . , , 10 , Callable ( , ). , .
Here is the code:
ExecutorService executorService = Executors.newFixedThreadPool(12);
String indexUrl =
while (true) {
    String nextPage =
    // One Callable per link found on the current IndexPage.
    Set<Callable<Void>> callables = new HashSet<>();
    for (String url : getUrls(indexUrl)) {
        Callable<Void> callable = new ParserCallable(url , … and some DAOs);
        callables.add(callable);
    }
    try {
        // Blocks until every Callable of this IndexPage has finished.
        executorService.invokeAll(callables);
    } catch (InterruptedException e) {
        e.printStackTrace();
        Thread.currentThread().interrupt(); // restore the interrupt flag
        break;
    }
    if (nextPage == null)
        break;
    indexUrl = nextPage;
}
executorService.shutdown();
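To check how long each Callable really takes inside a loop like the one above, one option is to wrap every task in a small timing decorator and read the durations back from the futures that invokeAll returns. The following is only a sketch: TimedInvokeAll, TimedCallable and runTimed are hypothetical names, and the sleeping tasks stand in for the real ParserCallable. The measured time covers only the execution of the task, not the time it waited in the pool's queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class TimedInvokeAll {

    /** Hypothetical wrapper: runs the delegate and returns its run time in milliseconds. */
    static final class TimedCallable implements Callable<Long> {
        private final Callable<?> delegate;

        TimedCallable(Callable<?> delegate) {
            this.delegate = delegate;
        }

        @Override
        public Long call() throws Exception {
            long start = System.nanoTime();
            delegate.call();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }
    }

    /** Runs all tasks on the pool and returns each task's duration, in submission order. */
    static List<Long> runTimed(ExecutorService pool, List<? extends Callable<?>> tasks)
            throws InterruptedException, ExecutionException {
        List<TimedCallable> timed = new ArrayList<>();
        for (Callable<?> task : tasks) {
            timed.add(new TimedCallable(task));
        }
        List<Long> millis = new ArrayList<>();
        // invokeAll blocks until all tasks are done and keeps the order of the given list.
        for (Future<Long> future : pool.invokeAll(timed)) {
            millis.add(future.get()); // get() rethrows a task failure as ExecutionException
        }
        return millis;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(12);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < 24; i++) {
                // Stand-in for the real parse-and-store work (assumption).
                tasks.add(() -> { Thread.sleep(50); return null; });
            }
            List<Long> millis = runTimed(pool, tasks);
            long max = 0;
            for (long m : millis) {
                max = Math.max(max, m);
            }
            System.out.println("tasks=" + millis.size() + " maxMillis=" + max);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling get() on each future also surfaces exceptions thrown inside a Callable, which a bare invokeAll call silently leaves inside the returned futures.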
. , ? , ?
CPU/Memory/Heap .
, FYI.
================ >
I replaced ExecutorService with ForkJoinPool:
ForkJoinPool pool = new ForkJoinPool(12);
String indexUrl =
while (true) {
    Set<Callable<Void>> callables = new HashSet<>();
    for (String url : getUrls(indexUrl)) {
        Callable<Void> callable = new ParserCallable(url , DAOs...);
        callables.add(callable);
    }
    // Blocks until every Callable of this IndexPage has finished.
    pool.invokeAll(callables);
    String nextPage =
    if (nextPage == null)
        break;
    indexUrl = nextPage;
}
, ExecutorService. ExecutorService 2 , , ForkJoinPool 3 , Callable ( 1 5,6 10 ). , , , ( ).
, ( ) GregorianCalendar, Date SimpleDateFormat . . .
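As general background on the classes named above (not a statement of what the original code did): SimpleDateFormat is documented as not synchronized, so a single instance should not be shared between the pool's threads, while java.time.format.DateTimeFormatter (Java 8+) is immutable and thread-safe. A minimal sketch of sharing one formatter across 12 threads, where the class name, the method names and the timestamp pattern are all assumptions made for illustration:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedFormatterDemo {

    // Hypothetical pattern; a DateTimeFormatter is immutable, so one shared instance is enough.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Parses the text and formats it again with the shared formatter. */
    static String roundTrip(String text) {
        LocalDateTime parsed = LocalDateTime.parse(text, FORMAT);
        return FORMAT.format(parsed);
    }

    /** Runs the round trip concurrently and counts how many results equal the input. */
    static int countCorrectRoundTrips(int threads, int tasks, String text) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> jobs = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                jobs.add(() -> roundTrip(text));
            }
            int ok = 0;
            for (Future<String> future : pool.invokeAll(jobs)) {
                if (text.equals(future.get())) {
                    ok++;
                }
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int ok = countCorrectRoundTrips(12, 240, "2015-03-01 12:30:45");
        System.out.println("correct round trips: " + ok);
    }
}
```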