What I do:
- scan the page
- get all the links of the page, put them in the list
- launch a new crawler that visits all the links in the list
- download them
There must be a faster way, where I can download the links directly while visiting the page? Thanks!
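crawler4j does this for you automatically. You first add one or more seed pages; these are the pages that are fetched and processed first. crawler4j then extracts all the links on those pages and passes them to your shouldVisit function. If you really want to crawl all of them, this function should simply return true for every URL. If you only want to crawl pages within a specific domain, you can check the URL and return true or false accordingly.
The URLs for which shouldVisit returns true are then fetched by the crawler threads, and the same process is repeated on them.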
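As a rough illustration of the kind of check that could sit inside shouldVisit, here is a minimal stdlib-only sketch. The class name LinkFilter, the String-based method, and the domain example.com are assumptions made for this example; in crawler4j itself the same logic would go into the overridden shouldVisit method of your WebCrawler subclass, applied to the URL string of the link it receives.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not part of crawler4j: names and domain are illustrative.
class LinkFilter {
    // Assumed target domain; replace with the site you actually want to crawl.
    private static final String ALLOWED_HOST = "example.com";

    // Returns true for URLs on the allowed domain (or a subdomain), false otherwise.
    static boolean shouldVisit(String url) {
        try {
            String host = URI.create(url).getHost();
            if (host == null) {
                return false; // no host part (e.g. mailto: links)
            }
            host = host.toLowerCase(Locale.ROOT);
            return host.equals(ALLOWED_HOST) || host.endsWith("." + ALLOWED_HOST);
        } catch (IllegalArgumentException e) {
            return false; // malformed URL: skip it
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldVisit("https://www.example.com/docs/index.html"));
        System.out.println(shouldVisit("https://other.org/page"));
    }
}
```

Returning true unconditionally instead would make the crawler follow every extracted link.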
The example code that ships with crawler4j is a good starting point.
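The general approach would be to split crawling and downloading into separate worker threads, with a maximum number of threads that depends on your memory requirements (that is, the maximum RAM you want to use for storing all this data).
However, crawler4j already gives you this functionality. By splitting downloading and crawling across threads, you try to make the most of your connection, pulling down as much data as your connection can handle and as the servers can send. The natural limit is that even if you spawn 1000 threads, if the servers only deliver 0.3 KB per second each, you are still only downloading about 300 KB per second in total. Unfortunately, you have no control over that part.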
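The 1000 × 0.3 = 300 arithmetic can be written down as a small sketch. The class and method names are hypothetical, the KB/s unit is an assumption, and the link-capacity parameter is added here only to show that your own connection caps the total as well.

```java
// Hypothetical back-of-the-envelope estimate; not part of crawler4j.
class ThroughputEstimate {
    // Total rate = number of connections * rate per connection,
    // capped by the capacity of the local link (all values in KB/s, assumed unit).
    static double aggregateKBps(int connections, double perConnectionKBps, double linkCapacityKBps) {
        return Math.min(connections * perConnectionKBps, linkCapacityKBps);
    }

    public static void main(String[] args) {
        // 1000 threads, each served at 0.3 KB/s, on a link that is not the bottleneck.
        System.out.println(aggregateKBps(1000, 0.3, Double.MAX_VALUE));
        // Same crawl on a link limited to 100 KB/s: the link becomes the limit.
        System.out.println(aggregateKBps(1000, 0.3, 100.0));
    }
}
```

Adding more threads only helps until one of the two terms (server rate or link capacity) becomes the limit.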
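The other way to increase speed is to run the crawler on a system with a fatter pipe to the internet, since your maximum download speed is probably the current limiting factor. For example, if you ran the crawl on an AWS instance (or any other cloud platform), you would benefit from their very fast connections to the backbone and could crawl a set of sites in less time, with far more bandwidth than a home or office connection offers (unless you work at an ISP, of course).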
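It is theoretically possible that, with an extremely large pipe, the bottleneck becomes the maximum write speed of your disk, for any data you save to local (or network) storage.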