I need to read and parse many HTML web pages (100+) to extract specific content (multiple lines of text that are almost the same on every page).
I have tried Scanner objects with regular expressions, and jsoup with its HTML parser.
Both approaches are slow, and with jsoup I get the following error: java.net.SocketTimeoutException: read timeout (this happens on multiple computers with different connections).
Is there anything better?
EDIT:
Now that I have jsoup working, I think the better question is: how do I speed it up?
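To make the question concrete, here is a minimal sketch of one direction that might help, assuming the time is spent mostly waiting on the network rather than parsing: fetch the pages concurrently on a fixed-size thread pool. The class name `ParallelFetch`, the method `fetchAll`, and the pluggable `fetcher` function are hypothetical names made up for illustration. The sketch uses only the standard library, with a stand-in fetcher so it runs without network access. A real fetcher could wrap `Jsoup.connect(url).timeout(10_000).get()`, converting its checked `IOException` into an unchecked exception.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelFetch {

    // Hypothetical helper: applies `fetcher` to every URL on a fixed-size
    // thread pool and returns the results keyed by URL, in input order.
    // A URL whose fetch throws (for example on a timeout) is mapped to null.
    public static Map<String, String> fetchAll(List<String> urls,
                                               Function<String, String> fetcher,
                                               int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit all downloads first so they run concurrently.
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> fetcher.apply(url);
                futures.add(pool.submit(task));
            }
            // Then collect the results in the original order.
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                try {
                    results.put(urls.get(i), futures.get(i).get());
                } catch (ExecutionException e) {
                    results.put(urls.get(i), null); // this fetch failed
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag
                    results.put(urls.get(i), null);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in fetcher (an assumption for this sketch): no network access.
        Function<String, String> fake = url -> "<html>" + url + "</html>";
        Map<String, String> pages = fetchAll(List.of("page1", "page2", "page3"), fake, 4);
        System.out.println(pages.size() + " pages fetched");
    }
}
```

The pool size (4 here) is an arbitrary choice. Whether this helps at all depends on the servers tolerating parallel requests, which I have not verified.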