I need to get text from a remote website that does not provide an RSS feed.
I know that the data I need is always on pages linked from the home page (http://www.example.com/) by a link whose text contains "Invoices Report".
For instance:
<a href="http://www.example.com/data/invoices/2010/10/invoices-report---tuesday-october-12.html">Invoices Report - Tuesday, October 12</a>
So I need to find all the links on the home page that match this pattern, and then extract all the text inside the <div class="invoice-body"> element on each linked page.
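To make the two steps concrete, here is a rough sketch of what I have in mind, using only java.util.regex. The class and method names (InvoiceScraper, findReportLinks, extractInvoiceText) are made up for illustration. It works on HTML that has already been loaded into a String (fetching is left out), and it assumes double-quoted attributes and reasonably well-formed markup; regex is a simplification here, and a real HTML parser would presumably be more robust:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvoiceScraper {

    // Matches <a ... href="URL" ...>text</a>; group 1 = URL, group 2 = link text.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s[^>]*?href=\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches the opening tag of <div class="invoice-body">.
    private static final Pattern BODY_OPEN = Pattern.compile(
            "<div\\s[^>]*class=\"invoice-body\"[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Matches any opening or closing div tag; group 1 is "/" for a closing tag.
    private static final Pattern DIV_TAG = Pattern.compile(
            "<(/?)div\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Step 1: collect the href of every link whose text contains "Invoices Report". */
    static List<String> findReportLinks(String homeHtml) {
        List<String> urls = new ArrayList<>();
        Matcher m = LINK.matcher(homeHtml);
        while (m.find()) {
            if (m.group(2).contains("Invoices Report")) {
                urls.add(m.group(1));
            }
        }
        return urls;
    }

    /** Step 2: return the plain text inside the first invoice-body div, or "" if absent. */
    static String extractInvoiceText(String pageHtml) {
        Matcher open = BODY_OPEN.matcher(pageHtml);
        if (!open.find()) {
            return "";
        }
        int start = open.end();
        int end = pageHtml.length();

        // Walk the following div tags and track nesting to find the matching </div>.
        Matcher tag = DIV_TAG.matcher(pageHtml);
        int depth = 1;
        int pos = start;
        while (tag.find(pos)) {
            depth += tag.group(1).isEmpty() ? 1 : -1;
            if (depth == 0) {
                end = tag.start();
                break;
            }
            pos = tag.end();
        }

        // Replace remaining tags with spaces, then collapse whitespace.
        String inner = pageHtml.substring(start, end);
        return inner.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    }
}
```

The idea would be to call findReportLinks on the home page HTML, fetch each returned URL, and pass each page to extractInvoiceText.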
Are there Java tools that help with this, and is there anything specifically for Google App Engine for Java that can do it?