I want to capture content from web pages, especially particular tags and the content within them. I tried XQuery and XPath, but they don't seem to work on malformed XHTML, and regular expressions are just a pain.
Is there a better solution? Ideally, I would like to be able to request all the links and get back an array of URLs, request the text of the links and get back an array of strings, or request all the bold text, and so on.
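To make the goal concrete, here is a rough sketch of the kind of interface I have in mind. It assumes Python and its standard-library `html.parser` module, which is lenient about broken markup but is not a full HTML5 tree builder. The names `TagCollector` and `extract` are made up for illustration, and the sample markup is invented.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect link URLs, link text, and bold text from possibly malformed HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.urls = []
        self.link_texts = []
        self.bold_texts = []
        self._open = []  # stack of (tag, text chunks) for tracked tags still open

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.urls.append(href)
        if tag in ("a", "b", "strong"):
            self._open.append((tag, []))

    def handle_data(self, data):
        # Text belongs to every tracked tag that is currently open.
        for _, chunks in self._open:
            chunks.append(data)

    def handle_endtag(self, tag):
        # Close the most recent matching open tag; stray end tags are ignored.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                self._store(*self._open.pop(i))
                break

    def flush_open(self):
        # Tags that were never closed still contribute the text seen so far.
        while self._open:
            self._store(*self._open.pop(0))

    def _store(self, tag, chunks):
        text = "".join(chunks).strip()
        target = self.link_texts if tag == "a" else self.bold_texts
        target.append(text)


def extract(html):
    """Return (urls, link_texts, bold_texts) for the given markup."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    parser.flush_open()
    return parser.urls, parser.link_texts, parser.bold_texts


# Invented sample: unquoted attribute, stray end tags, unclosed <b>.
sample = ('<p><a href="http://example.com/a">First <b>bold</b> link</a> '
          '<a href=/b>Second</a></p></div><b>unclosed')
urls, link_texts, bold_texts = extract(sample)
```

Something along these lines, where I ask for a kind of tag and get back a plain list, is what I am after; a dedicated library that does this more robustly would be even better.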