If this is a programming issue, I would suggest writing your own regular expression to parse the retrieved content. The target tags are IMG and A for standard HTML. In Java,
final String openingTags = "(<a\\s[^>]*href=['\"]?|<img\\s[^>]*src=['\"]?)";
used together with the Pattern and Matcher classes (compiled with Pattern.CASE_INSENSITIVE, since tag and attribute names may be written in upper case), should detect the start of the tags. Add the LINK tag if you also want to pick up CSS files.
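To make that concrete, here is a minimal sketch of how Pattern and Matcher might be combined to pull out the link targets themselves. The class name LinkExtractor, the method extractLinks and the exact pattern are my own illustrative choices, not a tested or complete solution. The pattern assumes the attribute value is double-quoted, single-quoted or unquoted, and it will still miss or misread unusual markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; the names are assumptions.
class LinkExtractor {

    // Matches the start of an A, LINK or IMG tag up to its href/src attribute,
    // then captures the value in group 1 (double-quoted), group 2 (single-quoted)
    // or group 3 (unquoted).
    private static final Pattern LINK = Pattern.compile(
            "<(?:a|link|img)\\b[^>]*?\\s(?:href|src)\\s*=\\s*"
            + "(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
            Pattern.CASE_INSENSITIVE);

    // Returns every href/src value found, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            // Exactly one of the three capture groups is non-null per match.
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) {
                    links.add(m.group(g));
                    break;
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/\">x</a> "
                + "<img alt=\"a\" src='b.png'> <IMG SRC=c.gif>";
        System.out.println(extractLinks(html));
    }
}
```

Note that the sketch does not resolve relative URLs and does not skip tags that sit inside comments or scripts; you would have to handle those separately.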
However, it is not as easy as you might think. Many web pages are not well formed. Programmatically retrieving every link that a person can "recognize" is really difficult if you have to account for all the irregular ways such markup can be written.
Good luck
mizubasho Sep 17 '09 at 15:17