Retrieving a web page, including embedded objects

I would like to download a web page together with its images, Flash animations and other embedded objects. What is an easy way to achieve this?

+2
3 answers

See the article "Writing a Web Crawler in the Java Programming Language": http://java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/

+2

Use an open source HTML parser such as HtmlCleaner (http://java-source.net/open-source/html-parsers/htmlcleaner) or CyberNeko HTML (http://java-source.net/open-source/html-parsers/nekohtml).

Once the parser has built a DOM representation of the web page, you can query the DOM for the elements that reference images and other embedded objects, extract their src attributes, and download each resource.
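As a rough sketch of that idea, the snippet below collects the src attributes from a page and resolves them against the page's URL. To stay dependency-free it uses the HTML parser bundled with the JDK (javax.swing.text.html) as a stand-in for HtmlCleaner or NekoHTML, and it walks parser callbacks rather than querying a DOM. The class name EmbeddedObjectLister and the method extractSources are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of any library mentioned above.
class EmbeddedObjectLister {

    // Returns the absolute URLs of every src attribute the parser reports,
    // in document order. baseUrl is the address the page was fetched from.
    static List<String> extractSources(String html, String baseUrl) throws IOException {
        final URI base = URI.create(baseUrl);
        final List<String> sources = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(attrs); // empty elements such as <img>
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                collect(attrs); // elements that have a closing tag
            }

            private void collect(MutableAttributeSet attrs) {
                Object src = attrs.getAttribute(HTML.Attribute.SRC);
                if (src == null) {
                    return;
                }
                try {
                    // Turn relative references into absolute URLs.
                    sources.add(base.resolve(src.toString().trim()).toString());
                } catch (IllegalArgumentException e) {
                    // Skip src values that are not valid URI references.
                }
            }
        };

        // The third argument tells the parser to ignore charset declarations,
        // which is fine here because the input is already a String.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return sources;
    }
}
```

Each returned URL can then be downloaded, for example with new URL(url).openStream(). Elements that reference their content through a different attribute (such as the data attribute of object) would need an extra check in collect.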

+1
+1
