Crawling Websites That Use JavaScript or Web Forms

I have a web crawler application. It successfully crawls most common, simple sites, but now I have come across sites whose HTML documents are generated dynamically via forms or JavaScript. I believe these can be crawled too; I just don't know how. These sites don't serve their real content as static HTML: if I view the source in IE or Firefox, the HTML I see doesn't match what the browser actually displays. The pages contain text fields, checkboxes, and so on, which is why I believe they are called "web forms". I'm not very familiar with web development, so correct me if I'm wrong.

My question is: has anyone in a similar situation successfully solved this kind of problem? Does anyone know of a book or article about web crawling, particularly about crawling these more advanced kinds of sites?

Thanks.

3 answers

I found an article that deals with the deep web, and it's very interesting. I think it answers my question above.

http://www.trycatchfail.com/2008/11/10/creating-a-deep-web-crawler-with-net-background/

Gotta love it.


There are two separate issues here.

Forms

Typically, crawlers simply ignore forms.

To crawl past a form, the crawler has to fill in each field with some plausible value (or enumerate candidate values), submit the form the same way a browser would, and then process the response page.

The mechanics of form submission (how field values are encoded and sent via GET or POST) are specified in http://www.w3.org/TR/html4/interact/forms.html#h-17.13, and are straightforward to implement.
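To make the form-submission idea concrete, here is a minimal Python sketch (the parser class, example HTML, and URL are my own illustration, not from the answer) that extracts a form's fields from a page and builds the GET request URL a browser would send when the form is submitted:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

class FormParser(HTMLParser):
    """Collects the action, method, and named <input> fields of the first form."""
    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "get"
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action", "")
            self.method = attrs.get("method", "get").lower()
        elif tag == "input" and "name" in attrs:
            # Keep any pre-filled/hidden value; default to empty.
            self.fields[attrs["name"]] = attrs.get("value", "")

def build_get_url(base_url, html, overrides=None):
    """Return the URL a browser would request when submitting the form via GET."""
    parser = FormParser()
    parser.feed(html)
    fields = {**parser.fields, **(overrides or {})}
    return urljoin(base_url, parser.action) + "?" + urlencode(fields)

html = '''<form action="/search" method="get">
  <input type="text" name="q" value="">
  <input type="hidden" name="lang" value="en">
</form>'''
print(build_get_url("http://example.com/", html, {"q": "crawler"}))
# -> http://example.com/search?q=crawler&lang=en
```

A real crawler would also handle POST forms (sending the encoded fields in the request body instead of the query string), but the field-gathering step is the same.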

JavaScript

JavaScript is a harder problem. To see the page as a browser does, the crawler has to actually execute the scripts, for example by embedding a JavaScript engine or by driving a headless browser.
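Because running a full browser for every page is expensive, one practical step is to first detect which pages even need JavaScript execution before handing them to a heavier renderer. Here is a rough, purely illustrative Python heuristic (the threshold and sample pages are my own assumptions): a page whose body contains almost no visible text but does load scripts is probably rendered client-side.

```python
import re

def looks_javascript_rendered(html):
    """Heuristic: nearly-empty <body> plus <script> tags suggests the real
    content is built client-side, so a plain HTTP fetch won't see it."""
    scripts = len(re.findall(r"<script\b", html, re.I))
    body_match = re.search(r"<body[^>]*>(.*?)</body>", html, re.I | re.S)
    body = body_match.group(1) if body_match else html
    # Strip tags and whitespace to measure the visible text that remains.
    visible = re.sub(r"<[^>]+>", "", body).strip()
    return scripts > 0 and len(visible) < 50

spa = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
plain = "<html><body><p>" + "Real server-rendered content. " * 5 + "</p></body></html>"
print(looks_javascript_rendered(spa))    # True
print(looks_javascript_rendered(plain))  # False
```

Pages flagged by such a check could then be routed to a JavaScript-capable renderer, while everything else stays on the fast static-fetch path.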


AbotX can render JavaScript for you. It isn't free, though.

