Crawling Websites That Use JavaScript or Web Forms

Question

I have a web crawler application. It successfully crawls most common, simple sites. Now I have come across sites where the HTML documents are generated dynamically via forms or JavaScript. I believe they can be crawled, but I do not know how. These sites do not expose the actual HTML page: when I open a page in IE or Firefox, the HTML source does not match what the browser displays. The sites contain text fields, checkboxes, and so on, so I believe they are called "web forms". I am not very familiar with web development, so correct me if I am wrong.

My question is: has anyone in a similar situation successfully solved this kind of problem? Does anyone know of a book or article on web crawling that covers these more advanced types of sites?

Thanks.

+5

javascript c# windows webforms

Jojo Mar 30 '10 at 10:51

3 answers

There are two separate issues here.

Forms

Typically, crawlers do not touch forms.

The rules for submitting form data are described in the HTML 4 specification at http://www.w3.org/TR/html4/interact/forms.html#h-17.13, and the same steps can be reproduced from C#.
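As a rough illustration of the form submission steps in that section of the spec, here is a minimal sketch that collects the controls of the first form on a page, builds the form data set, and encodes it as application/x-www-form-urlencoded. It is written in Python with only the standard library to keep it short; the same steps carry over to a C# crawler. The names FormExtractor and build_request and the example.com URL are made up for this example, and the sketch ignores select, textarea, and disabled controls.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormExtractor(HTMLParser):
    """Collect action, method and submittable <input> controls of the first form."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "get"
        self.fields = []          # (name, value) pairs in document order
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = a.get("action") or ""
            self.method = (a.get("method") or "get").lower()
        elif tag == "input" and self._in_form and a.get("name"):
            kind = (a.get("type") or "text").lower()
            if kind in ("checkbox", "radio"):
                # Only checked boxes and radio buttons are submitted.
                if "checked" in a:
                    self.fields.append((a["name"], a.get("value") or "on"))
            elif kind not in ("submit", "button", "reset", "image", "file"):
                self.fields.append((a["name"], a.get("value") or ""))

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_request(page_url, html, overrides=None):
    """Return (method, url, body) for submitting the first form in html."""
    parser = FormExtractor()
    parser.feed(html)
    overrides = overrides or {}
    # Replace default values with the ones the crawler wants to submit.
    data = [(name, overrides.get(name, value)) for name, value in parser.fields]
    body = urlencode(data)
    target = urljoin(page_url, parser.action)
    if parser.method == "post":
        return "POST", target, body
    return "GET", target + ("?" + body if body else ""), None


page = ('<form action="/search" method="get">'
        '<input type="text" name="q" value="">'
        '<input type="checkbox" name="safe" value="1" checked>'
        '<input type="submit" name="go" value="Search"></form>')
print(build_request("http://example.com/index.html", page, {"q": "web crawler"}))
```

The returned method, URL, and body can then be handed to whatever HTTP client the crawler already uses. Which values to put into the fields is still up to the crawler; a predetermined set per site is the simplest choice.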

JavaScript

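JavaScript is harder to deal with.

Some options:

- Write the crawler so that it reproduces what the JS does on the specific sites you care about.
- Use a JavaScript engine such as Rhino with env.js.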

+1

Quentin Mar 30 '10 at 11:15

AbotX handles JavaScript.

0

sjdirect 27 . '16 18:52

Jojo · Accepted Answer · 2010-03-30T11:54:53+0000

I found an article that deals with the deep web. It is very interesting, and I think it answers my questions above.

http://www.trycatchfail.com/2008/11/10/creating-a-deep-web-crawler-with-net-background/

You have to love this.
