I have a web crawler application. It successfully crawls the most common, simple sites. Now I have come across sites whose HTML documents are generated dynamically through forms or JavaScript. I believe these can be crawled; I just don't know how. These sites do not serve their actual content as static HTML: if I view the page source in IE or Firefox, the HTML there does not match what is shown on screen. The pages contain text fields, checkboxes, and so on, so I believe they are called "web forms". I'm not very familiar with web development, so please correct me if I'm wrong.
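For the form part of the problem, one common approach is to parse each page for `<form>` elements, fill in each field's default value, and turn GET forms into ordinary URLs that the crawler can queue like any other link. The sketch below uses only the Python standard library. The names `FormExtractor`, `form_request_urls` and the sample HTML are hypothetical, only `<input>` elements are considered (`<select>` and `<textarea>` are ignored for brevity), and POST forms are skipped. It does not run JavaScript. Content that scripts build in the browser generally needs a real browser engine, such as a headless browser, to render the page before it can be parsed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlencode, urlsplit, urlunsplit


class FormExtractor(HTMLParser):
    """Collects <form> elements and the default values of their <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []        # finished forms: dicts with action, method, fields
        self._current = None   # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)    # boolean attributes such as "checked" map to None
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            }
        elif tag == "input" and self._current is not None:
            name = attrs.get("name")
            if not name:
                return  # unnamed inputs are never submitted
            kind = (attrs.get("type") or "text").lower()
            if kind in ("submit", "button", "image", "reset", "file"):
                return  # buttons and uploads carry no crawlable default value
            if kind in ("checkbox", "radio") and "checked" not in attrs:
                return  # unchecked boxes are not submitted by a browser
            self._current["fields"][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def form_request_urls(page_url, html):
    """Return the GET URLs produced by submitting each form with its default values."""
    parser = FormExtractor()
    parser.feed(html)
    parser.close()
    urls = []
    for form in parser.forms:
        if form["method"] != "get":
            continue  # POST forms need a request body; not handled in this sketch
        # An empty action submits back to the page itself; relative actions are resolved.
        parts = urlsplit(urljoin(page_url, form["action"]))
        # A GET submission replaces any existing query string with the form data.
        query = urlencode(form["fields"])
        urls.append(urlunsplit(parts._replace(query=query, fragment="")))
    return urls


sample = """
<form action="/search" method="get">
  <input type="text" name="q" value="crawler">
  <input type="checkbox" name="exact" value="1" checked>
  <input type="checkbox" name="images" value="1">
  <input type="submit" value="Go">
</form>
<form action="/login" method="post">
  <input type="text" name="user">
</form>
"""

print(form_request_urls("http://example.com/index.html", sample))
```

Running this prints `['http://example.com/search?q=crawler&exact=1']`. The unchecked checkbox, the submit button and the POST login form are left out. A real crawler would probably also try other values for text fields and options, which is where most of the difficulty lies.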
My question is: has anyone in a similar situation successfully solved this kind of problem? Does anyone know of a book or article on web crawling, particularly one that covers these more advanced kinds of sites?
Thanks.