Screen scraper

Just curious: what do you think are the best tools for screen scraping these days? Is the .NET HTML Agility Pack a good option? How do you handle scraping sites that use a lot of AJAX?

+4
4 answers

I find that if the page has a fairly static layout, the HTML Agility Pack is perfect for getting all the data I need. I have not come across a single page that it could not handle or that did not give me the results I wanted.
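To illustrate what parsing a static page looks like, here is a minimal sketch in Python using only the standard library's `html.parser`. It is not the HTML Agility Pack API, just the same idea in another language. The `LinkCollector` class, the `extract_links` helper, and the sample markup are all made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> in a static page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return the links found in it."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical static page used only for this example.
SAMPLE_PAGE = """
<html><body>
  <ul>
    <li><a href="/products/1">First <b>product</b></a></li>
    <li><a href="/products/2">Second product</a></li>
  </ul>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_PAGE):
        print(href, text)
```

This works because everything of interest is already in the downloaded markup; nothing has to be executed.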

If you find that the page is rendered with a lot of dynamic code, you will have to do more than just download the page; you actually need to execute it.

To do this, you need something like WebKit.NET (a .NET wrapper around the WebKit rendering engine), which lets you load the page and actually run its JavaScript. Then, once you are sure that the document is fully rendered, you can extract the information from the page.
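The "wait until the document is fully rendered" step usually comes down to polling some readiness condition with a timeout. Below is a rough, library-agnostic sketch in Python. `wait_until` and `FakePage` are hypothetical names for this example, not part of WebKit.NET; in real use, the predicate would ask your browser engine whether the page has finished loading.

```python
import time


def wait_until(is_ready, timeout=10.0, interval=0.25):
    """Poll is_ready() until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakePage:
    """Hypothetical stand-in for a browser page that finishes after a few polls."""

    def __init__(self, polls_needed):
        self._remaining = polls_needed

    def is_loaded(self):
        self._remaining -= 1
        return self._remaining <= 0


if __name__ == "__main__":
    page = FakePage(polls_needed=3)
    if wait_until(page.is_loaded, timeout=2.0, interval=0.01):
        print("page ready, safe to read the DOM")
    else:
        print("timed out waiting for the page")
```

A timeout matters here because AJAX-heavy pages may never reach a clearly "finished" state, and the scraper should give up rather than hang.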

+7

For the very basics, I use:

I don't have JavaScript support yet, but I plan on using the Google V8 JavaScript Engine. This requires making unmanaged code calls, but V8's performance justifies it.

+4

Selenium is a good tool for automating screen scraping. There are two things to install: 1) the Selenium IDE (works only in Firefox), and 2) the Selenium RC Server.

After starting the Selenium IDE, go to the site you are trying to automate and start recording the actions you perform on it. Think of it as recording a macro in the browser. Afterwards, you get the code output in the language you need.

Just so you know, BrowserMob uses Selenium for load testing and for automating tasks in the browser.

I uploaded a PPT that I made a while ago. It should save you a good deal of time: http://www.4shared.com/get/tlwT3qb_/SeleniumInstructions.html

At the link above, choose the regular download option.

I spent a lot of time figuring this out, so I thought it could save you some.

0

The best tool "these days" is one that not only gives you the necessary features (JavaScript, automation) but also one that you do not need to run yourself... I am, of course, referring to using a cloud service. This approach saves you network bandwidth, provides faster results (because it can scale better than the home-grown solution you would likely develop) and, most importantly, saves you the headache of IT and maintenance.

On that note, check out the scraping solution called Bobik ( http://usebobik.com ). I wrote an article about it at http://zscraper.wordpress.com/2012/07/03/a-comparison-shopping-android-app-without-backend/ .

Hope this helps.

0
