Scraping an AngularJS Application

I'm scraping some HTML pages in Rails using Nokogiri.

I ran into problems when I tried to scrape an AngularJS page, because the gem parses the HTML before the page has been fully rendered.

Is there any way to scrape this type of page? How can I get the page fully rendered before it is parsed?

2 answers

If you are trying to scrape AngularJS pages in general, you will probably need something like what @tadman mentioned in the comments (PhantomJS) - some type of headless browser that fully executes the AngularJS JavaScript and then exposes the resulting DOM for inspection.

If you have a specific site or sites you want to scrape, the path of least resistance is most likely to bypass the AngularJS frontend entirely and query the API that the Angular code pulls its content from. The standard setup for most AngularJS sites is that they serve static JS and HTML/templates, and then make AJAX calls to a server (either their own backend or some third-party API) to fetch the content to render. If you look at their code, you can directly query whatever Angular is calling (i.e. via $http, ngResource, or Restangular). The returned data is usually JSON, and it is much easier to work with than scraping the post-rendered HTML.
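As a sketch of that direct-API approach: once you have found the endpoint (by reading the app's $http/ngResource calls or watching the browser's network tab), you can fetch it with plain Net::HTTP and parse the JSON - no DOM rendering involved. The endpoint URL and field names below are invented placeholders, not from any real site:

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical helper: the real endpoint must be discovered from the
# Angular code's $http / ngResource calls or the browser's network tab.
def fetch_items(api_url)
  response = Net::HTTP.get_response(URI(api_url))
  JSON.parse(response.body)
end

# The API returns plain JSON, e.g. a payload like this (field names invented):
sample = '[{"id":1,"title":"First post"},{"id":2,"title":"Second post"}]'
titles = JSON.parse(sample).map { |item| item['title'] }
# titles => ["First post", "Second post"]
```

Compared with parsing rendered HTML, the JSON gives you structured fields directly, so there are no CSS selectors to break when the site's markup changes.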


You can use:

require 'phantomjs'
require 'watir'
require 'nokogiri'

b = Watir::Browser.new(:phantomjs)
b.goto URL
doc = Nokogiri::HTML(b.html)

Download PhantomJS from http://phantomjs.org/download.html and move the binary to /usr/bin.

