Ruby Nokogiri Javascript Parsing

Question

I need to parse an array from a website. The part of the JavaScript that I want to parse is as follows:

 _arPic[0] = "http://example.org/image1.jpg";
 _arPic[1] = "http://example.org/image2.jpg";
 _arPic[2] = "http://example.org/image3.jpg";
 _arPic[3] = "http://example.org/image4.jpg";
 _arPic[4] = "http://example.org/image5.jpg";
 _arPic[5] = "http://example.org/image6.jpg";

I get all of the JavaScript like this:

 product_page = Nokogiri::HTML(open(full_url))
 product_page.css("div#main_column script")[0]

Is there an easy way to parse all the variables?

+4

javascript ruby nokogiri

nohayeye Jan 22 '13 at 15:21

2 answers

A simple way:

 _arPic = URI.extract product_page.css("div#main_column script")[0].text

which can be reduced to:

 _arPic = URI.extract product_page.at("div#main_column script").text
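
As a minimal sketch of what this does, the snippet below runs `URI.extract` on a sample string instead of a live page. The variable `script_text` is a hypothetical stand-in for `product_page.at("div#main_column script").text`, and the `%w[http https]` scheme filter is an added assumption rather than part of the answer above. Note that recent Ruby versions mark `URI.extract` as obsolete, although it is still available.

```ruby
require 'uri'

# Hypothetical stand-in for product_page.at("div#main_column script").text;
# the sample data is copied from the question, so no network access or
# Nokogiri is needed to try this out.
script_text = '_arPic[0] = "http://example.org/image1.jpg"; ' \
              '_arPic[1] = "http://example.org/image2.jpg";'

# Assumption: only http/https links are wanted. Restricting the schemes
# keeps other "word:" sequences in the script from being picked up as URIs.
image_urls = URI.extract(script_text, %w[http https])

puts image_urls
```

This returns every URL in the script text, in order of appearance, so it only fits when the script contains no other links besides the `_arPic` entries.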

0

pguardiario Jan 23 '13 at 1:36

Ron warholic · Accepted Answer · 2013-01-22T15:36:11+0000

If I understand correctly, you are trying to parse JavaScript and get a Ruby array of your image URLs?

Nokogiri only parses HTML/XML, so you need another library. A cursory search turns up the RKelly library, which has a parse function that accepts a JavaScript string and returns a parse tree.

Once you have a parse tree, you will need to traverse it and find the nodes of interest by name (for example, _arPic), then get the string content on the other side of the assignment.

Alternatively, if it doesn't need to be too robust (and a regex won't be), you can simply use a regular expression to search the JavaScript:

 /^\s*_arPic\[\d\] = "(.+)";$/

might be a good starter regex.
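
The regex approach could be sketched as follows. This is not the exact pattern from the answer: as an assumption, it is loosened to accept multi-digit indexes (`\d+`), flexible whitespace around `=`, and several assignments on one line (so it drops the `^` and `$` anchors). The helper name `extract_ar_pic` is hypothetical.

```ruby
# Hypothetical helper: collects the string literals assigned to _arPic[n]
# into a Ruby array, using the JavaScript index as the array position.
def extract_ar_pic(js)
  pics = []
  # Group 1 captures the numeric index, group 2 the text between the quotes.
  js.scan(/_arPic\[(\d+)\]\s*=\s*"([^"]*)"/) do |index, url|
    pics[index.to_i] = url
  end
  pics
end

# Sample data copied from the question; in the real script this would be
# product_page.css("div#main_column script")[0].text
js = '_arPic[0] = "http://example.org/image1.jpg"; ' \
     '_arPic[1] = "http://example.org/image2.jpg"; ' \
     '_arPic[2] = "http://example.org/image3.jpg";'

puts extract_ar_pic(js)
```

Keying on the index rather than on order of appearance keeps the Ruby array aligned with the JavaScript one even if the assignments are out of order. It will still miss single-quoted strings or values built by concatenation, which is where a real parser would be needed.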
