Ruby Nokogiri Javascript Parsing

I need to parse an array from a website. The part of the JavaScript that I want to parse is as follows:

 _arPic[0] = "http://example.org/image1.jpg";
 _arPic[1] = "http://example.org/image2.jpg";
 _arPic[2] = "http://example.org/image3.jpg";
 _arPic[3] = "http://example.org/image4.jpg";
 _arPic[4] = "http://example.org/image5.jpg";
 _arPic[5] = "http://example.org/image6.jpg";

I get the whole script element like this:

 product_page = Nokogiri::HTML(open(full_url))
 product_page.css("div#main_column script")[0]

Is there an easy way to parse all the variables?

2 answers

If I'm reading this correctly, you are trying to parse JavaScript and get a Ruby array of your image URLs?

Nokogiri only parses HTML/XML, so you will need another library. A cursory search turns up the RKelly library, which has a parse function that accepts a JavaScript string and returns a parse tree.

Once you have a parse tree, you will need to traverse it and find the nodes of interest by name (for example, _arPic), then get the string contents on the other side of the assignment.

Alternatively, if it doesn't have to be very robust (and it won't be), you can simply use a regular expression to search through the JavaScript, if possible:

 /^\s*_arPic\[\d+\] = "(.+?)";$/ 

may be a good starting regular expression.
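As a rough sketch of the regex approach, String#scan can collect every captured URL into a Ruby array. The script_text string here is sample data standing in for the text of the script node, and the pattern is loosened as an assumption (optional whitespace, no line anchors) so that it still matches when several assignments share one line:

```ruby
# Sample data standing in for product_page.at("div#main_column script").text
script_text = <<~JS
  _arPic[0] = "http://example.org/image1.jpg";
  _arPic[1] = "http://example.org/image2.jpg"; _arPic[2] = "http://example.org/image3.jpg";
JS

# Capture the quoted value of each _arPic[n] assignment.
ar_pic_pattern = /_arPic\[\d+\]\s*=\s*"([^"]+)";/

# scan returns one array per match (one element per capture group),
# so flatten yields a plain array of URL strings.
image_urls = script_text.scan(ar_pic_pattern).flatten

puts image_urls
```

This only works while the page keeps writing the assignments in exactly this shape; any change to quoting or naming will need a new pattern.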


A simple way:

 _arPic = URI.extract product_page.css("div#main_column script")[0].text 

which can be reduced to:

 _arPic = URI.extract product_page.at("div#main_column script").text 
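
Note that URI.extract returns every URI it finds in the text, so any other URL in the same script element would end up in the array too, and the method has been marked obsolete in the Ruby documentation for some time. A minimal sketch of how it behaves, using a sample string in place of the real script text and limiting the match to http and https as an added assumption:

```ruby
require 'uri'

# Sample data standing in for the script node's text (not from a real page)
script_text = '_arPic[0] = "http://example.org/image1.jpg"; ' \
              '_arPic[1] = "http://example.org/image2.jpg";'

# The second argument restricts extraction to the listed schemes.
image_urls = URI.extract(script_text, %w[http https])

puts image_urls
```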