Get background image with Nokogiri from DOM?

Question

I'm scraping a site and I can't get the images because they are loaded as CSS backgrounds.

Is there a way to get these attributes using Nokogiri without using Phantom.js or Sentinel? The background image actually uses inline styles, so I should be able to.

I need to get images from an array of URLs:

<div class="zoomLens" style="background-image: url(http://resources1.okadirect.com/assets/en/new/catalogue/1200x1200/EHD005MET-L_01.jpg?version=7); background-position: -14.7368421052632px -977.894736842105px; background-repeat: no-repeat;">&nbsp;</div>

I use Nokogiri through Mechanize, but I don’t know how to write it correctly:

 image = agent.get(doc.parser.at('.zoomLens')["background-image"]).save("okaimages/f_deco-#{counter}.jpg")

html ruby nokogiri

Gibson Jan 29 '15 at 16:44

1 answer

the tin man · Answer 1 · 2015-01-29T18:05:18+0000

I would use something like:

 require 'nokogiri'
 doc = Nokogiri::HTML('<div class="zoomLens" style="background-image: url(http://resources1.okadirect.com/assets/en/new/catalogue/1200x1200/EHD005MET-L_01.jpg?version=7); background-position: -14.7368421052632px -977.894736842105px; background-repeat: no-repeat;">&nbsp;</div>')
 doc.search('.zoomLens').map { |n| n['style'][/url\((.+)\)/, 1] }
 # => ["http://resources1.okadirect.com/assets/en/new/catalogue/1200x1200/EHD005MET-L_01.jpg?version=7"]

The trick is finding a suitable pattern to capture the contents of the parentheses. n['style'][/url\((.+)\)/, 1] uses String#[], which can take a regular expression with groups and return a specific capture group. See https://www.regex101.com/r/mV6rY6/1 for a breakdown of what the pattern does.
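As a rough illustration of how String#[] behaves with a capture index, here is a small sketch. The helper name background_url, the stricter pattern, and the example.com style strings are my own assumptions and are not part of the page in the question. The stricter pattern is only one possible variant: it tolerates quoted URLs and stops at the first closing parenthesis, whereas the greedy `.+` runs to the last `)` in the string.

```ruby
# Greedy pattern from the answer above.
GREEDY = /url\((.+)\)/

# Hypothetical stricter variant: optional quotes, stops at the first ")".
STRICT = /url\(\s*['"]?([^'")\s]+)['"]?\s*\)/

# background_url is an illustrative helper name, not a Nokogiri method.
# String#[] with (regexp, 1) returns capture group 1, or nil if no match.
def background_url(style, pattern = STRICT)
  style[pattern, 1]
end

# Made-up style values for demonstration only.
plain  = 'background-image: url(http://example.com/a.jpg?version=7); background-repeat: no-repeat;'
quoted = "background-image: url('http://example.com/b.jpg'); transform: rotate(5deg);"

background_url(plain, GREEDY)   # => "http://example.com/a.jpg?version=7"
background_url(quoted, GREEDY)  # overshoots: keeps the quotes and runs on to "rotate(5deg"
background_url(quoted)          # => "http://example.com/b.jpg"
background_url('color: red;')   # => nil
```

For the markup in the question the greedy pattern is enough, because the style attribute contains only one pair of parentheses.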

At this point you have an array of image URLs. You can iterate over the list and use OpenURI, or any other HTTP client, to retrieve the images.
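A minimal sketch of that last step with OpenURI might look like the following. The helper name save_images and its dir: and fetch: parameters are assumptions of mine; the file naming borrows the okaimages/f_deco-N.jpg scheme from the question. It assumes Ruby 2.5 or later, where URI.open is available after requiring open-uri.

```ruby
require 'open-uri'
require 'fileutils'

# Default fetcher: read the response body of a URL with OpenURI.
DEFAULT_FETCH = ->(url) { URI.open(url, &:read) }

# save_images is a hypothetical helper name. It writes each URL's body to
# dir/f_deco-<index>.jpg and returns the list of paths it wrote.
# The fetch: parameter exists so another HTTP client can be swapped in.
def save_images(urls, dir: 'okaimages', fetch: DEFAULT_FETCH)
  FileUtils.mkdir_p(dir)
  urls.each_with_index.map do |url, counter|
    path = File.join(dir, "f_deco-#{counter}.jpg")
    File.binwrite(path, fetch.call(url)) # binary write keeps image bytes intact
    path
  end
end

# Possible usage, with urls being the array produced by the map above:
#   save_images(urls)
```

Since the question already uses Mechanize, passing something like fetch: ->(url) { agent.get(url).body } should work as well, assuming agent is the existing Mechanize instance.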
