I use Ruby, Selenium WebDriver and Nokogiri to retrieve data from web pages. After loading the proper HTML, I print the contents of a specific class.
For example,
    require "selenium-webdriver"
    require "nokogiri"

    browser = Selenium::WebDriver.for :chrome
    browser.get "https://jsfiddle.net"

    doc = Nokogiri::HTML.parse(browser.page_source)
    doc.css('.aiButton').map(&:text).join(',')
I have found that the hardest part is getting the right HTML loaded. For example, the content I want may be hidden by some JavaScript, or it may be on another page.
Is it possible to use Selenium to load a page, manually manipulate the page in the browser until it displays the correct HTML, and then let the script continue and print the content it finds?
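To make the idea concrete, here is a rough sketch of what I have in mind. It assumes that blocking on standard input is an acceptable way to "pause" the script, and that `page_source` reflects the page as it looks after my manual changes (I believe it does in Chrome, but I have not verified this for every driver). The method names `pause_for_manual_step` and `scrape_after_manual_step` are made up for illustration; they are not part of Selenium or Nokogiri.

```ruby
# Hypothetical helper: block until the operator presses Enter in the terminal,
# then return whatever HTML the browser currently holds.
# `driver` only needs to respond to #page_source.
def pause_for_manual_step(driver, input: $stdin, output: $stdout)
  output.puts "Adjust the page in the browser window, then press Enter here..."
  input.gets
  driver.page_source
end

# Hypothetical wrapper: open the page, wait for the manual step, then extract
# the text of every element matching `selector`, joined with commas.
def scrape_after_manual_step(url, selector)
  # Required here so the helper above can be loaded without either gem.
  require "selenium-webdriver"
  require "nokogiri"

  driver = Selenium::WebDriver.for :chrome
  begin
    driver.get url
    html = pause_for_manual_step(driver)
    Nokogiri::HTML.parse(html).css(selector).map(&:text).join(",")
  ensure
    # Close the browser even if parsing raises.
    driver.quit
  end
end
```

With this, the call would be something like `puts scrape_after_manual_step("https://jsfiddle.net", ".aiButton")`: the browser opens, I click around until the content is visible, press Enter in the terminal, and the script carries on.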
ruby selenium screen-scraping webdriver nokogiri
Joe morano