HTML is read before the page is fully loaded when using open-uri and nokogiri

I use open-uri and nokogiri with Ruby to write a simple crawler. The problem is that the HTML is sometimes read before the page has fully loaded. When that happens, I get nothing but the loading icon and the navigation bar. What is the best way to tell open-uri or nokogiri to wait until the page has fully loaded?

Currently my script is as follows:

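    require 'nokogiri'
    require 'open-uri'
    require 'openssl'

    url = "https://www.the-page-i-wanna-crawl.com"
    # URI.open is required for URLs on Ruby 3.0+ (plain Kernel#open no longer fetches them)
    doc = Nokogiri::HTML(URI.open(url, ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE))
    puts doc.at_css("h2").text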
1 answer

What you describe can't happen with open-uri. `open` returns only after the entire response body has been downloaded, so Nokogiri always receives the complete HTML that the server sent.
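A minimal stdlib-only sketch makes this concrete. The page, its markup and the `/api/items` endpoint below are made up for illustration: a tiny local server returns a page whose heading would be inserted by JavaScript, and `URI.open` fetches it. The call returns the complete body, script included, yet the heading is missing, because open-uri never executes scripts.

```ruby
require 'socket'
require 'open-uri'

# A made-up page whose heading is added by JavaScript after load,
# the way an AJAX-driven site fills in its content.
PAGE = <<~HTML
  <html><body>
    <nav>Menu</nav>
    <div id="content"><img src="spinner.gif" alt="loading"></div>
    <script>
      fetch('/api/items').then(function (r) { return r.text(); }).then(function (t) {
        var h = document.createElement('h2');
        h.textContent = t;
        document.getElementById('content').replaceChildren(h);
      });
    </script>
  </body></html>
HTML

# Serve PAGE once on a free local port from a background thread.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
thread = Thread.new do
  client = server.accept
  # Skip the request line and headers (they end with a blank line).
  loop do
    line = client.gets
    break if line.nil? || line == "\r\n"
  end
  header = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n" \
           "Content-Length: #{PAGE.bytesize}\r\nConnection: close\r\n\r\n"
  client.write(header + PAGE)
  client.close
end

# URI.open returns only after the whole body has been downloaded.
html = URI.open("http://127.0.0.1:#{port}/").read
thread.join
server.close

puts html.include?('<script>')  # prints true: the script is in the HTML...
puts html.include?('<h2')       # prints false: ...but nothing ever ran it
```

In other words, there is nothing left for open-uri to wait for: the missing content only exists after a browser runs the page's JavaScript.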

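More likely, as suggested in the comments, the page itself loads its content with JavaScript (AJAX). open-uri only downloads the HTML and never runs scripts, so anything those scripts add later is simply not there. In that case you can use Watir to load the page in a real browser, which does run the JavaScript, and then hand the resulting HTML to Nokogiri: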

    require 'nokogiri'
    require 'watir'

    browser = Watir::Browser.new
    browser.goto 'https://www.the-page-i-wanna-crawl.com'
    browser.h2.wait_until(&:present?)  # wait for the AJAX-inserted content
    doc = Nokogiri::HTML.parse(browser.html)
    browser.close

This may open a browser window.
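Even in a browser, `goto` typically returns once the page's own load has finished, and content fetched by AJAX after that point can still be missing from `browser.html`. Watir has built-in waits for this; the underlying idea is simply to poll until a condition holds or a timeout expires. Below is a minimal plain-Ruby sketch of such a loop. `poll_until` is a hypothetical helper written for illustration, not a Watir or Nokogiri API.

```ruby
# Hypothetical helper: re-evaluate the block until it returns a truthy
# value, or raise once the timeout (in seconds) has passed.
def poll_until(timeout: 10, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout}s"
    end
    sleep interval
  end
end

# Possible use with the browser from the answer (not run here):
#   h2 = poll_until { Nokogiri::HTML.parse(browser.html).at_css("h2") }
#   puts h2.text

# Self-contained demo: the condition only becomes true on the 3rd check,
# standing in for an AJAX request that finishes a little later.
checks = 0
value = poll_until(timeout: 2, interval: 0.01) do
  checks += 1
  checks >= 3 ? "loaded" : nil
end
puts value  # prints loaded
```

Re-parsing `browser.html` on every check is wasteful for large pages, but it keeps the condition in plain Nokogiri terms; Watir's own element waits avoid that cost.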

