You have two options:
The first is to switch from plain HTTP scraping to a tool that supports JavaScript evaluation, for example Capybara (with an appropriate driver selected). This can be slow, since you are running a headless browser under the hood, and you have to set timeouts or find some other way to make sure the blocks of text you are interested in have loaded before you start scraping.
The second option is to use the Web Developer console to find out how those blocks of text are loaded (which AJAX calls are made, with which parameters, and so on) and implement the same calls in your scraper. This is a more advanced approach, but more efficient, since you avoid the extra work described in option 1.
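To sketch what option 2 looks like in practice: once the developer console has revealed the AJAX endpoint and its query parameters, you can rebuild the request URL yourself and page through the results. The endpoint and parameter names below are placeholders for illustration, not the real site's API.

```ruby
require 'uri'

# Hypothetical helper: rebuild the paginated AJAX URL that the page's
# own JavaScript would request. The 'page' parameter name is assumed
# from what you would observe in the developer console.
def ajax_url(base, page)
  uri = URI.parse(base)
  params = URI.decode_www_form(uri.query || '')
  params << ['page', page.to_s]           # append the pagination parameter
  uri.query = URI.encode_www_form(params)
  uri.to_s
end
```

Calling `ajax_url('http://example.com/ajxsearch.php?act=pagination', 2)` yields the same URL the page itself would fetch for page 2.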
Have a nice day!
UPDATE:
Your code above does not work because the response is HTML markup wrapped in a JSON object, while you are trying to parse it as raw HTML. It looks like this:
{ "error": 0, "msg": "request successful", "paidDocIds": "some ids here", "itemStartIndex": 20, "lastPageNum": 50, "markup": "LOTS AND LOTS AND LOTS OF MARKUP" }
You need to parse the JSON first, and only then parse the markup field as HTML:
require 'json'
json = JSON.parse(open(url).read)
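As a quick, self-contained illustration (with a made-up response body standing in for the real HTTP response), unwrapping the JSON first gives you the markup string that Nokogiri can then parse:

```ruby
require 'json'

# A made-up response body with the same shape as the real one; in the
# actual script this string would come from the HTTP request.
body = '{"error": 0, "msg": "request successful", "markup": "<ul><li>First</li><li>Second</li></ul>"}'

json   = JSON.parse(body)
markup = json['markup']   # the HTML fragment lives under the "markup" key
# Nokogiri::HTML(markup) would then give you a document you can query.
```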
I would also advise against using open-uri, as your code may become vulnerable if you use dynamic URLs, because of the way open-uri works (read the related article for details). Use better, more capable libraries such as HTTParty or RestClient instead.
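To make the risk concrete: open-uri's open is Kernel#open, which treats a leading | as a command to execute, so an attacker-controlled URL can run shell commands. A minimal sketch of a safer fetch (the helper name is my own, not from any library) validates the URL and uses Net::HTTP directly:

```ruby
require 'uri'
require 'net/http'

# Hypothetical helper: only fetch genuine HTTP(S) URLs, so a malicious
# value like "| rm -rf /" can never reach Kernel#open.
def fetch_page(url)
  uri = (URI.parse(url) rescue nil)
  raise ArgumentError, "not an HTTP(S) URL: #{url.inspect}" unless uri.is_a?(URI::HTTP)
  Net::HTTP.get(uri)   # a plain HTTP GET; no command execution possible
end
```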
UPDATE 2: A minimal script that works for me:
require 'json'
require 'open-uri'
require 'nokogiri'

url = 'http://www.justdial.com/functions/ajxsearch.php?national_search=0&act=pagination&city=Delhi+%2F+NCR&search=Pandits&where=Delhi+Cantt&catid=1195&psearch=&prid=&page=2'
json = JSON.parse(open(url).read)
doc = Nokogiri::HTML(json['markup'])