Why am I getting a bad result when using Nokogiri "search"?

I want to scrape data from specific divs in a CarFax report. However, whenever I search for divs, I get strange, garbled output.

I tried search("#divId"), search(".divClass"), and even grabbing all the divs with search('div'). Every time I get similar results: the contents of the div are partially truncated, and the tags are all messed up.

Here is the HTML of the page my agent loads: https://gist.github.com/atkolkma/8024287

Here is the code (username and password removed):

require "rubygems"
require "mechanize"

scraper = Mechanize.new
scraper.user_agent_alias = 'Mac Safari'
scraper.follow_meta_refresh = true
scraper.redirect_ok = true

scraper.get("http://www.carfaxonline.com")
form = scraper.page.forms.first
form.j_username = "******"
form.j_password = "*****"
scraper.submit(form)

scraper.get("http://www.carfaxonline.com/api/report?vin=1G1AT58H697144202&track=true")

puts scraper.page.search("#headerBodyType")

This is what it prints when I run it:

</div>4 DRderBodyType">

I expect:

<div id="headerBodyType"> SEDAN 4 DR </div>

When I run the same search against a copy of the HTML hosted at chevy-pics dot com, it works:

scraper2 = Mechanize.new

scraper2.get("http://www.chevy-pics.com/test.html")

puts scraper2.page.search("#headerBodyType")

This prints the expected result:

<div id="headerBodyType"> SEDAN 4 DR </div>

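It looks like the page uses Mac OS 9-style line endings, i.e. a bare carriage return (\r) with no line feed. When you puts the result, each \r sends the terminal cursor back to the start of the line, so the text that follows overwrites what was already printed. That is why the output looks truncated and scrambled.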

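You can confirm this by using p instead of puts, which prints the inspected string. You will see something like "<div id=\"headerBodyType\">\r SEDAN 4 DR\r </div>". Those \r characters are the culprits.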
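To illustrate why the output in the question looks the way it does, here is a small Ruby sketch that emulates how a terminal handles a bare \r on a single line. The helper render_like_terminal is hypothetical, and the input string is an assumption modeled on the p output quoted above; the real page may contain different whitespace, so the rendered result is similar to, not identical with, the one in the question.

```ruby
# Hypothetical helper: roughly emulate a terminal rendering one line that
# contains bare carriage returns. "\r" moves the cursor back to column 0,
# and later characters overwrite whatever was already on screen.
def render_like_terminal(line)
  screen = []
  col = 0
  line.each_char do |ch|
    if ch == "\r"
      col = 0            # carriage return: back to the start of the line
    else
      screen[col] = ch   # overwrite (or append) at the cursor position
      col += 1
    end
  end
  screen.join
end

# Assumed stand-in for scraper.page.search("#headerBodyType").to_s
html = "<div id=\"headerBodyType\">\r SEDAN 4 DR\r </div>"

p html                            # inspect view: the \r escapes are visible
puts render_like_terminal(html)   # prints  </div>4 DRaderBodyType">
```

Each segment after a \r is written over the start of the previous one, which produces the same kind of "</div>4 DR...BodyType\">" mash-up reported in the question.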

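So the data itself is fine; only the terminal display is mangled. If you want to print it readably, you can gsub the \r characters with \n. If you only need to extract the data, there is probably nothing to fix: Nokogiri has parsed the document correctly, and the problem is only in how the output is displayed.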

For example, to print it readably:

puts scraper.page.search("#headerBodyType").to_s.gsub("\r", "\n")
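A slightly more general variant, sketched below under the assumption that the page might mix classic Mac (\r) and Windows (\r\n) line endings: the regex /\r\n?/ turns both into a single \n, whereas gsub("\r", "\n") would turn \r\n into a doubled blank line. The helper normalize_newlines and the sample strings are hypothetical stand-ins for the scraped output. If you only want the value, calling strip on the element's text also removes the stray \r (with Mechanize that would be something like scraper.page.at("#headerBodyType").text.strip, not tested against the live site).

```ruby
# Hypothetical helper: normalize "\r\n" and bare "\r" line endings to "\n"
# so printed markup is no longer overwritten in the terminal.
def normalize_newlines(s)
  s.gsub(/\r\n?/, "\n")
end

# Assumed stand-in for scraper.page.search("#headerBodyType").to_s
raw = "<div id=\"headerBodyType\">\r SEDAN 4 DR\r </div>"
puts normalize_newlines(raw)   # the div now prints across three lines

# Assumed stand-in for the element's inner text; String#strip removes the
# surrounding "\r" and spaces, leaving just the value.
inner_text = "\r SEDAN 4 DR\r "
puts inner_text.strip          # prints SEDAN 4 DR
```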