Web Scraper with Nokogiri :: HTML and Ruby - How to get output into an array?

Question

I just started using Nokogiri to scrape information from a site and cannot figure out how to do the following. I have this HTML that I want to scrape:

  <div class="compatible_vehicles">
    <div class="heading">
      <h3>Compatible Vehicles</h3>
    </div><!-- .heading -->
    <ul>
      <li>
        <p class="label">Type1</p>
        <p class="data">All</p>
      </li>
      <li>
        <p class="label">Type2</p>
        <p class="data">All</p>
      </li>
      <li>
        <p class="label">Type3</p>
        <p class="data">All</p>
      </li>
      <li>
        <p class="label">Type4</p>
        <p class="data">All</p>
      </li>
      <li>
        <p class="label">Type5</p>
        <p class="data">All</p>
      </li>
    </ul>
  </div><!-- .compatible_vehicles -->

And I managed to get the output that I want on my screen with this:

  i = 0
  doc.css('div > .compatible_vehicles > ul > li').each do |item|
    label = item.at_css(".label").text
    data = item.at_css(".data").text
    print "#{label} - #{data}" + ','
  end
  i += 1

This gives me the following list: Type1 - All, Type2 - All, Type3 - All, Type4 - All, Type5 - All, on the screen.

Now I want to get these values into an array so I can save them to a CSV file. I tried a few things, but most attempts fail with the error "Can't convert String to Array." I hope someone can help me with this!

+4

ruby nokogiri scrape

user2215918 Mar 27 '13 at 14:16

1 answer

the tin man · Accepted Answer · 2013-03-27T14:38:07+0000

Starting with HTML:

 html = '
 <div class="compatible_vehicles">
   <div class="heading">
     <h3>Compatible Vehicles</h3>
   </div><!-- .heading -->
   <ul>
     <li>
       <p class="label">Type1</p>
       <p class="data">All</p>
     </li>
     <li>
       <p class="label">Type2</p>
       <p class="data">All</p>
     </li>
     <li>
       <p class="label">Type3</p>
       <p class="data">All</p>
     </li>
     <li>
       <p class="label">Type4</p>
       <p class="data">All</p>
     </li>
     <li>
       <p class="label">Type5</p>
       <p class="data">All</p>
     </li>
   </ul>
 </div><!-- .compatible_vehicles -->
 '

Parse it with Nokogiri and iterate over the <li> tags to get the text of the <p> tags inside each one:

 require 'nokogiri'

 doc = Nokogiri::HTML(html)
 data = doc.search('.compatible_vehicles li').map{ |li|
   li.search('p').map { |p| p.text }
 }

This returns an array of arrays:

 => [["Type1", "All"], ["Type2", "All"], ["Type3", "All"], ["Type4", "All"], ["Type5", "All"]]

From there you can plug this into the examples for the CSV class, and it should work without problems.
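As a minimal sketch of that step, assuming the array of arrays shown above, each inner array can be appended to a CSV as one row using Ruby's standard csv library. The header names and the file name in the comment are illustrative assumptions, not part of the original answer:

```ruby
require 'csv'

# Same structure as the array of arrays returned by the Nokogiri code above.
data = [["Type1", "All"], ["Type2", "All"], ["Type3", "All"],
        ["Type4", "All"], ["Type5", "All"]]

# CSV.generate builds the CSV text in memory; each inner array becomes one row.
csv_text = CSV.generate do |csv|
  csv << ["label", "data"] # header row (column names are an assumption)
  data.each { |row| csv << row }
end

puts csv_text

# To write to disk instead, CSV.open takes the same kind of block
# ("vehicles.csv" is a hypothetical file name):
# CSV.open("vehicles.csv", "w") { |csv| data.each { |row| csv << row } }
```

Because each row is already an array, nothing needs to be converted to or from a String, which is the kind of conversion the "Can't convert String to Array" error points at.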

Now compare your code for printing the fields to the screen with this:

 data.map{ |a| a.join(' - ') }.join(', ')
 => "Type1 - All, Type2 - All, Type3 - All, Type4 - All, Type5 - All"

All I have to do is puts the result, and it will print correctly.

It is really important to think about returning useful data structures. In Ruby, hashes and arrays are very useful because we can iterate over them and massage them into many forms. For example, it is trivial to create a hash from the array of arrays:

 Hash[data]
 => {"Type1"=>"All", "Type2"=>"All", "Type3"=>"All", "Type4"=>"All", "Type5"=>"All"}

That makes it very easy to do lookups.
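A short sketch of such lookups, assuming the same array of arrays as above. The variable name lookup and the "Type9" key are made up for illustration:

```ruby
# Same structure as the array of arrays returned by the Nokogiri code above.
data = [["Type1", "All"], ["Type2", "All"], ["Type3", "All"],
        ["Type4", "All"], ["Type5", "All"]]

# Each [label, value] pair becomes one key/value entry.
lookup = Hash[data]

type3   = lookup["Type3"]                # value stored under a known label
missing = lookup["Type9"]                # nil, because the label is absent
default = lookup.fetch("Type9", "n/a")   # fetch lets you supply a fallback
known   = lookup.key?("Type2")           # membership test without reading the value

puts type3
puts default
```

Note that a hash keeps only one value per key, so if the scraped list could repeat a label, the array of arrays is the safer structure to keep around.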
