Word Count with Ruby

I am trying to figure out a way to count the words in a specific line containing HTML.

Example line:

<p>Hello World</p> 

Is there a way in Ruby to count words between tags? Or any tag, for that matter?

Examples:

    <p>Hello World</p>
    <h2>Hello World</h2>
    <li>Hello World</li>

Thanks in advance!

Edit (here is my working code)

Controller:

    class DashboardController < ApplicationController
      def index
        @pages = Page.find(:all)
        @word_count = []
      end
    end

View:

    <% @pages.each do |page| %>
      <% page.current_state.elements.each do |el| %>
        <% @count = Hpricot(el.description).inner_text.split.uniq.size %>
        <% @word_count << @count %>
      <% end %>
      <li><strong>Page Name: <%= page.slug %> (Word Count: <%= @word_count.inject(0){|sum,n| sum+n } %>)</strong></li>
    <% end %>
+4
4 answers

Here's how you can do it:

    require 'hpricot'
    content = "<p>Hello World...."
    doc = Hpricot(content)
    doc.inner_text.split.uniq

You'll get:

    [
        [0] "Hello",
        [1] "World"
    ]

(Side note: the output is formatted with awesome_print, which I recommend.)
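One caveat worth noting: `uniq` removes repeated words, so the size of that array is the number of distinct words, not the total. A minimal sketch of the difference on plain text, assuming the string has already been extracted with `inner_text` (the helper name `word_counts` and the sample string are made up for illustration):

```ruby
# Returns [total, distinct] word counts for a plain-text string.
# word_counts is a hypothetical helper name, not part of Hpricot.
def word_counts(text)
  words = text.split              # split on runs of whitespace
  [words.size, words.uniq.size]   # all words vs. words with repeats dropped
end

total, distinct = word_counts("Hello World Hello")
puts "total: #{total}, distinct: #{distinct}"   # prints "total: 3, distinct: 2"
```

If a plain word count is what you are after, drop the `uniq` call and use `split.size`.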

+6

Sure,

  • Use Nokogiri to parse the HTML/XML and XPath to find the element and its text value.
  • Split that text on whitespace and count the resulting words.
+2

Use something like Hpricot to strip the HTML; after that, it is just a matter of counting words in plain text.

Here is an example of HTML stripping: http://underpantsgnome.com/2007/01/20/hpricot-scrub/

0

First, parse the HTML with something like Hpricot, then use a simple regular expression to do what you want (for example, split the text on spaces and count the pieces).
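To illustrate the regular-expression half of that advice, here is a rough sketch that uses only the Ruby standard library. Note that `strip_tags` below is a crude regex stand-in for a real parser such as Hpricot, used only to keep the example dependency-free; both helper names are hypothetical.

```ruby
# CRUDE stand-in for an HTML parser: replaces anything that looks like a tag
# with a space. An assumption for this sketch, not robust HTML handling.
def strip_tags(html)
  html.gsub(/<[^>]*>/, " ")
end

# Counts words with a regular expression: each run of letters, digits,
# underscores or apostrophes is treated as one word.
def regex_word_count(text)
  text.scan(/[\w']+/).size
end

line = "<li>Hello World</li>"
puts regex_word_count(strip_tags(line))   # prints 2
```

Compared with a plain `split`, scanning with a pattern lets you decide what counts as a word, for example ignoring stray punctuation that is surrounded by spaces.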

0
