I would like to analyze an HTML page and extract the meaningful text from it. Does anyone know of a good algorithm for this?
I develop my applications in Rails, but I suspect Ruby is somewhat slow for this kind of processing, so a good C library would be appropriate if one exists.
Thanks!!
P.S.: Please do not recommend anything in Java.
UPDATE: I found this link text
Sorry, it's in Python.
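A common family of heuristics treats the page as a sequence of text blocks and keeps the blocks that contain a lot of text but few links, because navigation menus, footers and ad boxes tend to be short and link-heavy. The following is a minimal sketch of that idea using only Python's standard-library `html.parser`. Python is used only because the linked code is in Python; the logic ports directly to C or Ruby. The tag sets, the helper names `BlockCollector` and `extract_main_text`, and the thresholds `min_chars=40` and `max_link_density=0.3` are hypothetical choices made for this sketch, not values taken from any particular library.

```python
from html.parser import HTMLParser

# Hypothetical tag sets chosen for this sketch; tune them for real pages.
SKIP_TAGS = {"script", "style", "noscript", "title"}
BLOCK_TAGS = {"body", "div", "p", "li", "td", "th", "section", "article",
              "blockquote", "pre", "h1", "h2", "h3", "h4", "h5", "h6"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts how much of each block is link text."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (normalized_text, link_chars)
        self._parts = []        # raw text pieces of the current block
        self._link_chars = 0    # characters of the current block inside <a>
        self._skip_depth = 0    # > 0 while inside script/style/noscript/title
        self._link_depth = 0    # > 0 while inside <a>

    def _flush(self):
        # Collapse whitespace and store the finished block if it has any text.
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a":
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a":
            self._link_depth = max(0, self._link_depth - 1)

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore the contents of script/style/etc.
        self._parts.append(data)
        if self._link_depth:
            self._link_chars += len(" ".join(data.split()))

    def close(self):
        super().close()
        self._flush()  # emit the last open block


def extract_main_text(html, min_chars=40, max_link_density=0.3):
    """Keeps blocks that are long enough and not dominated by link text.

    The default thresholds are illustrative assumptions, not tuned values.
    """
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_chars in parser.blocks:
        if len(text) >= min_chars and link_chars / len(text) <= max_link_density:
            kept.append(text)
    return "\n\n".join(kept)
```

On a page with a navigation bar made of links, one long paragraph, a script and a short label, `extract_main_text` keeps only the paragraph. Production extractors usually combine more signals than these two (for example tag names, class/id hints and position in the page), but text length and link density alone already remove much of the boilerplate.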
Nisanio