I have a news aggregator, Newzupp, that I want to change. Right now I only display the news headlines, each linked to its article URL.
I plan to make it more visual by showing an image alongside each headline instead of the headline alone. I want to know how to get the main image of each article (somewhat similar to Google News).
One approach I can think of is to extract all the images from the article page and display one of them with a link to that article. But I do not think that will be effective. Is there any other way to do this?
I found one solution for it:
- Fetch the content at the URL (HTML or XML)
- Parse and clean up the content using Hpricot
- Find all elements with the "img" tag
- Do some research to determine which one is the main image displayed (for example, the 6th image in the case of the Wired.com RSS feed)
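The steps above can be sketched roughly as follows. This is only an illustration, not the code I actually run: it uses Python's standard-library `html.parser` in place of Hpricot, takes the already-fetched HTML as a string, and replaces the per-site "research" step with an assumed heuristic (pick the image with the largest declared `width` x `height`). The names `ImgCollector`, `declared_area` and `guess_main_image` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collects the attributes of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            self.images.append(dict(attrs))


def declared_area(img):
    """Return width * height from the tag's attributes, or 0 if unusable."""
    try:
        return int(img.get("width", 0)) * int(img.get("height", 0))
    except (TypeError, ValueError):
        # Missing values, percentages ("100%") and similar count as unknown.
        return 0


def guess_main_image(html, base_url):
    """Assumed heuristic: the largest declared image is the main one.

    Returns an absolute URL, or None if the page has no usable <img>.
    If no image declares a size, the first image in the page is returned.
    """
    collector = ImgCollector()
    collector.feed(html)
    candidates = [img for img in collector.images if img.get("src")]
    if not candidates:
        return None
    best = max(candidates, key=declared_area)
    return urljoin(base_url, best["src"])
```

For example, `guess_main_image(page_html, article_url)` would be called once per article after fetching it. The heuristic is easy to fool (many pages omit the size attributes, and ads can be large), which is part of why this approach feels fragile to me.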
I still think this is very inefficient. I would like to know how services like Google News scrape sites and blogs and display the relevant images.