I would like to reproduce the functionality that Facebook uses to parse links. When you submit a link in your Facebook status, their system fetches the page and extracts a suggested title, a summary, and often one or more relevant images, from which you can choose a thumbnail.
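One concrete data point: Facebook published the Open Graph protocol so that pages can declare their own preview data in `<meta property="og:...">` tags (`og:title`, `og:description`, `og:image`), and its link previews use those tags when a page provides them. Reading them first is cheap and usually gives the best result. Below is a minimal sketch using only the standard library. It assumes the page is UTF-8 when the server declares no charset; the `User-Agent` string and the function names `fetch_html` and `open_graph` are hypothetical placeholders, not an existing API.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.properties = {}  # first value seen for each og:* property
        self.images = []      # og:image may legitimately appear several times

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        content = attrs.get("content")
        if not prop.startswith("og:") or not content:
            return
        if prop == "og:image":
            self.images.append(content)
        else:
            # Strip the "og:" prefix and keep only the first occurrence.
            self.properties.setdefault(prop[3:], content)


def fetch_html(url, timeout=10):
    """Download a page as text (assumption: UTF-8 if no charset is declared)."""
    req = Request(url, headers={"User-Agent": "link-preview-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def open_graph(html):
    """Return (properties, images) parsed from a page's Open Graph tags."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties, parser.images
```

Usage would be along the lines of `props, images = open_graph(fetch_html(url))`; when `props` has no `title` or `description`, or `images` is empty, you still need a fallback heuristic for pages that do not publish Open Graph data.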
My application needs to do this in Python, but I am open to any guidance, blog posts, or experiences from other developers that relate to this and could help me figure out how to do it.
I would love to learn from other people's experience before just jumping in.
To be clear, given the URL of a webpage, I want to be able to get:
- Title: probably just the `<title>`, but possibly the `<h1>`; not sure.
- An overview of the page: a single paragraph.
- Several relevant images that can be used as thumbnails. (The hard part is filtering out irrelevant images, such as banners or rounded-corner graphics.)
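For pages without Open Graph tags, the three items above can be approximated with simple heuristics in a single parsing pass. This is a sketch under stated assumptions, not Facebook's actual algorithm (which is not public): the title is the `<title>` text with the first `<h1>` as fallback; the summary is `<meta name="description">`, else the first `<p>` of at least `min_paragraph` characters; images are kept only if their declared `width`/`height` are at least `min_side` pixels and not too elongated (`max_aspect`), which drops most spacers, rounded corners and banners. The names `PreviewParser` and `extract_preview` and all three thresholds are hypothetical, chosen arbitrarily. Images with no declared size cannot be judged without downloading and measuring them, so this sketch simply appends them after the sized candidates.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def _to_int(value):
    """Parse a width/height attribute such as "120" or "120px"; None if unknown."""
    if not value:
        return None
    value = value.strip().lower()
    if value.endswith("px"):
        value = value[:-2]
    return int(value) if value.isdigit() else None


class PreviewParser(HTMLParser):
    """Collects title, <h1>, meta description, paragraphs and <img> tags."""

    def __init__(self):
        super().__init__()
        self.title, self.h1, self.description = "", "", ""
        self.paragraphs = []
        self.images = []      # (src, declared width, declared height)
        self._current = None  # text-bearing element we are inside, if any
        self._buf = []

    def _flush(self):
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if self._current == "title" and not self.title:
            self.title = text
        elif self._current == "h1" and not self.h1:
            self.h1 = text
        elif self._current == "p" and text:
            self.paragraphs.append(text)
        self._current, self._buf = None, []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "img" and attrs.get("src"):
            self.images.append((attrs["src"], _to_int(attrs.get("width")),
                                _to_int(attrs.get("height"))))
        elif tag in ("title", "h1", "p"):
            if self._current:  # e.g. a <p> whose closing tag was omitted
                self._flush()
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._flush()

    def handle_data(self, data):
        if self._current:
            self._buf.append(data)

    def close(self):
        super().close()
        if self._current:  # element left open at end of document
            self._flush()


def extract_preview(html, base_url, min_paragraph=80, min_side=100, max_aspect=3.0):
    parser = PreviewParser()
    parser.feed(html)
    parser.close()

    long_paragraphs = [p for p in parser.paragraphs if len(p) >= min_paragraph]
    summary = parser.description or (long_paragraphs or parser.paragraphs or [""])[0]

    sized, unknown = [], []
    for src, w, h in parser.images:
        url = urljoin(base_url, src)  # resolve relative src against the page URL
        if w is None or h is None:
            unknown.append(url)
        elif min(w, h) >= max(min_side, 1) and max(w, h) / min(w, h) <= max_aspect:
            sized.append(url)
    images = list(dict.fromkeys(sized + unknown))  # dedupe, keep order

    return {"title": parser.title or parser.h1, "summary": summary, "images": images}
```

Declared sizes are only a proxy; a more robust version would fetch each remaining candidate and measure its real dimensions before ranking.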
I may end up implementing it myself, but I would at least like to learn how other people approach such tasks.
python semantics facebook screen-scraping summary
Ram rachum