Facebook's link recognition works for most links, not just popular sites like YouTube. I therefore assume that they check whether the page contains a link to an alternative view, such as a feed. When they find such a link, they request it to get its contents. Feed formats are generally standardized as RSS or Atom and have clearly identifiable properties such as title, thumbnail, description, etc.
So, let's say you have a link to a YouTube video, for example http://www.youtube.com/watch?v=0Mz4NTozNXw. Its source contains the following links to alternative views that can provide the required metadata:
<link rel="alternate" type="application/json+oembed" href="http://www.youtube.com/oembed?url=http%3A//www.youtube.com/watch?v%3D0Mz4NTozNXw&format=json" title="Crispy Onion Rings Recipe - How to Make Crispy Onion Rings" />
<link rel="alternate" type="text/xml+oembed" href="http://www.youtube.com/oembed?url=http%3A//www.youtube.com/watch?v%3D0Mz4NTozNXw&format=xml" title="Crispy Onion Rings Recipe - How to Make Crispy Onion Rings" />
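As a rough illustration of the discovery step, here is a minimal Python sketch that collects such links from a page's HTML using the standard library's html.parser module. The names AlternateLinkFinder and find_alternate_links are made up for this example, and this is only a guess at the general technique, not how Facebook actually does it.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collects <link rel="alternate"> tags as (type, href) pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values and attributes.get("href"):
            self.links.append((attributes.get("type"), attributes["href"]))


def find_alternate_links(html_text):
    """Return every alternate link found in the given HTML text."""
    finder = AlternateLinkFinder()
    finder.feed(html_text)
    return finder.links
```

Calling find_alternate_links on the page source would return a list of (type, href) pairs, one for each alternate link like the two shown above.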
If we request the link with type="text/xml+oembed", we get the following XML back:
<oembed>
  <provider_url>http://www.youtube.com/</provider_url>
  <title>Crispy Onion Rings Recipe - How to Make Crispy Onion Rings</title>
  <html>
    <object width="480" height="295">
      <param name="movie" value="http://www.youtube.com/v/0Mz4NTozNXw&fs=1"></param>
      <param name="allowFullScreen" value="true"></param>
      <param name="allowscriptaccess" value="always"></param>
      <embed src="http://www.youtube.com/v/0Mz4NTozNXw&fs=1" type="application/x-shockwave-flash" width="480" height="295" allowscriptaccess="always" allowfullscreen="true"></embed>
    </object>
  </html>
  <author_name>foodwishes</author_name>
  <height>295</height>
  <thumbnail_width>480</thumbnail_width>
  <width>480</width>
  <version>1.0</version>
  <author_url>http://www.youtube.com/user/foodwishes</author_url>
  <provider_name>YouTube</provider_name>
  <thumbnail_url>http://i1.ytimg.com/vi/0Mz4NTozNXw/hqdefault.jpg</thumbnail_url>
  <type>video</type>
  <thumbnail_height>360</thumbnail_height>
</oembed>
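To show how the preview fields could be pulled out of such a response, here is a small Python sketch using xml.etree.ElementTree. The function name parse_oembed_xml is hypothetical, and the sketch assumes the response is well-formed XML, meaning the markup inside the html element arrives escaped rather than as shown in the listing above.

```python
import xml.etree.ElementTree as ET


def parse_oembed_xml(xml_text):
    """Return selected preview fields from an oEmbed XML response as a dict."""
    root = ET.fromstring(xml_text)
    fields = ("title", "thumbnail_url", "author_name", "provider_name", "type")
    # findtext returns None when an element is missing from the response.
    return {name: root.findtext(name) for name in fields}
```

The XML text itself could be fetched with something like urllib.request.urlopen(href).read(), where href is the value taken from the matching link tag.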
From this response, you can get the title and thumbnail information, which can then be shown to the end user. This is a general approach that works for most links on the web. Maintain a directory of supported link types, for example:
application/atom+xml
application/rss+xml
application/json+oembed
text/xml+oembed
...
Then check whether the links on the page match the types you support. If so, follow the link and get the necessary information. Knowing the type attribute tells you in advance which format you will have to parse.
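The directory idea can be sketched in Python as a dictionary that maps each supported type to a parser. All names here (SUPPORTED_TYPES, choose_link, preview_from_oembed_json, preview_from_oembed_xml) are invented for illustration, and only the two oEmbed types are covered; Atom and RSS parsers would be added to the dictionary in the same way.

```python
import json
import xml.etree.ElementTree as ET


def preview_from_oembed_json(body):
    """Extract title and thumbnail from an oEmbed JSON response."""
    data = json.loads(body)
    return {"title": data.get("title"), "thumbnail": data.get("thumbnail_url")}


def preview_from_oembed_xml(body):
    """Extract title and thumbnail from an oEmbed XML response."""
    root = ET.fromstring(body)
    return {
        "title": root.findtext("title"),
        "thumbnail": root.findtext("thumbnail_url"),
    }


# Directory of supported link types, each mapped to the parser for its format.
SUPPORTED_TYPES = {
    "application/json+oembed": preview_from_oembed_json,
    "text/xml+oembed": preview_from_oembed_xml,
}


def choose_link(links):
    """Return (parser, href) for the first link whose type is supported, else None.

    `links` is a list of (type, href) pairs taken from the page's link tags.
    """
    for link_type, href in links:
        parser = SUPPORTED_TYPES.get(link_type)
        if parser is not None:
            return parser, href
    return None
```

Because the type is known before the request is made, the caller can fetch href and hand the response body straight to the returned parser without sniffing the format.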