When you share a link on major websites such as Digg and Facebook, they create a thumbnail by capturing the main images from the page. How do they capture images from a web page? Does it involve loading the entire page (e.g. with cURL) and parsing it (e.g. with preg_match)? To me, that method seems slow and unreliable. Do they have a more practical approach?
P.S. I think there should be a practical way to crawl a page quickly, skipping some parts (such as CSS and JS), in order to get just the src attributes. Any ideas? A minimal sketch of the approach I mean is below.
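For reference, this is roughly what I'm describing: download the whole page with cURL, then pull out the img src attributes. The URL is only a placeholder, and I'm using preg_match_all here just to illustrate the regex-based parsing I mentioned.

```php
<?php
// Placeholder URL for illustration only.
$url = 'http://example.com/';

// Fetch the entire page with cURL.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
curl_close($ch);

if ($html === false) {
    die('Failed to fetch the page');
}

// Crude regex-based extraction of <img> src attributes,
// as mentioned in the question (preg_match-style parsing).
preg_match_all('/<img[^>]+src=["\']([^"\']+)["\']/i', $html, $matches);

print_r($matches[1]);
```

The slow part is downloading and scanning the full page just to reach a few src attributes, which is why I'm asking whether there is a faster or more reliable technique.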