An easy way to get started is to try jQuery:
$("#links").load("/Main_Page #jq-p-Getting-Started li");
More in jQuery Docs
Another way to do screen scraping in a much more structured way is to use YQL, the Yahoo Query Language. It returns the scraped data structured as JSON or XML.
e.g.
Let's scrape stackoverflow.com
select * from html where url="http://stackoverflow.com"
will give you a JSON array (I chose JSON as the output option), like this
"results": { "body": { "noscript": [ { "div": { "id": "noscript-padding" } }, { "div": { "id": "noscript-warning", "p": "Stack Overflow works best with JavaScript enabled" } } ], "div": [ { "id": "notify-container" }, { "div": [ { "id": "header", "div": [ { "id": "hlogo", "a": { "href": "/", "img": { "alt": "logo homepage", "height": "70", "src": "http://i.stackoverflow.com/Content/Img/stackoverflow-logo-250.png", "width": "250" } ……..
The beauty of this is that you can use projections and where clauses, which ultimately gets you the scraped data already structured, and only the data you need (much less bandwidth over the wire).
eg,
select * from html where url="http://stackoverflow.com" and xpath='//div/h3/a'
will get you
"results": { "a": [ { "href": "/questions/414690/iphone-simulator-port-for-windows-closed", "title": "Duplicate: Is any Windows simulator available to test iPhone application? as a hobbyist who cannot afford a mac, i set up a toolchain kit locally on cygwin to compile objecti … ", "content": "iphone\n simulator port for windows [closed]" }, { "href": "/questions/680867/how-to-redirect-the-web-page-in-flex-application", "title": "I have a button control ....i need another web page to be redirected while clicking that button .... how to do that ? Thanks ", "content": "How\n to redirect the web page in flex application ?" }, …..
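The queries here all follow one shape: a projection, the html table, a url, and an optional xpath filter. A minimal sketch of a helper that assembles such a statement; the name buildYqlQuery and its option names are made up for illustration and are not part of YQL or jQuery:

```javascript
// Hypothetical helper: assembles a YQL statement against the html table.
// Values are inserted as-is (no escaping), so only pass trusted strings.
function buildYqlQuery(options) {
  var fields = options.fields || "*"; // projection, e.g. "title"; defaults to everything
  var query = "select " + fields + ' from html where url="' + options.url + '"';
  if (options.xpath) {
    // Optional filter: keep only the nodes matching this XPath expression
    query += " and xpath='" + options.xpath + "'";
  }
  return query;
}

var wholePage = buildYqlQuery({ url: "http://stackoverflow.com" });
var linksOnly = buildYqlQuery({ url: "http://stackoverflow.com", xpath: "//div/h3/a" });
console.log(wholePage);
console.log(linksOnly);
```

Passing fields narrows the projection the same way, so less data has to come back.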
Now, to get only the questions, we do
select title from html where url="http://stackoverflow.com" and xpath='//div/h3/a'
Note the title in the projection
"results": { "a": [ { "title": "I don't want the function to be entered simultaneously by multiple threads, neither do I want it to be entered again when it has not returned yet. Is there any approach to achieve … " }, { "title": "I'm certain I'm doing something really obviously stupid, but I've been trying to figure it out for a few hours now and nothing is jumping out at me. I'm using a ModelForm so I can … " }, { "title": "when i am going through my project in IE only its showing errors A runtime error has occurred Do you wish to debug? Line 768 Error:Expected')' Is this is regarding any script er … " }, { "title": "I have a java batch file consisting of 4 execution steps written for analyzing any Java application. In one of the steps, I'm adding few libs in classpath that are needed for my co … " }, { ……
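Once the data has this shape, pulling the titles out is a short loop. A sketch, assuming the results object looks exactly like the fragment above; extractTitles is a hypothetical name and the sample titles are placeholders, not real output:

```javascript
// Assumes `results` has the shape shown above: { a: [ { title: "..." }, ... ] }
function extractTitles(results) {
  if (!results || !results.a) {
    return []; // nothing to read
  }
  // Defensive: wrap a lone object so the code below always sees an array
  var anchors = Array.isArray(results.a) ? results.a : [results.a];
  return anchors.map(function (anchor) {
    return anchor.title;
  });
}

// Placeholder data in the same shape as the response fragment
var sampleResults = { a: [ { title: "First question" }, { title: "Second question" } ] };
var sampleTitles = extractTitles(sampleResults);
console.log(sampleTitles);
```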
Once you have written your query, it generates a URL for you
http://query.yahooapis.com/v1/public/yql?q=select%20title%20from%20html%20where%20url%3D%22http%3A%2F%2Fstackoverflow.com%22%20and%0A%20%20%20%20%20%20xpath%3D'%2F%2Fdiv%2Fh3%2Fa'%0A%20%20%20%20&format=json&callback=cbfunc
in our case.
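That URL is just the endpoint plus the percent-encoded query and a couple of parameters, so it can also be built by hand. A rough sketch, assuming the public endpoint shown above is reachable; buildYqlUrl is a hypothetical helper name:

```javascript
// Endpoint copied from the generated URL above
var YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql";

// Hypothetical helper: percent-encodes the query, asks for JSON, and
// optionally names a callback function for the response to be wrapped in.
function buildYqlUrl(query, callbackName) {
  var url = YQL_ENDPOINT + "?q=" + encodeURIComponent(query) + "&format=json";
  if (callbackName) {
    url += "&callback=" + encodeURIComponent(callbackName);
  }
  return url;
}

var titleQuery = "select title from html where url=\"http://stackoverflow.com\" and xpath='//div/h3/a'";
var titleUrl = buildYqlUrl(titleQuery, "cbfunc");
console.log(titleUrl);
```

The generated URL above also contains %0A and runs of %20; those encode a line break and indentation inside the query text, which this single-line query does not have.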
So, in the end, you end up doing something like this
var titleList = $.getJSON(theAboveUrl);
and play with it.
Pretty, right?