Extract html tables from a website

Question

I am trying to use the XML and RCurl packages to read some HTML tables from the following URL: http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#

Here is the code I'm using:

    library(RCurl)
    library(XML)

    options(RCurlOptions = list(useragent = "R"))
    url <- "http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#"
    wp <- getURLContent(url)
    doc <- htmlParse(wp, asText = TRUE)
    docName(doc) <- url
    tmp <- readHTMLTable(doc)

    ## Required tables
    tmp[[13]]
    tmp[[14]]

If you look at the tables, the code was unable to parse the values from the web page. I assume this is because the values are filled in by JavaScript on the fly. If I use the "Save page as" option in Google Chrome (it does not work in Mozilla) to save the page and then run the code above on the saved file, I can read in the values.

But is there any workaround so that I can read the tables on the fly? It would be great if you could help.

Hello,

+4

r web-scraping rcurl

sayan dasgupta May 06 '11 at 16:56


1 answer

Tim snowhite · Accepted Answer · 2011-05-23T17:19:38+0000

It looks like they are building the page with JavaScript by requesting http://www.nse-india.com/marketinfo/equities/ajaxGetQuote.jsp?symbol=SBIN&series=EQ and parsing the string it returns. Perhaps you could fetch that data and parse it yourself, instead of scraping the page.

It looks like you will need to build the request with the appropriate Referer header using cURL. As you can see if you try it, you cannot just hit this ajaxGetQuote page with a bare request.

You can probably find the relevant headers to set by using the Web Inspector in Chrome or Safari, or Firebug in Firefox.
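
A minimal sketch of what such a request might look like with RCurl, assuming the endpoint only checks the Referer header and that the quote page from the question is an acceptable value for it. The helper names quote_url and fetch_quote are made up for illustration, and the format of the returned string is not known here, so the sketch stops at fetching the raw text.

```r
# Hypothetical helper: builds the ajaxGetQuote URL mentioned in the answer.
quote_url <- function(symbol, series = "EQ") {
  paste0("http://www.nse-india.com/marketinfo/equities/ajaxGetQuote.jsp",
         "?symbol=", symbol, "&series=", series)
}

# Hypothetical helper: fetches the raw response as a single string.
# Assumption: sending a Referer (plus a user agent) is enough for the server
# to answer; check the real headers in the browser's network inspector.
fetch_quote <- function(symbol, series = "EQ", referer) {
  RCurl::getURL(quote_url(symbol, series),
                referer = referer,   # libcurl option that sets the Referer header
                useragent = "R")     # same user agent as in the question
}

# Example call (needs network access and the RCurl package):
# page <- "http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ"
# raw  <- fetch_quote("SBIN", referer = page)
```

If the server expects further headers, they can be passed in the same call through RCurl's httpheader option, for example httpheader = c(Accept = "*/*"), once the browser's inspector shows which ones are sent.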
