Web Scraper (in R?)

I want to get the names of the companies in the middle column of this page (shown in bold blue), as well as the location of the person who registered each complaint (for example, “India, Delhi”, shown in green). Basically, I need a table (or data frame) with two columns: one for the company and one for the location. Any ideas?

2 answers

You can do this with the XML package in R. Here is the code:

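library(XML)

url <- "http://www.consumercomplaints.in/bysubcategory/mobile-service-providers/page/1.html"
doc <- htmlTreeParse(url, useInternalNodes = TRUE)

# Company names: text of links whose href contains "profile"; keep every second match
profiles <- xpathSApply(doc, "//a[contains(@href, 'profile')]", xmlValue)
profiles <- profiles[!(seq_along(profiles) %% 2)]

# Complainant locations: text of links whose href contains "bystate"
states <- xpathSApply(doc, "//a[contains(@href, 'bystate')]", xmlValue)

complaints <- data.frame(company = profiles, location = states)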
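Here is a self-contained sketch of the same steps, run on a small inline HTML sample instead of the live page. The markup is hypothetical: it assumes, as the every-second-link filter suggests, that each complaint carries two "profile" links with the company second, plus one "bystate" link holding the location. htmlParse(..., asText = TRUE) parses the string itself rather than a URL.

```r
library(XML)

# Hypothetical markup standing in for the real page: each complaint row
# has two "profile" links (the company is the second) and one "bystate" link.
sample_html <- paste0(
  "<html><body><table>",
  "<tr><td><a href='/profile/u1'>user1</a></td>",
  "<td><h4><a href='/profile/c1'>Company A</a></h4></td>",
  "<td><a href='/bystate/delhi'>India, Delhi</a></td></tr>",
  "<tr><td><a href='/profile/u2'>user2</a></td>",
  "<td><h4><a href='/profile/c2'>Company B</a></h4></td>",
  "<td><a href='/bystate/mumbai'>India, Mumbai</a></td></tr>",
  "</table></body></html>"
)

# Parse the string itself rather than fetching a URL
doc <- htmlParse(sample_html, asText = TRUE)

# All "profile" link texts in document order, then keep the 2nd, 4th, ...
profiles <- xpathSApply(doc, "//a[contains(@href, 'profile')]", xmlValue)
companies <- profiles[c(FALSE, TRUE)]

# Location text from the "bystate" links
locations <- xpathSApply(doc, "//a[contains(@href, 'bystate')]", xmlValue)

# Refuse to pair the vectors if a row is missing one of the two links
stopifnot(length(companies) == length(locations))

complaints <- data.frame(company = companies, location = locations,
                         stringsAsFactors = FALSE)
print(complaints)
```

The length check matters because data.frame() silently recycles a shorter vector when one length divides the other, which would pair companies with the wrong locations instead of failing.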

Another option is a regular expression, shown here in PHP:

// $str holds the page HTML, e.g. $str = file_get_contents($url);
preg_match_all('/>([^<]+)<\/a><\/h4><\/td>/', $str, $matches);
foreach ($matches[1] as $company) {
    echo $company, "\n";
}
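The same regular-expression idea can be sketched in base R without the XML package: gregexpr() collects every match (the counterpart of PHP's preg_match_all), and sub() keeps only the captured name. The HTML snippet is a hypothetical stand-in shaped like the markup the pattern expects; as with any regex over HTML, it stops working if the page's tag layout changes.

```r
# Hypothetical snippet shaped like the markup the pattern targets:
# the company name sits in <a> inside <h4> inside <td>.
html <- paste0(
  "<td><h4><a href='/profile/c1'>Company A</a></h4></td>",
  "<td><h4><a href='/profile/c2'>Company B</a></h4></td>"
)

# Same pattern as the PHP version; "/" needs no escaping in R
pattern <- ">([^<]+)</a></h4></td>"

# gregexpr finds every match, not just the first one
hits <- regmatches(html, gregexpr(pattern, html))[[1]]

# Replace each full match with its captured group to keep only the name
companies <- sub(pattern, "\\1", hits)
print(companies)
```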
