I have an HTML document in R and I want to extract a list of unique tags from this document with a count of their frequency of occurrence.
I could loop over a list of all possible tags as follows, but I was hoping for a solution that does not require a predefined list of tags:
library('XML')
url <- 'http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array'
doc <- htmlParse(url)
all_tags <- c('//p', '//a', '//b', '//u', '//i')
counts <- sapply(all_tags, function(x) length(xpathSApply(doc, x)))
free(doc)
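One way this might work without a predefined list, as a minimal sketch rather than a definitive answer: the XPath expression `//*` matches every element node, so applying `xmlName` to each match and tabulating the result should give the tag frequencies. The inline `html_text` string and the `tag_names` and `tag_counts` variables are made up for illustration so the example runs without a network connection.

```r
library(XML)

# Hypothetical inline document, used instead of a URL so the sketch is self-contained.
html_text <- "<html><body><p>one <a href='#'>link</a></p><p>two <b>bold</b></p></body></html>"
doc <- htmlParse(html_text, asText = TRUE)

# '//*' selects every element node; xmlName returns each node's tag name.
tag_names <- xpathSApply(doc, "//*", xmlName)

# table() counts how often each distinct tag name occurs.
tag_counts <- sort(table(tag_names), decreasing = TRUE)
print(tag_counts)

# Release the parsed document, as in the original code.
free(doc)
```

With a real page, replacing the `htmlParse(html_text, asText = TRUE)` call with `htmlParse(url)` should give the same kind of table for the whole document.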