JavaScript scraping in R with RSelenium

I am trying to scrape the Washington Post police shootings database. Since the content is rendered by JavaScript rather than served as static HTML, I cannot use rvest, so I used RSelenium and phantomjs instead.

    library(RSelenium)
    checkForServer()
    startServer()
    eCap <- list(phantomjs.binary.path = "C:/Program Files/Chrome Driver/phantomjs.exe")
    remDr <- remoteDriver(browserName = "phantomjs", extraCapabilities = eCap)
    remDr$open()
    remDr$navigate("http://www.washingtonpost.com/graphics/national/police-shootings/")

Inspecting the source shows that the elements I am interested in have an id and class like the following:

 <div id="js-list-690" class="listWrapper cf"> 

or in Chrome:

[Screenshot: source of the corresponding element in Chrome]

I can access the text of an individual element:

 remDr$findElement("css", "#js-list-691")$getElementText() 

returns

    [[1]]
    [1] "An unidentified person, a 47-year-old Hispanic man, was shocked with a stun gun and shot on July 30, 2015, in Whittier, Calif. Los Angeles County deputies were investigating a domestic disturbance when he threatened the officers and struck one of them with a metal rod.\nMALEDEADLY WEAPONHISPANIC45 TO 54\nCBS Los AngelesWhittier Daily News"

But if I want to get a list of all these elements:

 remDr$findElements("class name", "listWrapper cf") 

this results in an error.

How can I:

  • Get a list of all the elements that share the class listWrapper cf?
  • Return a list of the text associated with each element?
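
For the two points above, one possible approach that stays within RSelenium is sketched here. It assumes the error arises because the "class name" locator accepts only a single class, whereas listWrapper cf is two classes, so the sketch builds a CSS selector that requires both. The helper names class_to_css and get_all_text are hypothetical, and remDr is assumed to be the already opened session from the code above.

```r
# Hypothetical helper: turn a class attribute such as "listWrapper cf"
# into a CSS selector that requires every class, e.g. ".listWrapper.cf".
class_to_css <- function(class_attr) {
  classes <- strsplit(trimws(class_attr), "\\s+")[[1]]
  paste0(".", classes, collapse = "")
}

# Hypothetical helper: find all matching elements in an open RSelenium
# session and return their text as a character vector.
get_all_text <- function(remDr, class_attr) {
  elems <- remDr$findElements(using = "css selector",
                              value = class_to_css(class_attr))
  # getElementText() returns a list of length one per element
  vapply(elems, function(el) el$getElementText()[[1]], character(1))
}

# Usage, assuming remDr has already navigated to the page:
# texts <- get_all_text(remDr, "listWrapper cf")
```

If only one of the two classes is needed to identify the elements, passing just "listWrapper" to the "class name" locator may also be enough.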
1 answer

It would be easier to just use the JSON data directly (use the "Developer Tools" in almost any modern browser to watch the URLs the page requests; it won't take long to find this one in the list):

    library(jsonlite)

    url <- "https://js.washingtonpost.com/graphics/policeshootings/policeshootings.json?d14385542"
    shootings <- fromJSON(url)

    dplyr::glimpse(shootings)
    ## Observations: 564
    ## Variables:
    ## $ id                 (int) 3, 4, 5, 8, 9, 11, 13, 15, 16, 17, 19, 21, ...
    ## $ date               (chr) "2015-01-02", "2015-01-02", "2015-01-03", "...
    ## $ description        (chr) "Elliot, who was on medication for depressi...
    ## $ blurb              (chr) "a 53-year-old man of Asian heritage armed ...
    ## $ name               (chr) "Tim Elliot", "Lewis Lee Lembke", "John Pau...
    ## $ age                (int) 53, 47, 23, 32, 39, 18, 22, 35, 34, 47, 25,...
    ## $ gender             (chr) "M", "M", "M", "M", "M", "M", "M", "M", "F"...
    ## $ race               (chr) "A", "W", "H", "W", "H", "W", "H", "W", "W"...
    ## $ armed              (chr) "gun", "gun", "unarmed", "toy weapon", "nai...
    ## $ city               (chr) "Shelton", "Aloha", "Wichita", "San Francis...
    ## $ state              (chr) "WA", "OR", "KS", "CA", "CO", "OK", "AZ", "...
    ## $ address            (chr) "600 block of E. Island Lake Drive", "4519 ...
    ## $ lat                (dbl) 47.24683, 45.48620, 37.69477, 37.76291, 40....
    ## $ lon                (dbl) -123.12159, -122.89128, -97.28055, -122.422...
    ## $ is_geocoding_exact (lgl) TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, T...
    ## $ mental             (lgl) TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FAL...
    ## $ sources            (list) http://kbkw.com/local-news/329755, http://...
    ## $ photos             (list) NULL, NULL, 107, , , , //img.washingtonpos...
    ## $ videos             (list) NULL, NULL, NULL, NULL, NULL, NULL, NULL, ...
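
Once the data is in a data frame, ordinary R subsetting applies. The sketch below illustrates this offline with a two-record sample assembled only from values visible in the glimpse() output above. It assumes the real file is a JSON array of records (which the data frame result suggests); the names sample_json, shootings_sample and over_50 are hypothetical.

```r
library(jsonlite)

# Two records built from values shown in the glimpse() output above;
# only a handful of the columns are included in this illustration.
sample_json <- '[
  {"id": 3, "date": "2015-01-02", "name": "Tim Elliot", "age": 53, "state": "WA"},
  {"id": 4, "date": "2015-01-02", "name": "Lewis Lee Lembke", "age": 47, "state": "OR"}
]'

# An array of JSON objects is simplified into a data frame by default
shootings_sample <- fromJSON(sample_json)

# Dates arrive as character strings, so convert them explicitly
shootings_sample$date <- as.Date(shootings_sample$date)

# Example filter: records where the person was older than 50
over_50 <- shootings_sample[shootings_sample$age > 50, c("name", "age", "state")]
```

The same subsetting should work on the full shootings object, provided the column names match those shown above.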