How to parse XML data

I am trying to parse XML data into an R data frame. This question helped me a lot:

how to create an R data frame from an XML file

But I still can't solve my problem.

Here is my code:

    library(XML)
    data <- xmlParse("http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML")
    xmlToDataFrame(nodes=getNodeSet(data,"//data"))[c("location","time-layout")]
    step1 <- xmlToDataFrame(nodes=getNodeSet(data,"//location/point"))[c("latitude","longitude")]
    step2 <- xmlToDataFrame(nodes=getNodeSet(data,"//time-layout/start-valid-time"))
    step3 <- xmlToDataFrame(nodes=getNodeSet(data,"//parameters/temperature"))[c('type="hourly"')]

The data frame I want looks like this:

    latitude longitude start-valid-time          hourly_temperature
    29.803   -82.411   2013-06-19T15:00:00-04:00 91
    29.803   -82.411   2013-06-19T16:00:00-04:00 90

I am stuck on xmlToDataFrame. Any help would be greatly appreciated, thanks.

xml r
Jun 19 '13 at 18:28
4 answers

XML data is rarely organized in a way that lets the xmlToDataFrame function work. You're better off extracting everything into lists and then binding the lists together into a data frame:

    library(XML)
    data <- xmlParse("http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML")
    xml_data <- xmlToList(data)

For your example data, getting the location and the start time is fairly simple:

    location <- as.list(xml_data[["data"]][["location"]][["point"]])
    time_layout <- xml_data[["data"]][["time-layout"]]
    start_time <- unlist(time_layout[names(time_layout) == "start-valid-time"])

The temperature data is a little more complicated. First you need to get to the node that contains the temperature lists. Then you need to look inside each of those lists and pick the one that has "hourly" as one of its values. Finally, keep only that list, and within it only the entries labeled "value":

    temps <- xml_data[["data"]][["parameters"]]
    temps <- temps[names(temps) == "temperature"]
    temps <- temps[sapply(temps, function(x) any(unlist(x) == "hourly"))]
    temps <- as.numeric(unlist(temps[[1]][names(temps[[1]]) == "value"]))

    out <- data.frame(
      as.list(location),
      "start_valid_time" = start_time,
      "hourly_temperature" = temps)

    > head(out)
      latitude longitude          start_valid_time hourly_temperature
    1    29.81    -82.42 2013-06-19T16:00:00-04:00                 91
    2    29.81    -82.42 2013-06-19T17:00:00-04:00                 90
    3    29.81    -82.42 2013-06-19T18:00:00-04:00                 89
    4    29.81    -82.42 2013-06-19T19:00:00-04:00                 85
    5    29.81    -82.42 2013-06-19T20:00:00-04:00                 83
    6    29.81    -82.42 2013-06-19T21:00:00-04:00                 80
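To try the same xmlToList steps without hitting the network, here is a sketch that runs them on a small inline document. The XML string is a hypothetical, trimmed-down imitation of the DWML feed (only the elements used here, with made-up values), so the real feed will contain more elements and attributes. It also reads the "hourly" flag from the `.attrs` entry that xmlToList creates for element attributes, instead of scanning every value for the word "hourly".

```r
library(XML)

# Hypothetical, trimmed-down stand-in for the DWML feed (made-up values).
# Written without whitespace between tags so no blank text nodes appear.
dwml <- paste0(
  "<dwml><data>",
  "<location><location-key>point1</location-key>",
  "<point latitude='29.81' longitude='-82.42'/></location>",
  "<time-layout><layout-key>k-p1h-n1-0</layout-key>",
  "<start-valid-time>2013-06-19T16:00:00-04:00</start-valid-time>",
  "<start-valid-time>2013-06-19T17:00:00-04:00</start-valid-time>",
  "</time-layout>",
  "<parameters applicable-location='point1'>",
  "<temperature type='dew point'><value>72</value><value>71</value></temperature>",
  "<temperature type='hourly'><value>91</value><value>90</value></temperature>",
  "</parameters>",
  "</data></dwml>"
)

xml_data <- xmlToList(xmlParse(dwml, asText = TRUE))

# <point> has no children, so xmlToList returns its attributes as a named vector
point <- xml_data[["data"]][["location"]][["point"]]

# Repeated <start-valid-time> children become same-named list elements
layout <- xml_data[["data"]][["time-layout"]]
start_time <- unlist(layout[names(layout) == "start-valid-time"], use.names = FALSE)

# Element attributes are stored under ".attrs"; keep the hourly <temperature>
params <- xml_data[["data"]][["parameters"]]
temps <- params[names(params) == "temperature"]
hourly <- Filter(function(x) x[[".attrs"]][["type"]] == "hourly", temps)[[1]]
hourly_temp <- as.numeric(unlist(hourly[names(hourly) == "value"], use.names = FALSE))

out <- data.frame(
  latitude = as.numeric(point[["latitude"]]),
  longitude = as.numeric(point[["longitude"]]),
  start_valid_time = start_time,
  hourly_temperature = hourly_temp
)
print(out)
```

Checking the `type` attribute directly avoids accidentally matching some other element whose text happens to be "hourly".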
Jun 19 '13 at 19:57

Use XPath more directly, for both performance and clarity.

    library(XML)
    data <- xmlParse("http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML")

    time_path <- "//start-valid-time"
    temp_path <- "//temperature[@type='hourly']/value"

    df <- data.frame(
      latitude = data[["number(//point/@latitude)"]],
      longitude = data[["number(//point/@longitude)"]],
      start_valid_time = sapply(data[time_path], xmlValue),
      hourly_temperature = as.integer(sapply(data[temp_path], xmlValue)))

leading to

    > head(df, 2)
      latitude longitude          start_valid_time hourly_temperature
    1    29.81    -82.42 2014-02-14T18:00:00-05:00                 60
    2    29.81    -82.42 2014-02-14T19:00:00-05:00                 55
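The same XPath idea can be tried offline on an inline sample. This sketch swaps in `xpathSApply()` (with `xmlGetAttr()` for the attributes) instead of the `data[[...]]` / `data[...]` indexing shortcuts above; the XML string is a hypothetical, trimmed-down imitation of the DWML feed with made-up values.

```r
library(XML)

# Hypothetical, trimmed-down stand-in for the DWML feed (made-up values)
dwml <- paste0(
  "<dwml><data>",
  "<location><location-key>point1</location-key>",
  "<point latitude='29.81' longitude='-82.42'/></location>",
  "<time-layout><layout-key>k-p1h-n1-0</layout-key>",
  "<start-valid-time>2013-06-19T16:00:00-04:00</start-valid-time>",
  "<start-valid-time>2013-06-19T17:00:00-04:00</start-valid-time>",
  "</time-layout>",
  "<parameters applicable-location='point1'>",
  "<temperature type='dew point'><value>72</value><value>71</value></temperature>",
  "<temperature type='hourly'><value>91</value><value>90</value></temperature>",
  "</parameters>",
  "</data></dwml>"
)

doc <- xmlParse(dwml, asText = TRUE)

# xpathSApply() applies a function to every node the XPath matches;
# extra arguments (here the attribute name) are passed on to that function
lat <- as.numeric(xpathSApply(doc, "//point", xmlGetAttr, "latitude"))
lon <- as.numeric(xpathSApply(doc, "//point", xmlGetAttr, "longitude"))
times <- xpathSApply(doc, "//start-valid-time", xmlValue)

# The predicate [@type='hourly'] selects only the hourly <temperature> block
temps <- as.integer(xpathSApply(doc, "//temperature[@type='hourly']/value", xmlValue))

df <- data.frame(latitude = lat, longitude = lon,
                 start_valid_time = times, hourly_temperature = temps)
print(df)
```

The single latitude and longitude values are recycled by `data.frame()` across all rows.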
Feb 14 '14 at 22:54

Here's a partial solution using xml2. Breaking the problem into smaller pieces usually makes it easier to check that everything lines up:

    library(xml2)
    library(magrittr)  # provides the %>% pipe

    data <- read_xml("http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML")

    # Point locations
    point <- data %>% xml_find_all("//point")
    point %>% xml_attr("latitude") %>% as.numeric()
    point %>% xml_attr("longitude") %>% as.numeric()

    # Start time
    data %>% xml_find_all("//start-valid-time") %>% xml_text()

    # Temperature
    data %>% xml_find_all("//temperature[@type='hourly']/value") %>% xml_text() %>% as.integer()
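One way to finish the partial xml2 solution is to store each piece, check that the pieces have matching lengths, and only then bind them into a data frame. This is a sketch on a hypothetical inline sample that imitates the DWML layout with made-up values; it calls the functions directly instead of piping, so magrittr is not needed.

```r
library(xml2)

# Hypothetical, trimmed-down stand-in for the DWML feed (made-up values)
dwml <- paste0(
  "<dwml><data>",
  "<location><location-key>point1</location-key>",
  "<point latitude='29.81' longitude='-82.42'/></location>",
  "<time-layout><layout-key>k-p1h-n1-0</layout-key>",
  "<start-valid-time>2013-06-19T16:00:00-04:00</start-valid-time>",
  "<start-valid-time>2013-06-19T17:00:00-04:00</start-valid-time>",
  "</time-layout>",
  "<parameters applicable-location='point1'>",
  "<temperature type='dew point'><value>72</value><value>71</value></temperature>",
  "<temperature type='hourly'><value>91</value><value>90</value></temperature>",
  "</parameters>",
  "</data></dwml>"
)

# read_xml() accepts literal XML as well as a path or URL
doc <- read_xml(dwml)

# Pull each column out separately...
point <- xml_find_first(doc, "//point")
times <- xml_text(xml_find_all(doc, "//start-valid-time"))
temps <- as.integer(xml_text(xml_find_all(doc, "//temperature[@type='hourly']/value")))

# ...then fail loudly if the pieces don't line up one-to-one
stopifnot(length(times) == length(temps))

out <- data.frame(
  latitude = as.numeric(xml_attr(point, "latitude")),
  longitude = as.numeric(xml_attr(point, "longitude")),
  start_valid_time = times,
  hourly_temperature = temps
)
print(out)
```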
Jun 02 '16 at 15:21

You can try the code below:

    # Load the packages required to read XML files.
    library("XML")
    library("methods")

    # Convert the input xml file to a data frame.
    xmldataframe <- xmlToDataFrame("input.xml")
    print(xmldataframe)
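xmlToDataFrame() does the job when the XML is already record-shaped: each child of the root is one row, and each child element of that row is one column. Below is a minimal sketch with a hypothetical flat document; the element names readings/reading/time/temp and the values are made up for illustration, and `stringsAsFactors = FALSE` is passed explicitly so character columns stay character.

```r
library(XML)

# Hypothetical flat XML: one <reading> per row, one child element per column
flat <- paste0(
  "<readings>",
  "<reading><time>2013-06-19T16:00:00-04:00</time><temp>91</temp></reading>",
  "<reading><time>2013-06-19T17:00:00-04:00</time><temp>90</temp></reading>",
  "</readings>"
)

doc <- xmlParse(flat, asText = TRUE)
readings <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Values come back as text, so convert numeric columns afterwards
readings$temp <- as.integer(readings$temp)
print(readings)
```

The DWML feed in the question is not shaped like this: the coordinates are attributes of `<point>`, and the times and temperatures sit in separate branches of the tree, which is why the other answers pull each column out on its own.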
Sep 09 '16 at 5:30


