How to create an R data frame from an XML file

I have an XML document file. Part of the file looks like this:

    <attr>
      <attrlabl>COUNTY</attrlabl>
      <attrdef>County abbreviation</attrdef>
      <attrtype>Text</attrtype>
      <attwidth>1</attwidth>
      <atnumdec>0</atnumdec>
      <attrdomv>
        <edom>
          <edomv>C</edomv>
          <edomvd>Clackamas County</edomvd>
          <edomvds/>
        </edom>
        <edom>
          <edomv>M</edomv>
          <edomvd>Multnomah County</edomvd>
          <edomvds/>
        </edom>
        <edom>
          <edomv>W</edomv>
          <edomvd>Washington County</edomvd>
          <edomvds/>
        </edom>
      </attrdomv>
    </attr>

From this XML file, I want to create an R data frame with the columns attrlabl, attrdef, attrtype and attrdomv. The attrdomv column must list all levels of the categorical variable. The data frame should look like this:

    attrlabl             attrdef attrtype                                                    attrdomv
      COUNTY County abbreviation     Text C Clackamas County; M Multnomah County; W Washington County

My code so far is incomplete:

    library(XML)
    doc <- xmlParse("taxlots.shp.xml")
    dataDictionary <- xmlToDataFrame(getNodeSet(doc, "//attrlabl"))

Could you help me complete my R code? I appreciate any help!

+8
xml r
Nov 27 '12 at 8:22
1 answer

Assuming this is the correct taxlots.shp.xml file:

    <attr>
      <attrlabl>COUNTY</attrlabl>
      <attrdef>County abbreviation</attrdef>
      <attrtype>Text</attrtype>
      <attwidth>1</attwidth>
      <atnumdec>0</atnumdec>
      <attrdomv>
        <edom>
          <edomv>C</edomv>
          <edomvd>Clackamas County</edomvd>
          <edomvds/>
        </edom>
        <edom>
          <edomv>M</edomv>
          <edomvd>Multnomah County</edomvd>
          <edomvds/>
        </edom>
        <edom>
          <edomv>W</edomv>
          <edomvd>Washington County</edomvd>
          <edomvds/>
        </edom>
      </attrdomv>
    </attr>

You were almost there:

    library(XML)
    doc <- xmlParse("taxlots.shp.xml")
    # Select the whole <attr> node, then keep only the wanted columns
    xmlToDataFrame(nodes = getNodeSet(doc, "//attr"))[c("attrlabl", "attrdef", "attrtype", "attrdomv")]
    #   attrlabl             attrdef attrtype                                             attrdomv
    # 1   COUNTY County abbreviation     Text CClackamas CountyMMultnomah CountyWWashington County

But the last field is not in the desired format: the text of the nested edom nodes is run together without separators. Fixing that takes a few extra steps:

    # One row per <edom> node
    step1 <- xmlToDataFrame(nodes = getNodeSet(doc, "//attrdomv/edom"))
    step1
    #   edomv            edomvd edomvds
    # 1     C  Clackamas County
    # 2     M  Multnomah County
    # 3     W Washington County

    # Join code and description, then collapse all levels into one string
    step2 <- paste(paste(step1$edomv, step1$edomvd, sep = " "), collapse = "; ")
    step2
    # [1] "C Clackamas County; M Multnomah County; W Washington County"

    # Attach the collapsed string to the remaining columns
    cbind(xmlToDataFrame(nodes = getNodeSet(doc, "//attr"))[c("attrlabl", "attrdef", "attrtype")],
          attrdomv = step2)
    #   attrlabl             attrdef attrtype                                                    attrdomv
    # 1   COUNTY County abbreviation     Text C Clackamas County; M Multnomah County; W Washington County
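
Note that "//attrdomv/edom" matches every edom node in the document, so the steps above fit a file with a single attr element. If the real file holds several attr elements, one possible variation is to build the string per node. The following is only a sketch: the inline two-attribute XML (including the ZONE entry and the eainfo wrapper) and the helper name attr_to_row are made up for illustration and are not part of the original file.

```r
library(XML)

# Hypothetical two-attribute sample standing in for taxlots.shp.xml
xml_text <- '<eainfo>
  <attr>
    <attrlabl>COUNTY</attrlabl>
    <attrdef>County abbreviation</attrdef>
    <attrtype>Text</attrtype>
    <attrdomv>
      <edom><edomv>C</edomv><edomvd>Clackamas County</edomvd><edomvds/></edom>
      <edom><edomv>M</edomv><edomvd>Multnomah County</edomvd><edomvds/></edom>
      <edom><edomv>W</edomv><edomvd>Washington County</edomvd><edomvds/></edom>
    </attrdomv>
  </attr>
  <attr>
    <attrlabl>ZONE</attrlabl>
    <attrdef>Zoning code</attrdef>
    <attrtype>Text</attrtype>
    <attrdomv>
      <edom><edomv>R</edomv><edomvd>Residential</edomvd><edomvds/></edom>
    </attrdomv>
  </attr>
</eainfo>'

doc <- xmlParse(xml_text, asText = TRUE)

# attr_to_row is a hypothetical helper: it builds one data frame row per <attr> node
attr_to_row <- function(node) {
  # The leading "./" keeps each XPath query relative to this <attr> node
  codes  <- unlist(xpathSApply(node, "./attrdomv/edom/edomv",  xmlValue))
  labels <- unlist(xpathSApply(node, "./attrdomv/edom/edomvd", xmlValue))
  data.frame(
    attrlabl = xmlValue(node[["attrlabl"]]),
    attrdef  = xmlValue(node[["attrdef"]]),
    attrtype = xmlValue(node[["attrtype"]]),
    # An <attr> without any <edom> children yields an empty string here
    attrdomv = paste(paste(codes, labels), collapse = "; "),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to every <attr> node and stack the rows
dataDictionary <- do.call(rbind, lapply(getNodeSet(doc, "//attr"), attr_to_row))
dataDictionary
```

With the real file, replace the xmlParse(xml_text, asText = TRUE) call with xmlParse("taxlots.shp.xml"); the rest should carry over as long as the attr elements follow the same layout.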
+9
Nov 27 '12 at 9:19