R: Extracting the "labels" attribute from "labelled" class columns of a haven import from Stata

Hadley Wickham's haven package, applied to a Stata file, returns a data frame (tibble) with many labelled columns. You can see them with str(), for example:

     $ MSACMSZ :Class 'labelled'  atomic [1:8491861] NA NA NA NA NA NA NA NA NA NA ...
      .. ..- attr(*, "label")= chr "metropolitan area size (cmsa/msa)"
      .. ..- attr(*, "labels")= Named int [1:7] 0 1 2 3 4 5 6
      .. .. ..- attr(*, "names")= chr [1:7] "not identified or nonmetropolitan" "100,000 - 249,999" "250,000 - 499,999" "500,000 - 999,999" ...

It would be nice if I could just convert all these labelled vectors to factors, but I compared the length of the labels attribute with the number of unique values in each vector, and sometimes it is longer and sometimes shorter. So I think I need to look at each one and decide how to deal with it individually.
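The comparison described above can be scripted rather than done by eye. A minimal base-R sketch: the two labelled columns are rebuilt by hand from the dput() output further down in the question (class "labelled" is set with structure(), so haven is not needed to run it), and YEAR is a hypothetical unlabelled column added for contrast.

```r
# Rebuilt from the question's dput() output; YEAR is a hypothetical plain column.
tiny <- structure(list(
  HHTENURE = structure(c(2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L),
    labels = c("niu" = 0L, "owned or being bought" = 1L, "rented for cash" = 2L,
               "occupied without payment of cash rent" = 3L,
               "refused" = 6L, "don't know" = 7L),
    class = "labelled"),
  HHINTYPE = structure(rep(1L, 10),
    labels = c("interview" = 1L, "type a non-interview" = 2L,
               "type b/c non-interview" = 3L),
    class = "labelled"),
  YEAR = rep(2015L, 10)),
  row.names = c(NA, -10L), class = "data.frame")

# Keep only the columns that carry a "labels" attribute.
labelled_cols <- Filter(function(col) !is.null(attr(col, "labels", exact = TRUE)), tiny)

# Number of defined labels vs. number of distinct observed values
# (an NA in the data would count as one distinct value here).
n_labels <- vapply(labelled_cols,
                   function(col) length(attr(col, "labels", exact = TRUE)), integer(1))
n_unique <- vapply(labelled_cols,
                   function(col) length(unique(unclass(col))), integer(1))

print(cbind(n_labels, n_unique))
```

Columns where the two counts differ are the ones to inspect by hand; in this ten-row sample they differ simply because not every label is used.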

So I would like to extract the attribute values of labels into a list. However, this call:

 labels93 <- lapply(cps_00093.df, function(x){attr(X, which="labels", exact=TRUE)}) 

returns NULL for all variables.
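One likely cause worth checking in the call above: R is case-sensitive, and the anonymous function's argument is lowercase x while its body uses uppercase X. The body therefore never looks at the column being passed in. If some unrelated object called X happens to exist, every call inspects that same object and returns NULL (if no X exists, R stops with an "object 'X' not found" error instead). A minimal sketch reproducing this, with a hypothetical one-column data frame and a hypothetical stray X:

```r
# Hypothetical one-column data frame shaped like the question's HHINTYPE.
df <- structure(list(
  HHINTYPE = structure(c(1L, 1L, 1L),
    labels = c("interview" = 1L, "type a non-interview" = 2L,
               "type b/c non-interview" = 3L),
    class = "labelled")),
  row.names = c(NA, -3L), class = "data.frame")

# Hypothetical unrelated object that happens to be named X.
X <- 1:10

# As in the question: the body refers to X, not to the argument x,
# so each call inspects the same object, which has no "labels" attribute.
wrong <- lapply(df, function(x) attr(X, which = "labels", exact = TRUE))

# Referring to the argument x inspects each column in turn.
right <- lapply(df, function(x) attr(x, which = "labels", exact = TRUE))

str(wrong)
str(right)
```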

What am I doing wrong? How can I extract these attributes from the columns into a list?

Note that the labels vector is named, and I need both the values and their names.
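Since both the values and their names matter, one option is to flatten such a named list into a long table with one row per (column, value, label) triple. A minimal base-R sketch; labels_list is a hypothetical stand-in for the list the question wants to build, with values copied from the dput() output below.

```r
# Hypothetical stand-in for the extracted list: one named vector per column.
labels_list <- list(
  HHTENURE = c("niu" = 0L, "owned or being bought" = 1L, "rented for cash" = 2L,
               "occupied without payment of cash rent" = 3L,
               "refused" = 6L, "don't know" = 7L),
  HHINTYPE = c("interview" = 1L, "type a non-interview" = 2L,
               "type b/c non-interview" = 3L)
)

# One small data frame per column, then stack them: the vector's values go
# into `value`, its names into `label`.
labels_table <- do.call(rbind, lapply(names(labels_list), function(col) {
  data.frame(column = col,
             value  = unname(labels_list[[col]]),
             label  = names(labels_list[[col]]),
             stringsAsFactors = FALSE)
}))

print(labels_table)
```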

As per @Hack-R's request, here is a tiny fragment of my data dumped with dput() (which I had never used before). I used this code:

    library(dplyr)
    filter(cps_00093.df, YEAR == 2015) %>% sample_n(10) %>% select(HHTENURE, HHINTYPE) -> tiny
    dput(tiny, file = "tiny")

to create the file tiny. Wow, that was easy! I thought it would be hard to carve out such a small piece.

Opening tiny in Notepad++, here is what I found:

 structure(list(HHTENURE = structure(c(2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L), labels = structure(c(0L, 1L, 2L, 3L, 6L, 7L), .Names = c("niu", "owned or being bought", "rented for cash", "occupied without payment of cash rent", "refused", "don't know")), class = "labelled"), HHINTYPE = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), labels = structure(1:3, .Names = c("interview", "type a non-interview", "type b/c non-interview")), class = "labelled")), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("HHTENURE", "HHINTYPE")) 

I suspect this could be made more readable with a little whitespace, but I did not want to mess with it for fear of accidentally destroying relevant information.

2 answers

I am going to answer this question, although my code is not very pretty.

First, I create a function to retrieve a named attribute from a single column.

    ColAttr <- function(x, attrC, ifIsNull) {
      # Returns the column attribute named in attrC, if present, else ifIsNull.
      atr <- attr(x, attrC, exact = TRUE)
      atr <- if (is.null(atr)) {ifIsNull} else {atr}
      atr
    }

Then a function to apply it to all columns:

    AtribLst <- function(df, attrC, isNullC) {
      # Returns a list of the values of the column attribute attrC, if present, else isNullC.
      lapply(df, ColAttr, attrC = attrC, ifIsNull = isNullC)
    }

Finally, I run it for each attribute.

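    # Variable descriptions ("label") and value labels ("labels"), NA where absent.
    stub93 <- AtribLst(cps_00093.df, attrC = "label", isNullC = NA)
    labels93 <- AtribLst(cps_00093.df, attrC = "labels", isNullC = NA)
    # Keep only the columns that actually have value labels.
    labels93 <- labels93[!is.na(labels93)]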
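For comparison, a more compact base-R variant (a different technique from the helper functions above, not the answerer's code): lapply() forwards its extra arguments to attr(), and Filter(Negate(is.null), ...) drops the columns without the attribute, so no NA placeholder is needed. A minimal sketch, on a hand-built copy of the question's two sample columns plus a hypothetical unlabelled YEAR column:

```r
tiny <- structure(list(
  HHTENURE = structure(c(2L, 1L, 1L, 2L),
    labels = c("niu" = 0L, "owned or being bought" = 1L, "rented for cash" = 2L,
               "occupied without payment of cash rent" = 3L,
               "refused" = 6L, "don't know" = 7L),
    class = "labelled"),
  HHINTYPE = structure(c(1L, 1L, 1L, 1L),
    labels = c("interview" = 1L, "type a non-interview" = 2L,
               "type b/c non-interview" = 3L),
    class = "labelled"),
  YEAR = rep(2015L, 4)),             # hypothetical unlabelled column
  row.names = c(NA, -4L), class = "data.frame")

# Extra arguments to lapply() are passed on to attr(x, which, exact).
all_labels <- lapply(tiny, attr, which = "labels", exact = TRUE)

# Drop the columns for which attr() returned NULL (no "labels" attribute).
labels_only <- Filter(Negate(is.null), all_labels)

names(labels_only)
```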

All columns have a label attribute, but only some of them are labelled and therefore have a labels attribute. The labels attribute is a named vector whose values correspond to data values, and whose names tell you what those values mean.
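To make that value/name correspondence concrete, a labelled column can be decoded by matching each data value against the labels vector and taking the name at the matching position. A minimal sketch; decode_labels() is a hypothetical helper, and values that have no label come back as NA rather than raising an error, which may matter given the mismatched label counts described in the question.

```r
# Hypothetical helper: map each value of a labelled column to its label name.
decode_labels <- function(col) {
  labs <- attr(col, "labels", exact = TRUE)
  if (is.null(labs)) return(NULL)           # column has no value labels
  names(labs)[match(unclass(col), labs)]    # NA where a value has no label
}

# First four HHTENURE values from the question's dput() output.
hhtenure <- structure(c(2L, 1L, 1L, 2L),
  labels = c("niu" = 0L, "owned or being bought" = 1L, "rented for cash" = 2L,
             "occupied without payment of cash rent" = 3L,
             "refused" = 6L, "don't know" = 7L),
  class = "labelled")

decode_labels(hhtenure)
```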


The original question asks how to "extract the attribute values of labels into a list." The following solves the main question (assuming some_df was imported via haven and has label attributes):

    library(purrr)
    n <- ncol(some_df)
    labels_list <- map(1:n, function(x) attr(some_df[[x]], "label"))
    # if a vector of character strings is preferable
    labels_vector <- map_chr(1:n, function(x) attr(some_df[[x]], "label"))
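Two details may be worth knowing here: mapping over 1:n drops the column names, and map_chr() stops with an error if any column has no "label" attribute, because attr() then returns NULL rather than a single string. A base-R sketch that keeps the names and substitutes NA; some_df below is a hypothetical two-column stand-in (the label text is taken from the str() output in the question).

```r
# Hypothetical stand-in: one column with a "label" attribute, one without.
some_df <- structure(list(
  MSACMSZ = structure(c(NA_integer_, NA_integer_, NA_integer_),
                      label = "metropolitan area size (cmsa/msa)",
                      class = "labelled"),
  YEAR = c(2015L, 2015L, 2015L)),
  row.names = c(NA, -3L), class = "data.frame")

# Named list; a column without the attribute gives NULL.
labels_list <- lapply(some_df, attr, which = "label", exact = TRUE)

# Named character vector; a column without the attribute gives NA.
labels_vector <- vapply(some_df, function(col) {
  lab <- attr(col, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else lab
}, character(1))

print(labels_vector)
```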
