Hadley Wickham's haven package, applied to a Stata file, returns a tibble with many labelled columns. You can see them with str(), for example:
 $ MSACMSZ :Class 'labelled' atomic [1:8491861] NA NA NA NA NA NA NA NA NA NA ...
  .. ..- attr(*, "label")= chr "metropolitan area size (cmsa/msa)"
  .. ..- attr(*, "labels")= Named int [1:7] 0 1 2 3 4 5 6
  .. .. ..- attr(*, "names")= chr [1:7] "not identified or nonmetropolitan" "100,000 - 249,999" "250,000 - 499,999" "500,000 - 999,999" ...
It would be nice if I could just convert all these labelled vectors to factors, but I compared the length of the labels attribute with the number of unique values in each vector, and sometimes it is longer and sometimes shorter. Therefore, I think I need to look at each one and decide how to deal with it individually.
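The comparison described above can be sketched roughly as follows. The data is a hand-built toy: the columns `col_a` and `col_b` and their labels are hypothetical, not taken from the CPS file. The sketch uses base R only, on the assumption that a haven column is an ordinary vector carrying a named `labels` attribute, as the str() output suggests.

```r
# Hypothetical toy columns standing in for the real data (not from the CPS file).
toy <- list(
  # More labels defined (3) than distinct values present (2).
  col_a = structure(c(2L, 1L, 1L, 2L, NA),
                    labels = c(low = 0L, mid = 1L, high = 2L),
                    class = "labelled"),
  # Fewer labels defined (1) than distinct values present (3).
  col_b = structure(c(1L, 2L, 3L, 1L, 1L),
                    labels = c(only = 1L),
                    class = "labelled")
)

# Number of entries in the "labels" attribute of each column.
n_labels <- sapply(toy, function(x) length(attr(x, "labels", exact = TRUE)))

# Number of distinct non-missing values actually present in each column.
n_values <- sapply(toy, function(x) {
  v <- as.vector(unclass(x))   # drop the class and attributes, keep the raw values
  length(unique(v[!is.na(v)]))
})

# Side-by-side comparison, one row per column.
label_check <- data.frame(n_labels, n_values)
print(label_check)
```

Columns where the two counts differ are the ones that would need an individual decision before being turned into factors.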
So I would like to extract the labels attribute of each column into a list. However, this call:
labels93 <- lapply(cps_00093.df, function(x){attr(X, which="labels", exact=TRUE)})
returns NULL for all variables.
What is wrong? How can I extract these attributes from the columns into a list?
Note that each labels vector is named, and I need both the values and their names.
As per @Hack-R's request, here is a tiny fragment of my data produced with dput (which I had never used before). I applied this code:
filter(cps_00093.df, YEAR==2015) %>% sample_n(10) %>% select(HHTENURE, HHINTYPE) -> tiny
dput(tiny, file = "tiny")
to create the file tiny. Hey, that was easy! I thought it would be hard to break off such a small piece.
Opening tiny with Notepad++, here is what I found:
structure(list(HHTENURE = structure(c(2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L), labels = structure(c(0L, 1L, 2L, 3L, 6L, 7L), .Names = c("niu", "owned or being bought", "rented for cash", "occupied without payment of cash rent", "refused", "don't know")), class = "labelled"), HHINTYPE = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), labels = structure(1:3, .Names = c("interview", "type a non-interview", "type b/c non-interview")), class = "labelled")), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("HHTENURE", "HHINTYPE"))
I suspect this could be made more readable with some whitespace, but I did not want to fiddle with it, for fear of accidentally destroying relevant information.
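For reference, here is a minimal sketch of the kind of result being asked for, run on the fragment only. It is base R, and the dput() output is re-indented with no values changed. The name `tiny_labels` is just for illustration, and whether the same call behaves identically on the full data is an assumption. Note that R is case-sensitive, so the anonymous function refers to its own lowercase argument `x`.

```r
# The dput() fragment from the post, re-indented only; no values were changed.
tiny <- structure(
  list(
    HHTENURE = structure(
      c(2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L),
      labels = structure(
        c(0L, 1L, 2L, 3L, 6L, 7L),
        .Names = c("niu", "owned or being bought", "rented for cash",
                   "occupied without payment of cash rent", "refused",
                   "don't know")
      ),
      class = "labelled"
    ),
    HHINTYPE = structure(
      c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
      labels = structure(
        1:3,
        .Names = c("interview", "type a non-interview",
                   "type b/c non-interview")
      ),
      class = "labelled"
    )
  ),
  row.names = c(NA, -10L),
  class = c("tbl_df", "tbl", "data.frame"),
  .Names = c("HHTENURE", "HHINTYPE")
)

# One list element per column; each element is the named integer vector
# stored in that column's "labels" attribute (values plus their names).
tiny_labels <- lapply(tiny, function(x) attr(x, which = "labels", exact = TRUE))

str(tiny_labels)
```

Each element of `tiny_labels` keeps both parts: the integer codes are the values of the vector and the label texts are its names, so `names(tiny_labels$HHTENURE)` gives the texts and `unname(tiny_labels$HHTENURE)` gives the codes.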