This is one of the issues that rio addresses (full disclosure: I wrote this package). Basically, it provides various ways of importing variable labels, including haven's way of doing things and foreign's. Here's a trivial example:
Start by creating a reproducible example:
    > library("rio")
    > export(iris, "iris.dta")
Import using foreign::read.dta() (via rio::import()):
    > str(import("iris.dta", haven = FALSE))
    'data.frame': 150 obs. of 5 variables:
     $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
     $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
     $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
     - attr(*, "datalabel")= chr ""
     - attr(*, "time.stamp")= chr "15 Jan 2016 20:05"
     - attr(*, "formats")= chr "" "" "" "" ...
     - attr(*, "types")= int 255 255 255 255 253
     - attr(*, "val.labels")= chr "" "" "" "" ...
     - attr(*, "var.labels")= chr "" "" "" "" ...
     - attr(*, "version")= int -7
     - attr(*, "label.table")=List of 1
      ..$ Species: Named int 1 2 3
      .. ..- attr(*, "names")= chr "setosa" "versicolor" "virginica"
Import using haven::read_dta(), keeping its native variable attributes (here the labels are stored at the variable level rather than at the data.frame level):
    > str(import("iris.dta", haven = TRUE, column.labels = TRUE))
    'data.frame': 150 obs. of 5 variables:
     $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
     $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
     $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species :Class 'labelled' atomic [1:150] 1 1 1 1 1 1 1 1 1 1 ...
      .. ..- attr(*, "labels")= Named int [1:3] 1 2 3
      .. .. ..- attr(*, "names")= chr [1:3] "setosa" "versicolor" "virginica"
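For comparison, a labelled column like Species above can be turned into a factor by hand from its "labels" attribute. This is only a minimal base-R sketch on a made-up stand-in vector (`species` is hypothetical, not the imported column), and it is not a claim about how rio or haven do the conversion internally:

```r
# Hypothetical stand-in for a labelled integer column (not the real import)
species <- c(1L, 1L, 2L, 3L)
attr(species, "labels") <- c(setosa = 1L, versicolor = 2L, virginica = 3L)

# The value labels live on the variable itself
labs <- attr(species, "labels", exact = TRUE)

# Map the integer codes to their label names
species_factor <- factor(as.integer(species),
                         levels = unname(labs),
                         labels = names(labs))

levels(species_factor)
```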
Import using haven::read_dta(), with an alternative layout that we (the rio developers) have found more convenient:
    > str(import("iris.dta", haven = TRUE))
    'data.frame': 150 obs. of 5 variables:
     $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
     $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
     $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
     $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
     $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
     - attr(*, "var.labels")=List of 5
      ..$ Sepal.Length: NULL
      ..$ Sepal.Width : NULL
      ..$ Petal.Length: NULL
      ..$ Petal.Width : NULL
      ..$ Species : NULL
     - attr(*, "label.table")=List of 5
      ..$ Sepal.Length: NULL
      ..$ Sepal.Width : NULL
      ..$ Petal.Length: NULL
      ..$ Petal.Width : NULL
      ..$ Species : Named int 1 2 3
      .. ..- attr(*, "names")= chr "setosa" "versicolor" "virginica"
By moving the attributes to the data.frame level, it is much easier to access them with attr(data, "var.labels"), etc., rather than digging through the attributes of each variable.
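The difference between the two layouts can be sketched in base R without any import step. The data and label text below are invented for illustration, and the per-variable attribute name "label" is an assumption; only "var.labels" and "label.table" are taken from the output above:

```r
# Hypothetical stand-in data; label text is invented for illustration
dat <- data.frame(x = c(1.5, 2.5), g = c(1L, 2L))

# Layout 1: labels stored on each variable
attr(dat$x, "label") <- "Measurement"        # assumed attribute name
attr(dat$g, "labels") <- c(low = 1L, high = 2L)

# Collecting them means visiting every column; exact = TRUE stops
# attr() from partially matching "label" to "labels"
var_level <- lapply(dat, attr, which = "label", exact = TRUE)

# Layout 2: labels stored once, on the data.frame
dat2 <- data.frame(x = c(1.5, 2.5), g = c(1L, 2L))
attr(dat2, "var.labels") <- list(x = "Measurement", g = NULL)
attr(dat2, "label.table") <- list(x = NULL, g = c(low = 1L, high = 2L))

# A single attr() call retrieves all variable labels
df_level <- attr(dat2, "var.labels")
```

With the second layout, everything is available from one place, and the columns themselves stay plain vectors.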
Note: the attribute values will be NULL because I am just writing a native R dataset to a local file to make this reproducible.