How to iterate over all data sets (and determine their number of columns)?

Question

I would like to loop over all the data sets of all available (= installed) packages and find out which of these data sets have 6 or more columns. Here is my attempt:

    dat.list <- data(package=.packages(all.available=TRUE))$results # list of all installed packages
    colnames(dat.list) # "Package" "LibPath" "Item" (= name of data set) "Title" (= description)
    idx <- c()
    i <- 3
    ## for(i in nrow(dat.list)) {
        nme <- dat.list[[i,"Item"]] # data set as string
        data(list=nme, package=dat.list[[i,"Package"]]) # load the data
        ## => fails with warning: In data(list = nme, package = dat.list[[i, "Package"]]) :
        ##    data set 'BJsales.lead (BJsales)' not found
        dat <- eval(as.name(nme)) # assign the data to the variable dat
        ncl <- ncol(dat)
        if(!is.null(ncl) && ncl >= 6) idx <- c(idx, i)
    ## }

This obviously does not work, so I fixed the index (here: 3) to see what was failing. How (if not via nme above) can I determine the name of the data set, so that I can store the data set in a variable and then access its number of columns?

UPDATE: By combining the posts from jeremycg and nico, I came up with this (again: the way the data set names are extracted is not ideal, but it runs through):

    dat.list <- data(package=.packages(all.available=TRUE))$results # list of all installed packages
    idx <- c()
    for (i in 1:nrow(dat.list)) {
        require(dat.list[i, "Package"], character.only=TRUE)
        raw.name <- dat.list[i, "Item"] # data set (and parenthetical suffix) as raw string
        name <- gsub('\\s.*','', raw.name) # name of data set
        dat <- tryCatch(get(name), error=function(e) e) # assign the data to the variable dat (if not erroneous)
        if(is(dat, "simpleError")) {
            warning("Element ",i," threw an error")
            dat <- NA
        }
        ncl <- ncol(dat)
        if(!is.null(ncl) && ncl >= 6) idx <- c(idx, i)
    }
    dat.list[idx, c("Package", "Item")]

Marius Hofert, Aug 17 '15 at 16:57

1 answer

nico · Accepted Answer · 2015-08-17T17:16:59+0000

I think you need to load the package in order to access its data.

So you need to add at the beginning of the loop:

 require(dat.list[[i, "Package"]], character.only = TRUE)

(see this question for why you need the character.only argument)
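A minimal illustration of what character.only changes, using the base package stats and a made-up variable name (both chosen here only for the example): without the argument, require() treats what you typed as the package name itself rather than looking up the variable's value.

```r
# Hypothetical variable holding a package name as a string
pkg.in.variable <- "stats"

# character.only = TRUE: the *value* of the variable ("stats") is used
loaded.by.value <- require(pkg.in.variable, character.only = TRUE, quietly = TRUE)

# Default: require() looks for a package literally called "pkg.in.variable"
loaded.by.symbol <- suppressWarnings(require(pkg.in.variable, quietly = TRUE))

print(loaded.by.value)
print(loaded.by.symbol)
```

Inside the loop the package name always comes from dat.list as a string, so the character.only form is the one that applies.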

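Note that you also need to change your loop header, because nrow(dat.list) is a single number and the loop body would therefore run only once (for the last row):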

 for(i in nrow(dat.list))

to

 for(i in 1:nrow(dat.list))
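The difference between the two loop headers can be checked on a small matrix. This sketch uses a made-up three-row matrix m as a stand-in for dat.list, and also shows seq_len(), which is a common alternative to 1:nrow() because it yields an empty sequence when there are no rows.

```r
# Hypothetical stand-in for dat.list: a character matrix with 3 rows
m <- matrix(c("a", "b", "c", "x", "y", "z"), nrow = 3,
            dimnames = list(NULL, c("Package", "Item")))

visited.once <- c()
for (i in nrow(m)) visited.once <- c(visited.once, i)        # iterates over the single value 3

visited.all <- c()
for (i in seq_len(nrow(m))) visited.all <- c(visited.all, i) # iterates over 1, 2, 3

print(visited.once)
print(visited.all)

# With zero rows, 1:nrow() counts down from 1 to 0, while seq_len() is empty
empty <- m[0, , drop = FALSE]
print(1:nrow(empty))
print(seq_len(nrow(empty)))
```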

There is another problem: some data sets are listed with an additional name in parentheses. For example:

 wine.classes (wine)

So we need to strip these, which is easy to do with:

 dat.list[,3] <- sapply(strsplit(dat.list[,3], " "), function(x){x[1]})
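As far as the ?data help page describes it, an entry of the form "object (name)" means that the object is retrieved by calling data(name), so the part in parentheses can be worth keeping rather than discarding. A small sketch of splitting both parts with regular expressions, using the two Item strings that appear in this thread plus one plain entry (the variable names are made up for the example):

```r
items <- c("wine.classes (wine)", "BJsales.lead (BJsales)", "mtcars")

# Object name: everything before the first whitespace
obj.name <- sub("\\s.*$", "", items)

# Name in parentheses if there is one, otherwise the object name itself
file.name <- ifelse(grepl("\\(", items),
                    sub("^.*\\((.*)\\)$", "\\1", items),
                    obj.name)

print(obj.name)
print(file.name)
```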

Finally, dat.list can be indexed with plain [], you do not need [[]] (easier to read!).

So finally:

    # List of the data sets of all installed packages
    dat.list <- data(package=.packages(all.available=TRUE))$results
    # Remove the name in parentheses
    dat.list[,3] <- sapply(strsplit(dat.list[, "Item"], " "), function(x){x[1]})
    idx <- c()
    for (i in 1:nrow(dat.list)) {
        require(dat.list[i, "Package"], character.only = TRUE)
        nme <- dat.list[i,"Item"] # data set as string
        data(list=nme, package=dat.list[i,"Package"]) # load the data
        dat <- eval(as.name(nme)) # assign the data to the variable dat
        ncl <- ncol(dat)
        if(!is.null(ncl) && ncl >= 6) idx <- c(idx, i)
    }

and

    > dat.list[idx, "Item"]
     [1] "Seatbelts"          "USJudgeRatings"     "WorldPhones"        "airquality"
     [5] "anscombe"           "attitude"           "crimtab"            "euro.cross"
     [9] "infert"             "longley"            "mtcars"             "occupationalStatus"
    [13] "state.x77"          "swiss"              "volcano"            "car.test.frame"
    [17] "car90"              "solder"             "stagec"             "bladder"
    [21] "bladder1"           "bladder2"           "cancer"             "cgd"
    [25] "cgd0"               "colon"              "flchain"            "heart"
    [29] "jasa"               "jasa1"              "kidney"             "lung"
    [33] "mgus"               "mgus1"              "mgus2"              "nwtco"
    [37] "ovarian"            "pbc"                "pbcseq"             "rats2"
    [41] "transplant"         "veteran"            "soldat"             "patch"
    [45] "tooth"
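For reuse, the same idea can be wrapped in a function. The following is only a sketch under a few assumptions, not the answer's code: wide.datasets is a made-up helper name, each data set is loaded into a throwaway environment instead of the workspace, the name in parentheses (when present) is what gets passed to data(), and entries that fail to load are silently skipped. It is run here on the datasets package only, so the result is a subset of the list above.

```r
# Hypothetical helper: data sets of the given packages with >= min.cols columns
wide.datasets <- function(pkgs, min.cols = 6) {
    res <- utils::data(package = pkgs)$results
    keep <- logical(nrow(res))
    for (i in seq_len(nrow(res))) {
        item <- res[i, "Item"]
        obj  <- sub("\\s.*$", "", item)            # object name
        file <- if (grepl("\\(", item)) {          # name to pass to data()
            sub("^.*\\((.*)\\)$", "\\1", item)
        } else obj
        env <- new.env()                           # keeps the workspace clean
        ok <- tryCatch({
            utils::data(list = file, package = res[i, "Package"], envir = env)
            TRUE
        }, warning = function(w) FALSE, error = function(e) FALSE)
        if (ok && exists(obj, envir = env, inherits = FALSE)) {
            ncl <- ncol(get(obj, envir = env))     # NULL for vectors, time series, ...
            keep[i] <- !is.null(ncl) && ncl >= min.cols
        }
    }
    res[keep, c("Package", "Item"), drop = FALSE]
}

wide <- wide.datasets("datasets")
print(wide[, "Item"])
```

Passing .packages(all.available=TRUE) instead of "datasets" would cover every installed package, at the cost of a much longer run.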