Is there a concept of shortcuts / alias / pointer in R?

Question


I am working on a fairly large dataset in R, which is split into several data frames.

The problem is that sometimes I work with the whole set and sometimes I only need to work with or change part of it, and then my selectors get very awkward, e.g.:

 aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 & aListOfItems$attribute3 == "C", "attribute4"] <-
     aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 & aListOfItems$attribute3 == "C", "attribute5"] *
     aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 & aListOfItems$attribute3 == "C", "attribute7"]

(This sets attribute4 to attribute5 * attribute7 for the selected subset of records.)

It is terrible to read, understand and edit.

Splitting the data into separate data frames is not really an option: RAM is limited, and because I update the data regularly, rebuilding all the individual data frames each time would also be a pain.

So, is there a way to do something like

 items_t6C <- &(aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 & aListOfItems$attribute3 == "C", ])

so I can use

 items_t6C$attribute4 <- # do something

Alternatively, could I save such a selector in a variable (for example, as a string) and reuse it?
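On the string idea: one way it could work is sketched below, on hypothetical toy data. The selector is stored either as an unevaluated expression (quote()) or as a string (turned back into an expression with parse()), and eval() then looks up the column names inside the data frame. The column values are made up for illustration; only the column names follow the question.

```r
# Hypothetical toy data standing in for the question's aListOfItems
aListOfItems <- data.frame(
  attribute1 = c(TRUE, TRUE, FALSE, TRUE),
  attribute2 = c(6, 6, 6, 5),
  attribute3 = c("C", "D", "C", "C"),
  attribute4 = NA_real_,
  attribute5 = c(2, 3, 4, 5),
  attribute7 = c(10, 20, 30, 40)
)

# Option 1: store the selector as an unevaluated expression
sel_expr <- quote(attribute1 & attribute2 == 6 & attribute3 == "C")

# Option 2: store it as a string and parse it into an expression when needed
sel_string <- 'attribute1 & attribute2 == 6 & attribute3 == "C"'

# eval() uses the data frame's columns as variables when evaluating the selector
rows_from_expr   <- eval(sel_expr, aListOfItems)
rows_from_string <- eval(parse(text = sel_string)[[1]], aListOfItems)

# The stored selector can now be reused wherever the long condition was needed
aListOfItems[rows_from_expr, "attribute4"] <-
  aListOfItems[rows_from_expr, "attribute5"] *
  aListOfItems[rows_from_expr, "attribute7"]
```

Note that this stores the *condition*, not a reference to the rows: the selector is re-evaluated against whatever the data frame contains at that moment.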



racoonie Apr 05 '13 at 9:00


2 answers

The data.table package may be useful to you.

data.table works mostly by reference, especially when assigning and modifying columns. If you are running up against your RAM limits, the efficiency gains from data.table can be dramatic.
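For contrast, here is a minimal base R sketch on hypothetical toy data. Assigning a subset to a new variable gives you a separate object, so changes to it do not reach the original and have to be written back explicitly. This is one reason base R offers no simple "pointer" to part of a data frame.

```r
# Hypothetical toy data
items <- data.frame(
  id    = 1:4,
  group = c("C", "D", "C", "D"),
  value = c(1, 2, 3, 4)
)

rows <- items$group == "C"

# "Aliasing" a subset actually creates a new data frame
items_C <- items[rows, ]
items_C$value <- items_C$value * 10

# The original is untouched by the change to the subset
original_unchanged <- identical(items$value, c(1, 2, 3, 4))

# To update the original, the modified values must be written back explicitly
items[rows, "value"] <- items_C$value
```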

In addition, data.table has functionality similar to with, within, by, subset, etc. built in, which makes calls much shorter and the code more readable.

For example, the expression described above can be simplified to the following:

 aDTofItems[attribute1 & attribute2 == 6 & attribute3 == "C",   # filter
            attribute4 := attribute5 * attribute6]               # assign

Also, if the columns you are filtering on are the table's key, the line is even shorter:

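 setkey(aDTofItems, attribute1, attribute2, attribute3)   # make the filter columns the key (once)
 aDTofItems[.(TRUE, 6, "C"),                              # filter (join on the key)
            attribute4 := attribute5 * attribute6]        # assign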

Assuming every item in your list has the same structure, you can convert the list to a data.table with

 library(data.table)
 # note: if you have factors in your list, convert them to character before calling rbindlist
 aDTofItems <- rbindlist(aListOfItems)
 # or, similarly, although a bit slower:
 aDTofItems <- data.table(do.call(rbind, aListOfItems))
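If you prefer to stay in base R, a rough equivalent (not part of the original answer) is to turn each item into a one-row data frame and stack them. The sketch below assumes, hypothetically, that each item is a named list with the same fields.

```r
# Hypothetical list of items, each a named list with the same fields
aListOfItems <- list(
  list(attribute1 = TRUE,  attribute2 = 6, attribute3 = "C"),
  list(attribute1 = FALSE, attribute2 = 5, attribute3 = "D"),
  list(attribute1 = TRUE,  attribute2 = 6, attribute3 = "D")
)

# Convert each item to a one-row data frame (keeping strings as character),
# then bind the rows together
items_df <- do.call(rbind, lapply(aListOfItems, as.data.frame, stringsAsFactors = FALSE))
```

This is usually slower than rbindlist on large lists, which is one reason the answer recommends data.table.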


Ricardo Saporta Apr 7 '13 at 20:43


Paul Hiemstra · Accepted Answer · 2013-04-05T09:19:54+0000

First, you can build a logical vector, give it a meaningful name, and use it in your commands. This makes your script a little longer, but much easier to read:

 interesting_bit = with(aListOfItems, attribute1 & attribute2 == 6 & attribute3 == "C")

In addition, using a little indentation makes the code more readable.

 aListOfItems[interesting_bit, "attribute4"] <-
     aListOfItems[interesting_bit, "attribute5"] *
     aListOfItems[interesting_bit, "attribute7"]
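A self-contained version of the named-selector approach, on hypothetical toy data (only the column names come from the question), shows that the selector is built once and reused on both sides of the assignment:

```r
# Hypothetical toy data standing in for aListOfItems
aListOfItems <- data.frame(
  attribute1 = c(TRUE, TRUE, FALSE, TRUE),
  attribute2 = c(6, 6, 6, 6),
  attribute3 = c("C", "C", "C", "D"),
  attribute4 = 0,
  attribute5 = c(1, 2, 3, 4),
  attribute7 = c(5, 6, 7, 8)
)

# Name the selection once ...
interesting_bit <- with(aListOfItems, attribute1 & attribute2 == 6 & attribute3 == "C")

# ... and reuse it for both the target and the source columns
aListOfItems[interesting_bit, "attribute4"] <-
  aListOfItems[interesting_bit, "attribute5"] *
  aListOfItems[interesting_bit, "attribute7"]
```

Only the first two rows match, so attribute4 becomes 1 * 5 and 2 * 6 there and stays 0 elsewhere.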

Using within improves readability even further:

 aListOfItems[interesting_bit, ] = within(aListOfItems[interesting_bit, ], {
     attribute4 = attribute5 * attribute7
 })
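To check that the within form does the same thing as the explicit column assignment, here is a small comparison on hypothetical toy data. within() evaluates the braced block with the subset's columns visible as plain variables and returns a modified copy of the subset, which is then written back into the selected rows. This assumes attribute4 already exists as a column, as it does in the question.

```r
# Hypothetical toy data
items <- data.frame(
  attribute1 = c(TRUE, TRUE, FALSE, TRUE),
  attribute2 = c(6, 6, 6, 6),
  attribute3 = c("C", "C", "C", "D"),
  attribute4 = 0,
  attribute5 = c(1, 2, 3, 4),
  attribute7 = c(5, 6, 7, 8)
)
interesting_bit <- with(items, attribute1 & attribute2 == 6 & attribute3 == "C")

# Explicit column-by-column version
explicit <- items
explicit[interesting_bit, "attribute4"] <-
  explicit[interesting_bit, "attribute5"] * explicit[interesting_bit, "attribute7"]

# within() version: compute on the subset, then write the whole subset back
via_within <- items
via_within[interesting_bit, ] <- within(via_within[interesting_bit, ], {
  attribute4 <- attribute5 * attribute7
})
```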

Also, for logical columns there is no need to test == TRUE explicitly:

 interesting_bit = aListOfItems$attribute1 & aListOfItems$attribute2 == 6 & aListOfItems$attribute3 == "C"
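A quick check on hypothetical values: comparing a logical vector with TRUE returns the same vector (missing values included), and the $-based selector builds exactly the same logical vector as the with() version used earlier.

```r
# A logical column, including a missing value
flag <- c(TRUE, FALSE, NA, TRUE)

# "flag == TRUE" gives back flag itself, so the comparison is redundant
same_result <- identical(flag == TRUE, flag)

# Hypothetical toy data using that column
items <- data.frame(
  attribute1 = flag,
  attribute2 = c(6, 6, 6, 5),
  attribute3 = c("C", "C", "C", "C")
)

# $-based and with()-based selectors produce the same logical vector
via_dollar <- items$attribute1 & items$attribute2 == 6 & items$attribute3 == "C"
via_with   <- with(items, attribute1 & attribute2 == 6 & attribute3 == "C")
# note: the NA row stays NA in the selector; which(via_with) would drop it
```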

Altogether, this reduces the original code:

 aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 & aListOfItems$attribute3 == "C", "attribute4"] <-
     aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 & aListOfItems$attribute3 == "C", "attribute5"] *
     aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 & aListOfItems$attribute3 == "C", "attribute7"]

to this (note the additional use of with):

 interesting_bit = with(aListOfItems, attribute1 & attribute2 == 6 & attribute3 == "C")
 aListOfItems[interesting_bit, ] = within(aListOfItems[interesting_bit, ], {
     attribute4 = attribute5 * attribute7
 })

This code not only looks less complicated, but also instantly conveys what you are doing, which is very hard to work out from your original code.
