Work on multiple columns at once

What is the programmatic way to work with multiple columns (or rows) at a time? Consider an example:

df2 <- data.frame(a = 1:10, b = 5:14, c = 10:1, d = 14:5)

I would like to create two new columns that contain the ratio a / b and c / d. For simplicity, suppose that the spatial relationship between all columns is consistent. With just two columns, you can do something like this:

df2$ab <- with(df2, a/b)
df2$cd <- with(df2, c/d)

or

df2 <- transform(df2, ab = a/b, cd = c/d)

To create the final product, for example:

    a  b  c  d        ab        cd
1   1  5 10 14 0.2000000 0.7142857
2   2  6  9 13 0.3333333 0.6923077
3   3  7  8 12 0.4285714 0.6666667
4   4  8  7 11 0.5000000 0.6363636
5   5  9  6 10 0.5555556 0.6000000
6   6 10  5  9 0.6000000 0.5555556
7   7 11  4  8 0.6363636 0.5000000
8   8 12  3  7 0.6666667 0.4285714
9   9 13  2  6 0.6923077 0.3333333
10 10 14  1  5 0.7142857 0.2000000

But what if there are 100 columns that you need to iterate over? I usually use Excel or SQL to concatenate the code and drop it into my .R script, but there seems to be a way to solve this with R.

Full disclosure: this question is related to the question of cross-validation , where I hacked into a solution that may not be optimal.

+5
2

:

df2 <- data.frame(a = 1:10, b = 5:14, c = 10:1, d = 14:5, e = 34:43, f = 56:65)

n <- ncol(df2)
s1 <- seq(1, n, 2)
s2 <- seq(2, n, 2)

df2[paste(names(df2)[s1], names(df2)[s2], sep="")] <- df2[, s1] / df2[, s2]
+4

? cbind :

for (i in seq(1, ncol(df2), by = 2)) {
    df2[, paste(names(df2)[i], names(df2)[i+1], sep = "", collapse = "")] <- df2[, i] / df2[, i + 1]
}

EDIT: , , @djhurio's

> set.seed(42)
> df2 <- data.frame(
+       a = sample(1:10, 10e6, T)
+ , b = sample(1:10, 10e6, T)
+ , c = sample(1:10, 10e6, T)
+ , d = sample(1:10, 10e6, T)
+ )
> 
> system.time(
+ for (i in seq(1, ncol(df2), by = 2)) {
+ df2[, paste(names(df2)[i], names(df2)[i+1], sep = "", collapse = "")] <- df2[, i] / df2[, i + 1]
+ }
+ )
   user  system elapsed 
   1.06    0.64    1.70 
> 
> foo <- function(df2) {
+ n <- ncol(df2)
+ s1 <- seq(1, n, 2)
+ s2 <- seq(2, n, 2)
+ df2 <- cbind(df2, df2[, s1] / df2[, s2])
+ names(df2)[(n+1):ncol(df2)] <- paste(names(df2)[s1], names(df2)[s2], sep="")
+ return(df2)
+ }
> 
> set.seed(42)
> df2 <- data.frame(
+       a = sample(1:10, 10e6, T)
+ , b = sample(1:10, 10e6, T)
+ , c = sample(1:10, 10e6, T)
+ , d = sample(1:10, 10e6, T)
+ )
> system.time(foo(df2))
   user  system elapsed 
   1.73    0.85    2.57 
0

All Articles