How to use plyr to number rows within groups?

Basically, I want a column with an auto-incrementing identifier that restarts for each of my cohorts, in this case (kmer, cvCut).

    > myDataFrame
           size kmer cvCut   cumsum
    1      8132   23    10     8132
    10000   778   23    10 13789274
    30000   324   23    10 23658740
    50000   182   23    10 28534840
    100000   65   23    10 33943283
    200000   25   23    10 37954383
    250000  584   23    12 16546507
    300000  110   23    12 29435303
    400000   28   23    12 34697860
    600000  127   23     2 47124443
    600001  127   23     2 47124570

I need an added column that numbers the rows within each kmer / cvCut group:

    > myDataFrame
           size kmer cvCut   cumsum newID
    1      8132   23    10     8132     1
    10000   778   23    10 13789274     2
    30000   324   23    10 23658740     3
    50000   182   23    10 28534840     4
    100000   65   23    10 33943283     5
    200000   25   23    10 37954383     6
    250000  584   23    12 16546507     1
    300000  110   23    12 29435303     2
    400000   28   23    12 34697860     3
    600000  127   23     2 47124443     1
    600001  127   23     2 47124570     2
+7
r plyr
3 answers

I would do it like this:

    library(plyr)
    ddply(df, c("kmer", "cvCut"), transform, newID = seq_along(kmer))
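For anyone who wants to try this on the question's data, here is a minimal self-contained sketch. The data frame is rebuilt by hand without the original row names, and the names `df` and `res` are just assumptions for illustration.

```r
library(plyr)

# Rebuild the question's data (original row names omitted); `df` is an assumed name.
df <- data.frame(
  size   = c(8132, 778, 324, 182, 65, 25, 584, 110, 28, 127, 127),
  kmer   = 23,
  cvCut  = c(10, 10, 10, 10, 10, 10, 12, 12, 12, 2, 2),
  cumsum = c(8132, 13789274, 23658740, 28534840, 33943283, 37954383,
             16546507, 29435303, 34697860, 47124443, 47124570)
)

# Number the rows within each (kmer, cvCut) group.
res <- ddply(df, c("kmer", "cvCut"), transform, newID = seq_along(kmer))
print(res)
```

Note that ddply returns the groups ordered by the grouping variables, so the cvCut 2 rows should come first in `res` rather than last as in the input.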
+14

Just add a new column every time plyr calls your function:

    R> DF <- data.frame(kmer=sample(1:3, 50, replace=TRUE),
    +                  cvCut=sample(LETTERS[1:3], 50, replace=TRUE))
    R> library(plyr)
    R> ddply(DF, .(kmer, cvCut), function(X) data.frame(X, newId=1:nrow(X)))
       kmer cvCut newId
    1     1     A     1
    2     1     A     2
    3     1     A     3
    4     1     A     4
    5     1     A     5
    6     1     A     6
    7     1     A     7
    8     1     A     8
    9     1     A     9
    10    1     A    10
    11    1     A    11
    12    1     B     1
    13    1     B     2
    14    1     B     3
    15    1     B     4
    16    1     B     5
    17    1     B     6
    18    1     C     1
    19    1     C     2
    20    1     C     3
    21    2     A     1
    22    2     A     2
    23    2     A     3
    24    2     A     4
    25    2     A     5
    26    2     B     1
    27    2     B     2
    28    2     B     3
    29    2     B     4
    30    2     B     5
    31    2     B     6
    32    2     B     7
    33    2     C     1
    34    2     C     2
    35    2     C     3
    36    2     C     4
    37    3     A     1
    38    3     A     2
    39    3     A     3
    40    3     A     4
    41    3     B     1
    42    3     B     2
    43    3     B     3
    44    3     B     4
    45    3     C     1
    46    3     C     2
    47    3     C     3
    48    3     C     4
    49    3     C     5
    50    3     C     6
    R>
+4

I think this is what you want:

Load the data:

    x <- read.table(textConnection(
    "id size kmer cvCut cumsum
    1 8132 23 10 8132
    10000 778 23 10 13789274
    30000 324 23 10 23658740
    50000 182 23 10 28534840
    100000 65 23 10 33943283
    200000 25 23 10 37954383
    250000 584 23 12 16546507
    300000 110 23 12 29435303
    400000 28 23 12 34697860
    600000 127 23 2 47124443
    600001 127 23 2 47124570"), header=TRUE)

Use ddply:

    library(plyr)
    # Name the new column so it does not end up labelled "1:nrow(x)"
    ddply(x, .(kmer, cvCut), function(x) cbind(x, newID = 1:nrow(x)))
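If you want to cross-check the plyr result, or keep the rows in their original order, a base R sketch with `ave()` might look like the following. This is not one of the plyr approaches above but a swapped-in alternative, and the use of `read.table(text = ...)` to load the data is an assumption made to keep the example self-contained.

```r
# Load the question's data; `x` mirrors the name used in the answer above.
x <- read.table(text = "id size kmer cvCut cumsum
1 8132 23 10 8132
10000 778 23 10 13789274
30000 324 23 10 23658740
50000 182 23 10 28534840
100000 65 23 10 33943283
200000 25 23 10 37954383
250000 584 23 12 16546507
300000 110 23 12 29435303
400000 28 23 12 34697860
600000 127 23 2 47124443
600001 127 23 2 47124570", header = TRUE)

# ave() applies FUN within each (kmer, cvCut) group and puts the results
# back in the original row positions, so no reordering takes place.
x$newID <- ave(seq_len(nrow(x)), x$kmer, x$cvCut, FUN = seq_along)
print(x)
```

Unlike ddply, this leaves the data frame in its input order, which matches the layout asked for in the question.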
+2