Count characters in each column across rows

I have a text file:

 V1 V2 V3
 X  N  aaaaaabbbabab
 C  T  ababaaabaaabb
 V  H  babbbabaabbba

What I want to do is count how many a's and how many b's occur at each character position (column) of the V3 strings, across all rows.

So, the result will be like this:

   col1 col2 col3 ....... col13
 a 2 2 2 1
 b 1 1 1 2

How can I do that?

I tried the count function along with the substring, but it did not work.

thanks

+4
3 answers

Assuming dat contains your data, we split each string into single characters with strsplit() and arrange them in a matrix with one row per string:

 tt <- matrix(unlist(strsplit(dat$V3, split = "")), ncol = 13, byrow = TRUE) 

giving:

 > tt
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
 [1,] "a"  "a"  "a"  "a"  "a"  "a"  "b"  "b"  "b"  "a"   "b"   "a"   "b"
 [2,] "a"  "b"  "a"  "b"  "a"  "a"  "a"  "b"  "a"  "a"   "a"   "b"   "b"
 [3,] "b"  "a"  "b"  "b"  "b"  "a"  "b"  "a"  "a"  "b"   "b"   "b"   "a"

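We can get the desired result by tabulating each column, taking care to set the factor levels explicitly so that a character with a zero count in some column (such as b in column 6) is still reported: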

 apply(tt, 2, function(x) c(table(factor(x, levels = c("a","b"))))) 

which gives:

 > apply(tt, 2, function(x) c(table(factor(x, levels = c("a","b")))))
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
 a    2    2    2    1    2    3    1    1    2     2     1     1     1
 b    1    1    1    2    1    0    2    2    1     1     2     2     2

To automate the selection of suitable levels, we could do something like:

 > lev <- levels(factor(tt))
 > apply(tt, 2, function(x, levels) c(table(factor(x, levels = levels))),
 +       levels = lev)
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
 a    2    2    2    1    2    3    1    1    2     2     1     1     1
 b    1    1    1    2    1    0    2    2    1     1     2     2     2

where in the first line we treat tt as a vector and extract the levels after temporarily converting tt to a factor. We then pass these levels (lev) to the apply() step instead of specifying the levels explicitly.
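
As a rough sketch, the steps above could be wrapped in a function that also avoids hard-coding ncol = 13. The name count_by_position is made up for illustration, and the function assumes all strings have the same length; as.character() is there in case the column was read in as a factor.

```r
# Hypothetical helper (not part of the original answer): count each
# character at every position of a set of equal-length strings.
count_by_position <- function(x) {
  x <- as.character(x)             # guard against a factor column
  n <- unique(nchar(x))
  stopifnot(length(n) == 1)        # assumption: all strings have equal length
  chars <- matrix(unlist(strsplit(x, split = "")), ncol = n, byrow = TRUE)
  lev <- levels(factor(chars))     # every character seen anywhere in the data
  counts <- apply(chars, 2, function(col) c(table(factor(col, levels = lev))))
  # Re-wrap so the result is a matrix even when only one character occurs
  matrix(counts, nrow = length(lev), dimnames = list(lev, NULL))
}

res <- count_by_position(c("aaaaaabbbabab", "ababaaabaaabb", "babbbabaabbba"))
res
```

With the three example strings this should reproduce the a and b rows shown above.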

+4

EDIT: Solution fixed after Gavin Simpson's comments. It works now


To avoid converting to a factor many times, you can convert once and use the following indexing trick with tapply():

 tt <- c("aaaaaabbbabab","ababaaabaaabb","babbbabaabbba")
 ttstr <- strsplit(tt,"")
 ttf <- factor(unlist(ttstr))
 n <- length(ttstr[[1]])
 k <- length(ttstr)
 > do.call(cbind,tapply(ttf,rep(1:n,k),table))
   1 2 3 4 5 6 7 8 9 10 11 12 13
 a 2 2 2 1 2 3 1 1 2  2  1  1  1
 b 1 1 1 2 1 0 2 2 1  1  2  2  2

This gives a speed-up of about 7 times over the method shown by @Gavin:

 > benchmark(method1(tt),method2(tt),replications=1)
          test replications elapsed relative user.self
 1 method1(tt)            1    0.89 1.000000      0.89
 2 method2(tt)            1    6.99 7.853933      6.98

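
The post does not show how method1 and method2 were defined. One plausible reading, and this is only an assumption, is that method1 wraps the tapply() approach and method2 wraps the apply() approach from the first answer. A minimal sketch along those lines, which only checks that both give the same counts and does not attempt to reproduce the timings:

```r
# Assumed wrappers; the original post does not define method1 or method2.
method1 <- function(x) {           # tapply approach: convert to a factor once
  s <- strsplit(x, "")
  f <- factor(unlist(s))
  do.call(cbind, tapply(f, rep(seq_along(s[[1]]), length(s)), table))
}
method2 <- function(x) {           # apply approach: one factor() call per column
  m <- matrix(unlist(strsplit(x, "")), ncol = nchar(x[1]), byrow = TRUE)
  lev <- levels(factor(m))
  apply(m, 2, function(col) c(table(factor(col, levels = lev))))
}

tt <- c("aaaaaabbbabab", "ababaaabaaabb", "babbbabaabbba")
# The dimnames differ between the two results, so compare the values only
same <- all(unname(method1(tt)) == unname(method2(tt)))
same
```

Wrapping each call in system.time() on a much larger input would be one base R way to compare the two.
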
+2

Here is a new version that answers the actual question. It still uses gregexpr, but this time works with the match positions. I have to go a little out of my way to allow for cells with a zero count (which I could not get table to report directly).

 foo <- data.frame(
   V1 = c("X","C","V"),
   V2 = c("N","T","H"),
   V3 = c("aaaaaabbbabab","ababaaabaaabb","babbbabaabbba"))
 n <- nchar(as.character(foo$V3)[1])
 tabA <- table(unlist(gregexpr("a",foo$V3)),exclude=-1)
 tabB <- table(unlist(gregexpr("b",foo$V3)),exclude=-1)
 res <- matrix(0,2,n)
 res[1,as.numeric(names(tabA))] <- tabA
 res[2,as.numeric(names(tabB))] <- tabB
 rownames(res) <- c("a","b")
 res
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
 a    2    2    2    1    2    3    1    1    2     2     1     1     1
 b    1    1    1    2    1    0    2    2    1     1     2     2     2

If there are no zero-count cells, you can simply use rbind(tabA, tabB).
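
On the zero-count question: one way that seems to work is to turn the match positions into a factor with levels 1 to n before calling table(), so that every position gets a cell even when nothing matched there. This is a sketch, not part of the original answer; count_pos is a made-up helper name.

```r
V3 <- c("aaaaaabbbabab", "ababaaabaaabb", "babbbabaabbba")
n <- nchar(V3[1])                  # assumption: all strings have this length

# Hypothetical helper: tabulate the match positions of ch over columns 1..n
count_pos <- function(ch, x, n) {
  pos <- unlist(gregexpr(ch, x, fixed = TRUE))
  # -1 (no match) is not among the levels, so it becomes NA and is dropped
  table(factor(pos, levels = seq_len(n)))
}

res2 <- rbind(a = count_pos("a", V3, n), b = count_pos("b", V3, n))
res2
```

Because both tables now cover every position, rbind() lines them up directly and the b count for column 6 appears as 0.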

0
