How to count the number of occurrences of a given character in each row of a column of strings?

I have a data.frame in which certain variables contain a text string. I want to count the number of occurrences of a given character in each individual string.

Example:

q.data<-data.frame(number=1:3, string=c("greatgreat", "magic", "not")) 

I want to create a new column for q.data with the number of occurrences of "a" in each string (i.e. c(2, 1, 0)).

The only approach I have managed is convoluted:

 string.counter <- function(strings, pattern){
   counts <- NULL
   for(i in 1:length(strings)){
     counts[i] <- length(attr(gregexpr(pattern, strings[i])[[1]], "match.length")[attr(gregexpr(pattern, strings[i])[[1]], "match.length") > 0])
   }
   return(counts)
 }

 string.counter(strings=q.data$string, pattern="a")

   number     string number.of.a
 1      1 greatgreat           2
 2      2      magic           1
 3      3        not           0
+51
regex r dataframe
Sep 14 '12 at 15:17
8 answers

The stringr package provides the str_count function, which seems to do what you want:

 # Load your example data
 q.data <- data.frame(number=1:3, string=c("greatgreat", "magic", "not"), stringsAsFactors = FALSE)
 library(stringr)

 # Count the number of 'a's in each element of string
 q.data$number.of.a <- str_count(q.data$string, "a")
 q.data
 #   number     string number.of.a
 # 1      1 greatgreat           2
 # 2      2      magic           1
 # 3      3        not           0
+66
Sep 14 '12 at 15:25

If you don't want to leave base R, here is a fairly succinct and expressive possibility:

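 x <- q.data$string
 # Note: this counts "g"; use "a" for the question's example
 # (the result happens to be the same for this data)
 sapply(regmatches(x, gregexpr("g", x)), length)
 # [1] 2 1 0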

Update: As of R 3.2.0, lengths(x) can be used as a more efficient replacement for sapply(x, length), so the code above can simply be:

 lengths(regmatches(x, gregexpr("g", x))) 
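If the character being counted is a regex metacharacter (for example "."), the pattern has to be matched literally. A minimal sketch, assuming R 3.2.0 or later for lengths() and using a hypothetical helper name, count_char, that is not part of the answer above:

```r
# count_char is a hypothetical helper name used only for illustration.
count_char <- function(x, ch) {
  x <- as.character(x)                # also accepts factor columns
  m <- gregexpr(ch, x, fixed = TRUE)  # fixed = TRUE: literal match, no regex syntax
  lengths(regmatches(x, m))           # strings without a match give zero-length entries
}

count_char(c("greatgreat", "magic", "not"), "a")
# [1] 2 1 0
count_char("abc.d.aa", ".")
# [1] 2
```

Without fixed = TRUE, the pattern "." would match every character rather than only literal dots.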
+29
Sep 14 '12 at 15:44

 nchar(as.character(q.data$string)) - nchar(gsub("a", "", q.data$string))
 # [1] 2 1 0

Note that I coerce the factor variable to character before passing it to nchar. The regex functions appear to do that internally.
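A minimal sketch of the coercion step, assuming the string column is a factor (the data.frame default before R 4.0.0); the variable names here are illustrative only:

```r
# A factor column, as data.frame() would have created it before R 4.0.0
strings <- factor(c("greatgreat", "magic", "not"))

# Coerce once up front so nchar() receives character input
chars <- as.character(strings)

# Characters removed by gsub() = number of occurrences of "a"
counts <- nchar(chars) - nchar(gsub("a", "", chars, fixed = TRUE))
counts
# [1] 2 1 0
```

Coercing once and reusing the result also avoids repeating the conversion inside each call.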

Here are benchmark results (with the test data scaled up to 3000 rows):

 library(rbenchmark)  # benchmark() comes from the rbenchmark package

 q.data <- q.data[rep(1:NROW(q.data), 1000), ]
 str(q.data)
 # 'data.frame': 3000 obs. of 3 variables:
 #  $ number     : int 1 2 3 1 2 3 1 2 3 1 ...
 #  $ string     : Factor w/ 3 levels "greatgreat","magic",..: 1 2 3 1 2 3 1 2 3 1 ...
 #  $ number.of.a: int 2 1 0 2 1 0 2 1 0 2 ...

 benchmark(
   Dason = { q.data$number.of.a <- str_count(as.character(q.data$string), "a") },
   Tim   = { resT <- sapply(as.character(q.data$string), function(x, letter = "a"){
               sum(unlist(strsplit(x, split = "")) == letter) }) },
   DWin  = { resW <- nchar(as.character(q.data$string)) - nchar(gsub("a", "", q.data$string)) },
   Josh  = { x <- sapply(regmatches(q.data$string, gregexpr("g", q.data$string)), length) },
   replications = 100)
 #-----------------------
 #    test replications elapsed  relative user.self sys.self user.child sys.child
 # 1 Dason          100   4.173  9.959427     2.985    1.204          0         0
 # 3  DWin          100   0.419  1.000000     0.417    0.003          0         0
 # 4  Josh          100  18.635 44.474940    17.883    0.827          0         0
 # 2   Tim          100   3.705  8.842482     3.646    0.072          0         0
+6
Sep '12 at 19:23
 sum(charToRaw("abc.d.aa") == charToRaw('.')) 

is a good option for counting a character in a single string.
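charToRaw works on one string at a time, so applying this idea to a whole column needs a loop. A rough sketch, assuming the character to count is a single byte and using a hypothetical helper name, count_byte:

```r
# count_byte is a hypothetical name; it compares raw bytes, so ch must be
# a single-byte character such as "." or "a".
count_byte <- function(x, ch) {
  target <- charToRaw(ch)
  vapply(as.character(x),
         function(s) sum(charToRaw(s) == target),  # matching bytes in one string
         integer(1), USE.NAMES = FALSE)
}

count_byte(c("abc.d.aa", "magic", ""), ".")
# [1] 2 0 0
```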

+3
Jul 6 '16 at 16:17

I'm sure someone can do better, but this works:

 sapply(as.character(q.data$string), function(x, letter = "a"){
   sum(unlist(strsplit(x, split = "")) == letter)
 })
 # greatgreat      magic        not
 #          2          1          0

or, as a function:

 countLetter <- function(charvec, letter){
   sapply(charvec, function(x, letter){
     sum(unlist(strsplit(x, split = "")) == letter)
   }, letter = letter)
 }
 countLetter(as.character(q.data$string), "a")
+2
Sep 14

 s <- "aababacababaaathhhhhslsls jsjsjjsaa ghhaalll"
 p <- "a"
 s2 <- gsub(p, "", s)
 numOcc <- nchar(s) - nchar(s2)

It may not be the most efficient approach, but it serves my purpose.

0
May 08 '15 at 6:00

I count characters the same way as Amarjeet, but I prefer to do it in one line:

 HowManySpaces<-nchar(DF$string)-nchar(gsub(" ","",DF$string)) # count spaces in DF$string 
0
Nov 13 '17 at 12:09

The easiest and cleanest way IMHO:

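 # gregexpr() returns -1 (a length-1 vector) when nothing matches, so
 # lengths(gregexpr(...)) alone would report 1 for "not"; regmatches() drops non-matches.
 s <- as.character(q.data$string)
 q.data$number.of.a <- lengths(regmatches(s, gregexpr('a', s)))
 #   number     string number.of.a
 # 1      1 greatgreat           2
 # 2      2      magic           1
 # 3      3        not           0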
0
Dec 26 '17 at 9:54


