Can R automatically recognize and count the number of occurrences of a word in n number of columns?

Question


This is a tough question, but I will try my best to explain. I am trying to write a program that tracks how many times an insect has visited a flower species over time. For this, I have a dataset that looks something like this:

ID          Visit_Freq   Visitor_1   Visitor_2   Visitor_3   Visitor_4   Visitor_5
1             1.0000000  Halictidae       <NA>       <NA>       <NA>       <NA>
2             5.0000000  Syrphidae Halictidae  Syrphidae  Syrphidae       Apis
3             1.0000000        Apis       <NA>       <NA>       <NA>       <NA>
4             0.0000000        <NA>       <NA>       <NA>       <NA>       <NA>
5             0.0000000        <NA>       <NA>       <NA>       <NA>       <NA>
6             0.0000000        <NA>       <NA>       <NA>       <NA>       <NA>
7             0.0000000        <NA>       <NA>       <NA>       <NA>       <NA>
8             2.0000000        Apis       Apis       <NA>       <NA>       <NA>
9             0.0000000        <NA>       <NA>       <NA>       <NA>       <NA>
10            0.0000000        <NA>       <NA>       <NA>       <NA>       <NA>

In the "Visitor_n" columns I recorded the type of insect that visited the flower, or NA where there was no visit. To analyze our data, we must count every occurrence of each insect type across all Visitor columns. A single flower (ID) can have up to 10 visitors, and we often have more than 500 IDs, so counting by hand is tedious. Here is what I did to make it easier:

Apis <- sum(apply(DataSet[3:7], 2, function(x) length(which(x == 'Apis'))))

This works for Apis, but we can have 30-50 different insect types, so I end up copying the line and replacing "Apis" with each name:

Apis <- sum(apply(DataSet[3:7], 2, function(x) length(which(x == 'Apis'))))
Bombus <- sum(apply(DataSet[3:7], 2, function(x) length(which(x == 'Bombus'))))
Halictidae <- sum(apply(DataSet[3:7], 2, function(x) length(which(x == 'Halictidae'))))
Syrphidae <- sum(apply(DataSet[3:7], 2, function(x) length(which(x == 'Syrphidae'))))
Skipper <- sum(apply(DataSet[3:7], 2, function(x) length(which(x == 'Skipper'))))

... and so on.

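Is there a way to tell R, "Recognize the words A, B, D, F and H in columns [3:7] and count each of them", so that I do not have to type out all 30-50 names myself?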

If possible, I would prefer to do this in base R, without external packages.

Adam · Question · Jul 6 '15 at 3:23 · tagged r

user227710 · Answer 1 · 2015-07-06T04:00:47+0000

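One option in base R: stack the columns into a single vector, keep only the values that look like a name (a capital letter followed by lowercase letters), and tabulate them.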

data.frame(table(grep("[A-Z]{1}[a-z]+",stack(df1)[,1],value=TRUE)))
        Var1 Freq
1       Apis    4
2 Halictidae    2
3  Syrphidae    3
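
A variant of the same idea that does not rely on a regular expression, offered as a sketch rather than as part of the answer above: flatten the visitor columns with unlist() and tabulate them with table(). The data frame `visits` is a small stand-in built for this example (only the rows of the sample data that contain visitors), and the names `visitor_cols`, `all_visitors` and `counts` are introduced here for illustration. It assumes missing visits are stored as the literal string "<NA>".

```r
# Stand-in for the question's data: only the rows that contain visitors.
# Missing visits are the literal string "<NA>" (an assumption of this sketch).
visits <- data.frame(
  ID        = c(1, 2, 3, 8),
  Visitor_1 = c("Halictidae", "Syrphidae", "Apis", "Apis"),
  Visitor_2 = c("<NA>", "Halictidae", "<NA>", "Apis"),
  Visitor_3 = c("<NA>", "Syrphidae", "<NA>", "<NA>"),
  Visitor_4 = c("<NA>", "Syrphidae", "<NA>", "<NA>"),
  Visitor_5 = c("<NA>", "Apis", "<NA>", "<NA>"),
  stringsAsFactors = FALSE
)

# Pick the visitor columns by name, so the code works for any number of them
visitor_cols <- grep("^Visitor_", names(visits), value = TRUE)

# Flatten all visitor columns into one character vector
all_visitors <- unlist(visits[visitor_cols], use.names = FALSE)

# Drop the "<NA>" placeholders (and any real NA values), then tabulate
all_visitors <- all_visitors[!is.na(all_visitors) & all_visitors != "<NA>"]
counts <- as.data.frame(table(Species = all_visitors), stringsAsFactors = FALSE)
counts
```

Because the columns are selected by name and the insect names come from the data, nothing needs to change when more Visitor columns or new insect types appear.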

df1<-
structure(list(ID = 1:10, Visit_Freq = c(1, 5, 1, 0, 0, 0, 0, 
2, 0, 0), Visitor_1 = c("Halictidae", "Syrphidae", "Apis", "<NA>", 
"<NA>", "<NA>", "<NA>", "Apis", "<NA>", "<NA>"), Visitor_2 = c("<NA>", 
"Halictidae", "<NA>", "<NA>", "<NA>", "<NA>", "<NA>", "Apis", 
"<NA>", "<NA>"), Visitor_3 = c("<NA>", "Syrphidae", "<NA>", "<NA>", 
"<NA>", "<NA>", "<NA>", "<NA>", "<NA>", "<NA>"), Visitor_4 = c("<NA>", 
"Syrphidae", "<NA>", "<NA>", "<NA>", "<NA>", "<NA>", "<NA>", 
"<NA>", "<NA>"), Visitor_5 = c("<NA>", "Apis", "<NA>", "<NA>", 
"<NA>", "<NA>", "<NA>", "<NA>", "<NA>", "<NA>")), .Names = c("ID", 
"Visit_Freq", "Visitor_1", "Visitor_2", "Visitor_3", "Visitor_4", 
"Visitor_5"), row.names = c(NA, -10L), class = "data.frame")

vaettchen · Answer 2 · 2015-07-06T05:38:24+0000

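Either list the insects by hand:

insects <- c( "Apis", "Halictidae", "Syrphidae" )

or derive the list from the data, dropping the "<NA>" entries:

insects <- unique( unlist( DataSet[ 3:7 ] ) )
insects <- insects[ !is.na( insects ) & insects != "<NA>" ]

Then start with an empty result vector,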

count <- NULL

loop over the insects, appending the count for each one,

for( i in insects ) 
    count <- c( count, sum( apply( DataSet[ 3:7 ], 2, 
                       function( x ) length( which( x == i) ) ) ) )
count
[1] 4 2 3

and combine the names and counts into a data frame:

insectCount <- data.frame( insects, count )
insectCount
     insects count
1       Apis     4
2 Halictidae     2
3  Syrphidae     3
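
The loop above grows `count` one element at a time. As a sketch of the same logic without the explicit loop, vapply() can return one count per insect directly. `DataSet` below is a small stand-in with the same column layout as the question's data (visitor columns in positions 3 to 7), and `visitors` is a name introduced here for illustration.

```r
# Stand-in with the question's layout: ID, Visit_Freq, then Visitor_1..Visitor_5
DataSet <- data.frame(
  ID         = c(1, 2, 3, 8),
  Visit_Freq = c(1, 5, 1, 2),
  Visitor_1  = c("Halictidae", "Syrphidae", "Apis", "Apis"),
  Visitor_2  = c("<NA>", "Halictidae", "<NA>", "Apis"),
  Visitor_3  = c("<NA>", "Syrphidae", "<NA>", "<NA>"),
  Visitor_4  = c("<NA>", "Syrphidae", "<NA>", "<NA>"),
  Visitor_5  = c("<NA>", "Apis", "<NA>", "<NA>"),
  stringsAsFactors = FALSE
)

# Treat the visitor columns as one character matrix
visitors <- as.matrix(DataSet[3:7])

# Every distinct entry except the "<NA>" placeholder and real NA values
insects <- setdiff(unique(as.vector(visitors)), c("<NA>", NA))

# One count per insect; vapply() checks that each result is a single integer
count <- vapply(insects,
                function(insect) sum(visitors == insect, na.rm = TRUE),
                integer(1))

insectCount <- data.frame(insects, count, row.names = NULL)
insectCount
```

The rows come out in the order in which the insects first appear in the data, not alphabetically.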


Claus Wilke · Answer 3 · 2015-07-06T07:16:15+0000

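With dplyr this is straightforward once the data are in tidy (long) format. First, convert the table into long format (using gather() from tidyr).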

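I am using the data frame provided by user227710. Note that in that data frame "<NA>" is a character string, not R's NA value; if your data contain real NA values, you need to exclude them differently (for example with !is.na(Species)).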
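
To make the difference between the "<NA>" string and a real NA concrete, here is a small base R sketch. The data frame `visits` is a made-up two-column stand-in, and the variable names are introduced here for illustration only.

```r
# Stand-in visitor columns; "<NA>" here is ordinary text, not a missing value
visits <- data.frame(
  Visitor_1 = c("Halictidae", "Syrphidae", "Apis", "<NA>"),
  Visitor_2 = c("<NA>", "Halictidae", "<NA>", "<NA>"),
  stringsAsFactors = FALSE
)

# is.na() does not treat the placeholder strings as missing
placeholders_seen_as_na <- sum(is.na(unlist(visits)))

# Replace the placeholder with a real NA in every column
visits[] <- lapply(visits, function(x) replace(x, which(x == "<NA>"), NA))

# Now is.na() finds the empty slots, and table() skips them by default
empty_slots <- sum(is.na(unlist(visits)))
species_counts <- table(unlist(visits))
species_counts
```

After the conversion, a string comparison such as Species != "<NA>" is no longer the right filter; !is.na(Species) is.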

The actual work is done by the functions group_by() and tally(). You tell R how the data should be grouped (here, by the variable Species), and tally() then counts the rows in each group.

I understand that you would prefer not to use external packages, but learning tidyr and dplyr is well worth the effort for anyone who regularly works with data.

require(tidyr) # for gather()
require(dplyr) # for group_by() and tally()

# convert table into tidy (long) format
df_long <- gather(df1, Visitor, Species, Visitor_1:Visitor_5)
head(df_long)
##   ID Visit_Freq   Visitor    Species
## 1  1          1 Visitor_1 Halictidae
## 2  2          5 Visitor_1  Syrphidae
## 3  3          1 Visitor_1       Apis
## 4  4          0 Visitor_1       <NA>
## 5  5          0 Visitor_1       <NA>
## 6  6          0 Visitor_1       <NA>

# now count species, excluding the <NA> value
group_by(df_long, Species) %>%
    filter(Species != "<NA>") %>% 
    tally()
## Source: local data frame [3 x 2]
## 
##      Species  n
## 2       Apis  4
## 3 Halictidae  2
## 4  Syrphidae  3
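
In more recent versions of tidyr, gather() is superseded by pivot_longer(), and dplyr's count() combines the group_by() and tally() steps. A sketch of the same pipeline written that way follows; it assumes tidyr 1.0.0 or later and a reasonably current dplyr are installed, and it rebuilds a reduced `df1` (only the rows with visitors) so that it runs on its own. The name `species_counts` is introduced here for illustration.

```r
library(tidyr)  # for pivot_longer()
library(dplyr)  # for filter() and count()

# Reduced stand-in for df1: only the rows that contain visitors
df1 <- data.frame(
  ID         = c(1, 2, 3, 8),
  Visit_Freq = c(1, 5, 1, 2),
  Visitor_1  = c("Halictidae", "Syrphidae", "Apis", "Apis"),
  Visitor_2  = c("<NA>", "Halictidae", "<NA>", "Apis"),
  Visitor_3  = c("<NA>", "Syrphidae", "<NA>", "<NA>"),
  Visitor_4  = c("<NA>", "Syrphidae", "<NA>", "<NA>"),
  Visitor_5  = c("<NA>", "Apis", "<NA>", "<NA>"),
  stringsAsFactors = FALSE
)

species_counts <- df1 %>%
  # long format: one row per (ID, Visitor slot)
  pivot_longer(starts_with("Visitor_"),
               names_to = "Visitor", values_to = "Species") %>%
  # drop the "<NA>" placeholder strings
  filter(Species != "<NA>") %>%
  # one row per species, with the number of occurrences in column n
  count(Species)

species_counts
```

Selecting the columns with starts_with("Visitor_") means the code does not need to change when the number of Visitor columns grows.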
