Return df with column values that occur more than once

I have a data frame df, and I am trying to subset all rows whose value in column B occurs more than once in the data set.

I tried using table() to do this, but I am having trouble subsetting from the table:

 t<-table(df$B) 

Then I try to subset it using:

 subset(df, table(df$B)>1) 

And I get an error:

"Error in x[subset & !is.na(subset)] : object of type 'closure' is not subsettable"

How can I subset my data frame using table counts?

+8
r dataframe subset
Jul 01 '14 at 5:52
3 answers

Here is a dplyr solution (using MrFlick's data.frame)

 library(dplyr)
 newd <- dd %>% group_by(b) %>% filter(n()>1)
 # newd
 #    a b
 # 1  1 1
 # 2  2 1
 # 3  5 4
 # 4  6 4
 # 5  7 4
 # 6  9 6
 # 7 10 6

Or using data.table

 setDT(dd)[,if(.N >1) .SD,by=b] 

Or using base R

 dd[dd$b %in% unique(dd$b[duplicated(dd$b)]),] 
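
To see why the base R one-liner works, it may help to unpack it into steps. This is only a sketch: it rebuilds the dd data frame from MrFlick's answer (assumed to be the same data), and the intermediate names is_repeat, repeated and res are made up for illustration.

```r
# dd as defined in MrFlick's answer (assumed to be the same data)
dd <- data.frame(a = 1:10, b = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# duplicated() flags the 2nd and later occurrences of each value of b
is_repeat <- duplicated(dd$b)

# the distinct values of b that occur more than once
repeated <- unique(dd$b[is_repeat])

# keep every row whose b is one of those values, first occurrences included
res <- dd[dd$b %in% repeated, ]
```

The %in% step is what brings the first occurrence of each repeated value back in; filtering on duplicated(dd$b) alone would drop it.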
+16
Jul 01 '14 at 6:17

May I suggest an alternative, faster way to do this with data.table?

 require(data.table) ## 1.9.2
 setDT(df)[, .N, by=B][N > 1L]$B

(or) you can chain .I (another special variable, see ?data.table), which gives the corresponding row numbers in df, along with .N as follows:

 setDT(df)[df[, .I[.N > 1L], by=B]$V1] 

(or) have a look at @mnel's answer for another variation (using yet another special variable, .SD).

+6
Jul 01 '14 at 5:57

Using table() is not the best approach, because you then have to join the counts back to the original rows of the data.frame. The ave function makes it easy to calculate row-level values for different groups. For example:

 dd<-data.frame(
     a=1:10,
     b=c(1,1,2,3,4,4,4,5,6, 6)
 )
 dd[with(dd, ave(b,b,FUN=length))>1, ]
 #subset(dd, ave(b,b,FUN=length)>1) #same thing
 #     a b
 # 1   1 1
 # 2   2 1
 # 5   5 4
 # 6   6 4
 # 7   7 4
 # 9   9 6
 # 10 10 6

Here, for each level of b, it computes the length of b, which is really just the count of that value of b, and returns it to the corresponding row for each value. Then we use this to subset.
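
To make the intermediate step visible, here is a small sketch using the same dd. The names counts_per_row, via_ave, tab and via_table are illustrative only. The last lines show one way the table() counts could be joined back to the rows, for comparison; that part is an assumption about what the join would look like, not something taken from the question.

```r
dd <- data.frame(a = 1:10, b = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# ave() returns one value per row: the size of the group that row belongs to
counts_per_row <- ave(dd$b, dd$b, FUN = length)
counts_per_row
# [1] 2 2 1 1 3 3 3 1 2 2

via_ave <- dd[counts_per_row > 1, ]

# table() gives one count per distinct value, so the counts have to be
# looked up by name to get back to one value per row
tab <- table(dd$b)
via_table <- dd[as.vector(tab[as.character(dd$b)]) > 1, ]
```

Both routes select the same rows; ave() simply skips the lookup step because its result is already aligned with the rows of dd.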

+5
Jul 01 '14 at 6:08