Delete rows with duplicate values across multiple columns in R

I have a dataset and would like to delete the rows that contain the same value across multiple columns.

foo <- data.frame(g1 = c("1","0","0","1","1"),
                  v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"),
                  y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"),
                  y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"),
                  y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"))

foo looks like this:

  g1 v1 v2 y1 y2 y3 y4 y5 y6
1  1  7  a  y  y  y  y  y  y
2  0  5  b  c  y  c  y  w  c
3  0  4  x  f  y  c  f  f  f
4  1  4  x  f  f  f  f  f  f
5  1  3  e  w  c  w  c  w  w

Now I want to delete any row whose values are identical across columns y1 to y6. Done correctly, only rows 1 and 4 should be deleted, because all of their y variables match exactly. The condition spans multiple columns.

I believe that I am close, but it just does not work correctly.

I tried new = foo[!(duplicated(foo[,1:6]))], thinking that duplicated would look for and find only the rows that match exactly.

I was also thinking about using a logical operator such as &, but I can't figure out how to do this:
new = foo[foo$y1==foo$y2|foo$y3|foo$y4|foo$y5|foo$y6]

I thought about that, but I'm now overloaded and lost. I would expect foo to look like this:

  g1 v1 v2 y1 y2 y3 y4 y5 y6
2  0  5  b  c  y  c  y  w  c
3  0  4  x  f  y  c  f  f  f
5  1  3  e  w  c  w  c  w  w
+6
3 answers
 > foo[apply(foo[ , paste("y", 1:6, sep = "")], 1, FUN = function(x) length(unique(x)) > 1 ), ]
   g1 v1 v2 y1 y2 y3 y4 y5 y6
 2  0  5  b  c  y  c  y  w  c
 3  0  4  x  f  y  c  f  f  f
 5  1  3  e  w  c  w  c  w  w
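To see why this works, it may help to split the one-liner into steps. The following is only a sketch of the same idea: the names ycols, n_distinct, keep and result are made up for illustration, and stringsAsFactors = FALSE is set explicitly on the assumption that the columns should be plain character vectors.

```r
# Rebuild the example data from the question (character columns assumed).
foo <- data.frame(
  g1 = c("1", "0", "0", "1", "1"),
  v1 = c("7", "5", "4", "4", "3"),
  v2 = c("a", "b", "x", "x", "e"),
  y1 = c("y", "c", "f", "f", "w"),
  y2 = c("y", "y", "y", "f", "c"),
  y3 = c("y", "c", "c", "f", "w"),
  y4 = c("y", "y", "f", "f", "c"),
  y5 = c("y", "w", "f", "f", "w"),
  y6 = c("y", "c", "f", "f", "w"),
  stringsAsFactors = FALSE
)

# Hypothetical helper names, used here for illustration only.
ycols <- paste0("y", 1:6)

# For every row, count how many distinct values occur in y1..y6.
n_distinct <- apply(foo[, ycols], 1, function(x) length(unique(x)))

# A row is kept only if its y values are not all identical.
keep <- n_distinct > 1

result <- foo[keep, ]
print(result)
```

Note that duplicated(), as used in the question, compares whole rows against earlier rows, whereas unique() is applied here to the values inside a single row, which is the comparison the question asks for.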
+10
 foo[apply(foo[, paste0("y", 1:6)], 1, function(x) any(x != x[1])), ]
+2
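 (In this answer foo holds only the columns y1 to y6, as the printed output shows.)

 > foo[ !rowSums( apply( foo[2:6], 2, "!=", foo[1] ) )==0, ]
   y1 y2 y3 y4 y5 y6
 2  c  y  c  y  w  c
 3  f  y  c  f  f  f
 5  w  c  w  c  w  w

 > foo[ ! colSums( apply( foo, 1, duplicated, foo[1] ) ) == 5, ]
   y1 y2 y3 y4 y5 y6
 2  c  y  c  y  w  c
 3  f  y  c  f  f  f
 5  w  c  w  c  w  w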
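The same column-wise comparison can also be run against the full data frame from the question by picking the y columns by name rather than by position. This is a rough sketch, not the answerer's code: ycols, differs, keep and result are illustrative names, and the columns are assumed to be character vectors (hence stringsAsFactors = FALSE).

```r
# Example data from the question, with character columns assumed.
foo <- data.frame(
  g1 = c("1", "0", "0", "1", "1"),
  v1 = c("7", "5", "4", "4", "3"),
  v2 = c("a", "b", "x", "x", "e"),
  y1 = c("y", "c", "f", "f", "w"),
  y2 = c("y", "y", "y", "f", "c"),
  y3 = c("y", "c", "c", "f", "w"),
  y4 = c("y", "y", "f", "f", "c"),
  y5 = c("y", "w", "f", "f", "w"),
  y6 = c("y", "c", "f", "f", "w"),
  stringsAsFactors = FALSE
)

# Illustrative names only.
ycols <- paste0("y", 1:6)

# Compare every y column with y1; the y1 vector is recycled column by column,
# giving a logical matrix with one row per row of foo.
differs <- foo[ycols] != foo[["y1"]]

# Keep rows where at least one y column differs from y1.
keep <- rowSums(differs) > 0

result <- foo[keep, ]
print(result)
```

Selecting by name keeps g1, v1 and v2 out of the comparison while still returning them in the result.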
+1

Source: https://habr.com/ru/post/925421/

