Index of a non-unique element in a data frame

Question

How can I extract the column names (or the row and column indices) of the duplicated elements in the following data frame?

             V1         V2           V3           V4
 PC1  0.5863431  0.5863431 3.952237e-01 3.952237e-01
 PC2 -0.3952237 -0.3952237 5.863431e-01 5.863431e-01
 PC3 -0.7071068  0.7071068 1.665335e-16 3.885781e-16

For example, 0.5863431 appears in both V1 and V2, so "V1" and "V2" are among the column names I want.

For this data frame, I want to get:

 [1] "V1" "V2" "V3" "V4"

As you can see, the first row alone is probably enough to give this result.

Second example:

             V1         V2          V3         V4
 PC1 -0.5987139 -0.5987139 -0.03790446  0.5307039
 PC2 -0.0189601 -0.0189601 -0.99315168 -0.1137136
 PC3  0.3986891  0.3523926 -0.11045319  0.8394442

Result:

 [1] "V1" "V2"

+6

r

Sultan Hashimov May 22, '16 at 17:05

3 answers

There may be a better way, but here is my take on it.

 ## coerce to matrix (if not already)
 m <- as.matrix(df)
 ## find duplicates across both margins
 d <- duplicated(m, MARGIN = 0) | duplicated(m, MARGIN = 0, fromLast = TRUE)
 ## grab the unique col names
 colnames(m)[unique(col(d)[d])]

Examples: In your first data frame -

 df1 <- read.table(text = "V1 V2 V3 V4
 PC1 0.5863431 0.5863431 3.952237e-01 3.952237e-01
 PC2 -0.3952237 -0.3952237 5.863431e-01 5.863431e-01
 PC3 -0.7071068 0.7071068 1.665335e-16 3.885781e-16", header = TRUE)
 m1 <- as.matrix(df1)
 d1 <- duplicated(m1, MARGIN = 0) | duplicated(m1, MARGIN = 0, fromLast = TRUE)
 colnames(m1)[unique(col(d1)[d1])]
 # [1] "V1" "V2" "V3" "V4"

And on the second -

 df2 <- read.table(text = "V1 V2 V3 V4
 PC1 -0.5987139 -0.5987139 -0.03790446 0.5307039
 PC2 -0.0189601 -0.0189601 -0.99315168 -0.1137136
 PC3 0.3986891 0.3523926 -0.11045319 0.8394442", header = TRUE)
 m2 <- as.matrix(df2)
 d2 <- duplicated(m2, MARGIN = 0) | duplicated(m2, MARGIN = 0, fromLast = TRUE)
 colnames(m2)[unique(col(d2)[d2])]
 # [1] "V1" "V2"

Note: Since your data contains all numerical values, I would recommend starting with a matrix instead of a data frame.
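The question also asks for row and column indices. One way to get them, sketched here under the assumption that the same logical matrix of duplicates is used, is to pass it to which() with arr.ind = TRUE. The second example is rebuilt so the block runs on its own; the name pos is only illustrative.

```r
# Rebuild the second example so the block is self-contained
df2 <- read.table(text = "V1 V2 V3 V4
PC1 -0.5987139 -0.5987139 -0.03790446 0.5307039
PC2 -0.0189601 -0.0189601 -0.99315168 -0.1137136
PC3 0.3986891 0.3523926 -0.11045319 0.8394442", header = TRUE)
m2 <- as.matrix(df2)

# TRUE wherever a value occurs more than once anywhere in the matrix
d2 <- duplicated(m2, MARGIN = 0) | duplicated(m2, MARGIN = 0, fromLast = TRUE)

# Row and column positions of every duplicated element
# (a matrix with one line per element and columns "row" and "col")
pos <- which(d2, arr.ind = TRUE)
pos

# Optionally, pair the row and column names with the values themselves
data.frame(row   = rownames(m2)[pos[, "row"]],
           col   = colnames(m2)[pos[, "col"]],
           value = m2[pos])
```

Taking unique(pos[, "col"]) and indexing colnames(m2) with it gives the same column names as the col(d2)[d2] expression above.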

+8

Rich scriven May 22, '16 at 17:21

Whichever approach you use, check R FAQ 7.31 when dealing with floating-point numbers. You might want to create a new matrix in which the values are rounded to the same number of digits; although two values may "look" the same when printed, they can differ in trailing digits that the printout does not show.
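A minimal sketch of that rounding step. The small matrix m is made up for illustration and is not the question's data, and rounding to 7 significant digits is an assumption (it matches what print() shows by default); pick whatever tolerance suits your data.

```r
# Illustrative matrix: 0.1 + 0.2 and 0.3 print the same at the default
# 7 significant digits but are not exactly equal (see R FAQ 7.31)
m <- matrix(c(0.1 + 0.2, 0.5, 0.3, 0.7), nrow = 2,
            dimnames = list(c("PC1", "PC2"), c("V1", "V2")))

# Exact comparison: is any element flagged as a duplicate?
any(duplicated(m, MARGIN = 0))

# Round to a common number of significant digits (7 is an assumption)
# before looking for duplicates
r <- signif(m, digits = 7)
d <- duplicated(r, MARGIN = 0) | duplicated(r, MARGIN = 0, fromLast = TRUE)

# Columns that contain a value duplicated after rounding
colnames(r)[unique(col(d)[d])]
```

Without the rounding step, the exact comparison treats the two near-equal values as different, so neither column would be reported.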

+1

Data munger May 22, '16 at 22:29

Jota · Accepted Answer · 2016-05-22T18:01:59+0000

A slightly different approach, using which and apply:

 # convert to matrix
 mat1 <- as.matrix(df1)
 # find duplicates and store them
 dups <- mat1[which(duplicated(c(mat1)))]
 # identify columns containing a value in dups
 names(which(apply(mat1, 2, function(x) any(x %in% dups))))
 #[1] "V1" "V2" "V3" "V4"

 mat2 <- as.matrix(df2)
 dups <- mat2[which(duplicated(c(mat2)))]
 names(which(apply(mat2, 2, function(x) any(x %in% dups))))
 #[1] "V1" "V2"
