Combining vectors of unequal length and non-unique values

I would like to do the following:

combine two vectors into a data frame, where the vectors

  • have different lengths
  • contain sequences that are also found in the other vector
  • contain sequences that are not found in the other vector
  • never have sequences missing from the other vector that are longer than three elements
  • always start with the same first element

Equal sequences should be aligned across the two columns of the data frame, with NA in a column wherever that vector lacks an element that the other vector contains.

For instance:

    vector 1  vector 2        vector 1  vector 2
       1         1               a         a
       2         2               g         g
       3         3               b         b
       4         1       or      h         a
       1         2               a         g
       2         3               g         b
       5         4               c         h
                 5                         c

should be combined into a data frame

    vector 1  vector 2        vector 1  vector 2
       1         1               a         a
       2         2               g         g
       3         3               b         b
       4         NA              h         NA
       1         1       or      a         a
       2         2               g         g
       NA        3               NA        b
       NA        4               NA        h
       5         5               c         c

I searched for examples of merging and combining (cbind, plyr) but could not find a solution. I'm afraid I will need to write a function with nested loops to solve this problem.

2 answers

I claim that your problem can be solved in terms of the shortest common supersequence. This assumes that each of the two vectors represents one sequence. Please give the code below a try.

If it still does not solve your problem, you will need to explain exactly what you mean by “my vector contains not one, but many sequences”: define what you mean by a sequence and tell us how sequences can be identified by scanning through the two vectors.

Part I : Given two sequences, find the longest common subsequence

    LongestCommonSubsequence <- function(X, Y) {
      m <- length(X)
      n <- length(Y)
      # C[i + 1, j + 1] holds the LCS length of X[1:i] and Y[1:j]
      C <- matrix(0, 1 + m, 1 + n)
      for (i in seq_len(m)) {
        for (j in seq_len(n)) {
          if (X[i] == Y[j]) {
            C[i + 1, j + 1] <- C[i, j] + 1
          } else {
            C[i + 1, j + 1] <- max(C[i + 1, j], C[i, j + 1])
          }
        }
      }

      # Walk back through C to recover the matched positions (I in X, J in Y)
      backtrack <- function(C, X, Y, i, j) {
        if (i == 1 || j == 1) {
          return(data.frame(I = c(), J = c(), LCS = c()))
        } else if (X[i - 1] == Y[j - 1]) {
          return(rbind(backtrack(C, X, Y, i - 1, j - 1),
                       data.frame(LCS = X[i - 1], I = i - 1, J = j - 1)))
        } else if (C[i, j - 1] > C[i - 1, j]) {
          return(backtrack(C, X, Y, i, j - 1))
        } else {
          return(backtrack(C, X, Y, i - 1, j))
        }
      }

      return(backtrack(C, X, Y, m + 1, n + 1))
    }

Part II : Given two sequences, find the shortest common supersequence

    ShortestCommonSupersequence <- function(X, Y) {
      # Matched index pairs: I is a position in X, J is a position in Y
      LCS <- LongestCommonSubsequence(X, Y)[c("I", "J")]
      X.df <- data.frame(X = X, I = seq_along(X), stringsAsFactors = FALSE)
      Y.df <- data.frame(Y = Y, J = seq_along(Y), stringsAsFactors = FALSE)

      # Full outer joins keep the unmatched elements of both vectors
      ALL <- merge(LCS, X.df, by = "I", all = TRUE)
      ALL <- merge(ALL, Y.df, by = "J", all = TRUE)

      # Order rows by the larger of the two positions, treating NA as 0
      ALL <- ALL[order(pmax(ifelse(is.na(ALL$I), 0, ALL$I),
                            ifelse(is.na(ALL$J), 0, ALL$J))), ]
      ALL$SCS <- ifelse(is.na(ALL$X), ALL$Y, ALL$X)
      ALL
    }

Your example :

    ShortestCommonSupersequence(X = c("a","g","b","h","a","g","c"),
                                Y = c("a","g","b","a","g","b","h","c"))
    #    J  I    X    Y SCS
    # 1  1  1    a    a   a
    # 2  2  2    g    g   g
    # 3  3  3    b    b   b
    # 9 NA  4    h <NA>   h
    # 4  4  5    a    a   a
    # 5  5  6    g    g   g
    # 6  6 NA <NA>    b   b
    # 7  7 NA <NA>    h   h
    # 8  8  7    c    c   c

(where the two updated vectors are in columns X and Y)
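One way to sanity-check an alignment like this is sketched below. The columns are copied by hand from the output shown above, and the names `aligned` and `matched` are only illustrative. The check relies on two properties: dropping the NA padding from each column should give back the original vector in order, and the number of rows should equal length(X) + length(Y) minus the number of matched elements.

```r
# Columns X and Y copied by hand from the output above (not recomputed here)
aligned <- data.frame(
  X = c("a", "g", "b", "h", "a", "g", NA, NA, "c"),
  Y = c("a", "g", "b", NA, "a", "g", "b", "h", "c"),
  stringsAsFactors = FALSE
)

x <- c("a", "g", "b", "h", "a", "g", "c")
y <- c("a", "g", "b", "a", "g", "b", "h", "c")

# Dropping the NA padding must give back the original vectors, in order
stopifnot(identical(aligned$X[!is.na(aligned$X)], x))
stopifnot(identical(aligned$Y[!is.na(aligned$Y)], y))

# Rows filled in both columns are the matched (common) elements
matched <- sum(!is.na(aligned$X) & !is.na(aligned$Y))

# A supersequence built from that matching has this many rows
stopifnot(nrow(aligned) == length(x) + length(y) - matched)
```

If either `stopifnot` on the columns fails for your own data, the rows have been put in an order that scrambles one of the input vectors.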


Note: this was posted as an answer to the first version of the question. The question has been edited since then, but the problem is still underspecified in my opinion.


Here is a solution that works with your integer example and will also work with numeric vectors. I also assume that:

  • both vectors contain the same number of sequences
  • a new sequence begins wherever value[i+1] <= value[i]

If your vectors are not numeric or if one of my assumptions does not match your problem, you will have to clarify.
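To make the second assumption concrete, here is a small sketch of how that rule cuts one of the example vectors into sequences. The intermediate names `starts` and `group` are only for illustration.

```r
v1 <- c(1, 2, 3, 4, 1, 2, 5)

# TRUE marks the first element of each sequence: the very first element,
# plus every position where the value does not increase
starts <- c(TRUE, diff(v1) <= 0)
# TRUE FALSE FALSE FALSE TRUE FALSE FALSE

# A running count of the starts gives each element its sequence number
group <- cumsum(starts)
# 1 1 1 1 2 2 2

# split() then collects the elements of each sequence
sequences <- split(v1, group)
# $`1`: 1 2 3 4
# $`2`: 1 2 5
```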

    v1 <- c(1, 2, 3, 4, 1, 2, 5)
    v2 <- c(1, 2, 3, 1, 2, 3, 4, 5)

    # Start a new sequence wherever the value does not increase
    v1.sequences <- split(v1, cumsum(c(TRUE, diff(v1) <= 0)))
    v2.sequences <- split(v2, cumsum(c(TRUE, diff(v2) <= 0)))

    align.fun <- function(s1, s2) { # aligns two sequences
      s12 <- sort(unique(c(s1, s2)))
      cbind(ifelse(s12 %in% s1, s12, NA),
            ifelse(s12 %in% s2, s12, NA))
    }

    # SIMPLIFY = FALSE keeps the result a list even if all pieces have the same size
    do.call(rbind, mapply(align.fun, v1.sequences, v2.sequences, SIMPLIFY = FALSE))
    #      [,1] [,2]
    # [1,]    1    1
    # [2,]    2    2
    # [3,]    3    3
    # [4,]    4   NA
    # [5,]    1    1
    # [6,]    2    2
    # [7,]   NA    3
    # [8,]   NA    4
    # [9,]    5    5
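For the character example, the "value does not increase" rule has no obvious meaning. A possible alternative, sketched below, uses the question's statement that the first element is always the same and starts a new sequence wherever that element recurs. The helper `split_at_first` is hypothetical, and the sketch assumes the first element never appears in the middle of a sequence.

```r
x <- c("a", "g", "b", "h", "a", "g", "c")
y <- c("a", "g", "b", "a", "g", "b", "h", "c")

# Hypothetical helper: start a new sequence wherever the first element recurs
# (assumes that element never occurs in the middle of a sequence)
split_at_first <- function(v) split(v, cumsum(v == v[1]))

x_sequences <- split_at_first(x)
# $`1`: "a" "g" "b" "h"
# $`2`: "a" "g" "c"

y_sequences <- split_at_first(y)
# $`1`: "a" "g" "b"
# $`2`: "a" "g" "b" "h" "c"
```

Note that this only handles the splitting step. Aligning the pieces would still need a method that preserves element order, because sort() would reorder these letters alphabetically.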
