I believe your problem can be solved in terms of the shortest common supersequence. I am assuming that each of your two vectors represents one sequence. Please give the code below a try.
If it still does not solve your problem, you will need to explain exactly what you mean by "my vector contains not one, but many sequences": define what you mean by a sequence and tell us how sequences can be identified by scanning through the two vectors.
Part I: Given two sequences, find the longest common subsequence

    LongestCommonSubsequence <- function(X, Y) {
        m <- length(X)
        n <- length(Y)

        # C[i + 1, j + 1] is the LCS length of X[1:i] and Y[1:j];
        # the first row and column stand for the empty prefix.
        C <- matrix(0, 1 + m, 1 + n)
        for (i in seq_len(m)) {
            for (j in seq_len(n)) {
                if (X[i] == Y[j]) {
                    C[i + 1, j + 1] <- C[i, j] + 1
                } else {
                    C[i + 1, j + 1] <- max(C[i + 1, j], C[i, j + 1])
                }
            }
        }

        # Walk back from the bottom-right corner of C, collecting the
        # matched elements together with their positions in X (I) and Y (J).
        backtrack <- function(C, X, Y, i, j) {
            if (i == 1 || j == 1) {
                return(data.frame(I = c(), J = c(), LCS = c()))
            } else if (X[i - 1] == Y[j - 1]) {
                return(rbind(backtrack(C, X, Y, i - 1, j - 1),
                             data.frame(LCS = X[i - 1], I = i - 1, J = j - 1)))
            } else if (C[i, j - 1] > C[i - 1, j]) {
                return(backtrack(C, X, Y, i, j - 1))
            } else {
                return(backtrack(C, X, Y, i - 1, j))
            }
        }

        return(backtrack(C, X, Y, m + 1, n + 1))
    }

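To look at the recurrence that fills `C` on its own, here is a minimal sketch that computes only the length of the longest common subsequence and skips the backtracking step. `lcs_length` is a hypothetical helper name introduced for illustration; it is not part of the answer's code.

```r
# Hypothetical helper: length of the longest common subsequence of X and Y.
lcs_length <- function(X, Y) {
  m <- length(X)
  n <- length(Y)
  # C[i + 1, j + 1] holds the LCS length of X[1:i] and Y[1:j];
  # the extra first row and column stand for the empty prefix.
  C <- matrix(0, m + 1, n + 1)
  for (i in seq_len(m)) {
    for (j in seq_len(n)) {
      if (X[i] == Y[j]) {
        # A match extends the LCS of the two shorter prefixes by one.
        C[i + 1, j + 1] <- C[i, j] + 1
      } else {
        # Otherwise keep the better result from dropping one element.
        C[i + 1, j + 1] <- max(C[i + 1, j], C[i, j + 1])
      }
    }
  }
  C[m + 1, n + 1]
}

X <- c("a", "g", "b", "h", "a", "g", "c")
Y <- c("a", "g", "b", "a", "g", "b", "h", "c")
lcs_length(X, Y)
# [1] 6
```

For these two vectors the common subsequence of length 6 is a, g, b, a, g, c.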
Part II: Given two sequences, find the shortest common supersequence

    ShortestCommonSupersequence <- function(X, Y) {
        # Positions (I in X, J in Y) of the elements shared by both sequences
        LCS <- LongestCommonSubsequence(X, Y)[c("I", "J")]

        X.df <- data.frame(X = X, I = seq_along(X), stringsAsFactors = FALSE)
        Y.df <- data.frame(Y = Y, J = seq_along(Y), stringsAsFactors = FALSE)

        # Outer joins keep the elements that appear in only one sequence
        ALL <- merge(LCS, X.df, by = "I", all = TRUE)
        ALL <- merge(ALL, Y.df, by = "J", all = TRUE)

        # Order rows by the larger of the two positions, treating NA as 0
        ALL <- ALL[order(pmax(ifelse(is.na(ALL$I), 0, ALL$I),
                              ifelse(is.na(ALL$J), 0, ALL$J))), ]

        ALL$SCS <- ifelse(is.na(ALL$X), ALL$Y, ALL$X)
        ALL
    }

Your example:

    ShortestCommonSupersequence(X = c("a","g","b","h","a","g","c"),
                                Y = c("a","g","b","a","g","b","h","c"))
    #    J  I    X    Y SCS
    # 1  1  1    a    a   a
    # 2  2  2    g    g   g
    # 3  3  3    b    b   b
    # 9 NA  4    h <NA>   h
    # 4  4  5    a    a   a
    # 5  5  6    g    g   g
    # 6  6 NA <NA>    b   b
    # 7  7 NA <NA>    h   h
    # 8  8  7    c    c   c

(The two updated vectors are in columns X and Y, with NA marking the positions where an element comes only from the other vector.)
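As a sanity check on the result, one can confirm that the SCS column contains both input vectors as subsequences, and that its length equals `length(X) + length(Y)` minus the length of the longest common subsequence (6 here, the number of rows where both I and J are present). The sketch below hard-codes the SCS column from the output above; `is_subsequence` is a hypothetical helper name, not part of the answer's code.

```r
# Hypothetical helper: TRUE if every element of `s` appears in `super`
# in the same order (not necessarily contiguously).
is_subsequence <- function(s, super) {
  k <- 1  # index of the next element of `s` still to be matched
  for (el in super) {
    if (k <= length(s) && s[k] == el) k <- k + 1
  }
  k > length(s)
}

X <- c("a", "g", "b", "h", "a", "g", "c")
Y <- c("a", "g", "b", "a", "g", "b", "h", "c")
# SCS column copied from the example output
SCS <- c("a", "g", "b", "h", "a", "g", "b", "h", "c")

is_subsequence(X, SCS)
# [1] TRUE
is_subsequence(Y, SCS)
# [1] TRUE
length(SCS) == length(X) + length(Y) - 6  # 6 is the LCS length for this pair
# [1] TRUE
```

The same check can be run against `ShortestCommonSupersequence(X, Y)$SCS` for other inputs to see whether the result is still a valid supersequence.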