Best way to reduce sequences in an array of strings

Please, now that I have rewritten this question, and before it receives further answers of a quick answer or a premature closure by impatient editors, let me point out that this is not a duplicate of this question . I know how to remove duplicates from an array.

This question is about removing sequences from an array, not duplication in the strict sense.

Consider this sequence of elements in an array;

[0] a [1] a [2] b [3] c [4] c [5] a [6] c [7] d [8] c [9] d 

In this example, I want to get the following ...

 [0] a [1] b [2] c [3] a [4] c [5] d 

Note that duplicate elements are preserved, but sequences of the same element are reduced to one instance of this element.

In addition, note that when repeating two lines, they should be reduced to one set (of two lines).

 [0] c [1] d [2] c [3] d 

... comes down to ...

 [0] c [1] d 

I code in C #, but algorithms in any language are evaluated.

+6
c # algorithm
source share
4 answers

Here the C # application I wrote solves this problem.

accepts
aabccacdcd

<strong> exits
abcacd

It probably looks rather random, pulled me a little to lower the bit of dynamic length.

 class Program { private static List<string> values; private const int MAX_PATTERN_LENGTH = 4; static void Main(string[] args) { values = new List<string>(); values.AddRange(new string[] { "a", "b", "c", "c", "a", "c", "d", "c", "d" }); for (int i = MAX_PATTERN_LENGTH; i > 0; i--) { RemoveDuplicatesOfLength(i); } foreach (string s in values) { Console.WriteLine(s); } } private static void RemoveDuplicatesOfLength(int dupeLength) { for (int i = 0; i < values.Count; i++) { if (i + dupeLength > values.Count) break; if (i + dupeLength + dupeLength > values.Count) break; var patternA = values.GetRange(i, dupeLength); var patternB = values.GetRange(i + dupeLength, dupeLength); bool isPattern = ComparePatterns(patternA, patternB); if (isPattern) { values.RemoveRange(i, dupeLength); } } } private static bool ComparePatterns(List<string> pattern, List<string> candidate) { for (int i = 0; i < pattern.Count; i++) { if (pattern[i] != candidate[i]) return false; } return true; } } 

fixed initial values โ€‹โ€‹corresponding to the values โ€‹โ€‹of the questions

+1
source share

EDIT: some changes and new suggestions made

How about a sliding window ...

 REMOVE LENGTH 2: (no other length has other matches) //the lower case letters are the matches ABCBAbabaBBCbcbcbVbvBCbcbcAB __ABCBABABABBCBCBCBVBVBCBCBCAB REMOVE LENGTH 1 (duplicate characters): //* denote that a string was removed to prevent continual contraction //of the string, unless this is what you want. ABCBA*BbC*V*BC*AB _ABCBA*BBC*V*BC*AB RESULT: ABCBA*B*C*V*BC*AB == ABCBABCVBCAB 

This, of course, starts with length = 2, increase it to L / 2 and iterate down.

I also think of two other approaches:

  • digraph . Install a graphic status sign with data and turn it over a line, if a cycle is detected, you will have duplication. I'm not sure how easy it is to check the check for these loops ... maybe some kind of dynamic programming, so this might be equivalent to method 2 below. I will have to think about it even longer.
  • distance matrix - using the distance matrix in Levenstein, you can detect duplication from diagonal movement (diagonally) with a cost of 0. This may indicate duplication of data. I will have to think more about this.
+2
source share

I would drop them all into your chosen implementation.

EDIT: Now that I understand the question, your original solution looks like the best way to do this. Just go through the array once, saving an array of flags to mark which items to keep, plus a counter to keep track of the size of the new array. Then repeat the loop to copy all the keepers to the new array.

+1
source share

I agree that if you can just flush lines in Set, then this might be the easiest solution.

If you donโ€™t have access to the Set implementation for any reason, I just sort the lines alphabetically and then go through once and delete the duplicates. How to sort and remove duplicates from the list will depend on what language and environment you use your code in.

EDIT: Oh, ick .... I see, based on your clarification, that you expect patterns to occur even on separate lines. My approach will not solve your problem. I'm sorry. There you have a question. If I had the following file.

but

but

b

from

from

but

but

b

from

from

Do you expect to simplify it to

but

b

from

0
source share

All Articles