Find a repeating sequence in a string

I would like to find a repeating sequence in a string in VB.Net, something like:

Dim test as String = "EDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGB"

I want the program to detect the repeating sequence, in this case EDCRFVTGB, and count how many times it repeats. My problem is finding the repeating sequence in the first place. I looked at several ways to do this without finding a solution; I tried adapting quicksort-style algorithms that detect duplicates, but some of them do not work with strings.

I could also create substrings and check whether they occur again in the string, but I don't know which substrings to take, since the string has no known pattern. There is also the possibility of a repeated sequence within the string.

+4
4 answers

First check whether the first half of the target string is repeated twice. If not, check whether the first third is repeated three times. If not, check whether the first quarter is repeated four times. Continue until you find a matching sequence. Skip any divisor that does not divide the length of the string evenly, which makes the search more efficient. This code should do the trick and fill in any gaps that this description leaves:

Public Function DetermineSequence(ByVal strTarget As String) As String

    Dim strSequence As String = String.Empty

    Dim intLengthOfTarget As Integer = strTarget.Length

    'Check for a valid Target string.
    If intLengthOfTarget > 2 Then

        'Try 1/2 of Target, 1/3 of Target, 1/4 of Target, etc until sequence is found.
        Dim intCursor As Integer = 2

        Do Until strSequence.Length > 0 OrElse intCursor = intLengthOfTarget

            'Don't even test the string if its length is not a divisor (to an Integer) of the length of the target String.
            If IsDividendDivisibleByDivisor(strTarget.Length, intCursor) Then

                'Get the possible sequence.
                'Integer division (\) gives an Integer length; / would produce a Double.
                Dim strPossibleSequence As String = strTarget.Substring(0, intLengthOfTarget \ intCursor)

                'See if this possible sequence actually is the repeated String.
                If IsPossibleSequenceRepeatedThroughoutTarget(strPossibleSequence, strTarget) Then

                    'The repeated sequence has been found.
                    strSequence = strPossibleSequence

                End If

            End If

            intCursor += 1

        Loop

    End If

    Return strSequence

End Function

Private Function IsDividendDivisibleByDivisor(ByVal intDividend As Integer, ByVal intDivisor As Integer) As Boolean

    'A remainder of zero means the divisor divides the dividend evenly.
    Return intDividend Mod intDivisor = 0

End Function

Private Function IsPossibleSequenceRepeatedThroughoutTarget(ByVal strPossibleSequence As String, ByVal strTarget As String) As Boolean

    Dim bolPossibleSequenceIsRepeatedThroughoutTarget As Boolean = False

    Dim intLengthOfTarget As Integer = strTarget.Length
    Dim intLengthOfPossibleSequence As Integer = strPossibleSequence.Length

    Dim bolIndicatorThatPossibleSequenceIsCertainlyNotRepeated As Boolean = False

    Dim intCursor As Integer = 1

    Do Until (intCursor * intLengthOfPossibleSequence) = strTarget.Length OrElse bolIndicatorThatPossibleSequenceIsCertainlyNotRepeated

        If strTarget.Substring((intCursor * intLengthOfPossibleSequence), intLengthOfPossibleSequence) <> strPossibleSequence Then

            bolIndicatorThatPossibleSequenceIsCertainlyNotRepeated = True

        End If

        intCursor += 1

    Loop

    If Not bolIndicatorThatPossibleSequenceIsCertainlyNotRepeated Then

        bolPossibleSequenceIsRepeatedThroughoutTarget = True

    End If

    Return bolPossibleSequenceIsRepeatedThroughoutTarget

End Function
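
As a side note, the function above tries the longest candidates first, so for a string made of an even number of copies it returns a multiple of the unit rather than the unit itself (six copies of EDCRFVTGB would give the first 27 characters). Here is a minimal Java sketch of the same divisor idea that tries the shortest candidate first and also reports the count the question asks for. The class and method names (`RepeatingUnit`, `shortestUnit`, `repeatCount`) are made up for this illustration, and it assumes the whole string consists of nothing but copies of the unit.

```java
class RepeatingUnit {
    // Returns the shortest prefix that rebuilds the whole string when repeated
    // end to end, or "" if no prefix shorter than the string does so.
    static String shortestUnit(String target) {
        int n = target.length();
        for (int unitLen = 1; unitLen <= n / 2; unitLen++) {
            if (n % unitLen != 0) {
                continue; // the unit length must divide the total length evenly
            }
            String unit = target.substring(0, unitLen);
            boolean repeats = true;
            for (int pos = unitLen; pos < n; pos += unitLen) {
                if (!target.startsWith(unit, pos)) {
                    repeats = false;
                    break;
                }
            }
            if (repeats) {
                return unit;
            }
        }
        return "";
    }

    // Number of copies of the shortest unit, or 0 if the string does not repeat.
    static int repeatCount(String target) {
        String unit = shortestUnit(target);
        return unit.isEmpty() ? 0 : target.length() / unit.length();
    }

    public static void main(String[] args) {
        String test = "EDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGB";
        System.out.println(shortestUnit(test) + " x " + repeatCount(test));
    }
}
```

Trying short units first costs a few more failed checks on long strings, but the first hit is then guaranteed to be the smallest repeating unit.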
+1

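This example finds every substring within a given length range that occurs more than once, and lists the indexes at which each one occurs.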


Option Strict On
Option Explicit On
Option Infer Off
Public Class Form1
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click

        ListView1.Items.Clear()
        ListView1.Columns.Clear()
        ListView1.Columns.Add("Sequence")
        ListView1.Columns.Add("Indexes of occurrence")
        Dim sequences As List(Of Sequence) = DetectSequences("EDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGB")
        For Each s As Sequence In sequences
            Dim item As New ListViewItem(s.Sequence)
            item.Tag = s
            item.SubItems.Add(s.IndexesToString)
            ListView1.Items.Add(item)
        Next
        ListView1.AutoResizeColumns(ColumnHeaderAutoResizeStyle.HeaderSize)
    End Sub
    Function DetectSequences(s As String, Optional minLength As Integer = 5, Optional MaxLength As Integer = 8) As List(Of Sequence)
        Dim foundPatterns As New List(Of String)
        Dim foundSequences As New List(Of Sequence)
        Dim potentialPattern As String = String.Empty, potentialMatch As String = String.Empty
        For start As Integer = 0 To s.Length - 1
            For length As Integer = 1 To s.Length - start
                potentialPattern = s.Substring(start, length)
                If potentialPattern.Length < minLength Then Continue For
                If potentialPattern.Length > MaxLength Then Continue For
                If foundPatterns.IndexOf(potentialPattern) = -1 Then
                    foundPatterns.Add(potentialPattern)
                End If
            Next
        Next
        For Each pattern As String In foundPatterns
            Dim sequence As New Sequence With {.Sequence = pattern}
            For start As Integer = 0 To s.Length - pattern.Length
                Dim length As Integer = pattern.Length
                potentialMatch = s.Substring(start, length)
                If potentialMatch = pattern Then
                    sequence.Indexes.Add(start)
                End If
            Next
            If sequence.Indexes.Count > 1 Then foundSequences.Add(sequence)
        Next
        Return foundSequences
    End Function
    Public Class Sequence
        Public Sequence As String = ""
        Public Indexes As New List(Of Integer)
        Public Function IndexesToString() As String
            Dim sb As New System.Text.StringBuilder
            For i As Integer = 0 To Indexes.Count - 1
                If i = Indexes.Count - 1 Then
                    sb.Append(Indexes(i).ToString)
                Else
                    sb.Append(Indexes(i).ToString & ", ")
                End If
            Next
            Return sb.ToString
        End Function
    End Class
    Private Sub ListView1_SelectedIndexChanged(sender As Object, e As EventArgs) Handles ListView1.SelectedIndexChanged
        If ListView1.SelectedItems.Count = 0 Then Exit Sub
        RichTextBox1.Clear()
        RichTextBox1.Text = "EDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGBEDCRFVTGB"
        Dim selectedSequence As Sequence = DirectCast(ListView1.SelectedItems(0).Tag, Sequence)
        For Each i As Integer In selectedSequence.Indexes
            RichTextBox1.SelectionStart = i
            RichTextBox1.SelectionLength = selectedSequence.Sequence.Length
            RichTextBox1.SelectionBackColor = Color.Red
        Next
    End Sub
End Class
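
The second loop in DetectSequences, which records where each pattern occurs, is the part that answers "how many times". Here is a small Java sketch of just that step, assuming overlapping matches should be counted (as the VB code above does). `Occurrences` and `indexesOf` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class Occurrences {
    // Returns every start index at which pattern occurs in s, overlaps included.
    static List<Integer> indexesOf(String s, String pattern) {
        List<Integer> indexes = new ArrayList<>();
        if (pattern.isEmpty()) {
            return indexes; // an empty pattern would match everywhere; treat as no match
        }
        // Resume the search one character after the previous hit so that
        // overlapping occurrences are found as well.
        for (int at = s.indexOf(pattern); at >= 0; at = s.indexOf(pattern, at + 1)) {
            indexes.add(at);
        }
        return indexes;
    }
}
```

The size of the returned list is the number of occurrences; a pattern counts as repeated when that size is greater than 1.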
+1

A simple brute-force approach:

for each character index i
    for each character index after that j
        compare substring(i, j-i) to substring(j, j-i)
        if equal, record as a found repeating substring

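Note that j has to be limited so that the second substring (the one starting at j) does not run past the end of the string.

This makes on the order of N-squared comparisons, each of which may examine up to N characters, so it gets slow for long strings.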
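
The pseudocode above could look roughly like this in Java. This is only a sketch: it reads substring(i, j-i) as "start at i, length j-i", so it finds substrings that are immediately followed by a copy of themselves, and the names `AdjacentRepeats` and `find` are made up here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class AdjacentRepeats {
    // Collects every substring that is immediately followed by a copy of itself.
    static Set<String> find(String s) {
        Set<String> found = new LinkedHashSet<>();
        int n = s.length();
        for (int i = 0; i < n; i++) {
            // j is where the second copy would start; it has to fit inside
            // the string, hence the bound j + (j - i) <= n.
            for (int j = i + 1; j + (j - i) <= n; j++) {
                int len = j - i;
                if (s.regionMatches(i, s, j, len)) {
                    found.add(s.substring(i, j));
                }
            }
        }
        return found;
    }
}
```

A set is used so that the same substring found at several positions is recorded only once.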

0

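The following Java code collects every substring that occurs more than once, growing the candidates by one character per pass. For example, BANANA => A, N, AN, NA, ANA (at positions 1 and 3):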

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public List<String> getRepetitions(String string) {
   List<String> repetitions = new ArrayList<String>();
   Map<String, List<Integer>> rep = new HashMap<String, List<Integer>>(), repOld;
   // init rep: record the start positions of all single-character strings
   for (int i = 0; i < string.length(); i++) {
      String s = string.substring(i, i + 1); // startIndex inclusive, endIndex exclusive
      if (rep.containsKey(s)) {
         rep.get(s).add(i);
      } else {
         List<Integer> l = new ArrayList<Integer>();
         l.add(i);
         rep.put(s, l);
      }
   }
   // eliminate those with no repetitions and add the others to the solution
   // (removeIf avoids modifying the map while iterating over it)
   rep.entrySet().removeIf(entry -> entry.getValue().size() < 2);
   repetitions.addAll(rep.keySet());
   for (int len = 1; rep.size() > 0; len++) { // len = length of the keys in repOld
      repOld = rep;
      rep = new HashMap<String, List<Integer>>();
      for (Map.Entry<String, List<Integer>> e : repOld.entrySet()) {
         for (Integer i : e.getValue()) { // for all start indices (ascending)
            if (i + len >= string.length())
               break; // no character left to append; later indices are larger still
            // the key covers positions i .. i+len-1, so the next character is at i+len
            String s = e.getKey() + string.charAt(i + len);
            if (rep.containsKey(s)) {
               rep.get(s).add(i);
            } else {
               List<Integer> l = new ArrayList<Integer>();
               l.add(i);
               rep.put(s, l);
            }
         }
      }
      // eliminate those with no repetitions and add the others to the solution
      rep.entrySet().removeIf(entry -> entry.getValue().size() < 2);
      repetitions.addAll(rep.keySet());
   }
   return repetitions; // ordered by length, so the longest come last
}

Walk-through for BANANA:

  • init rep => B → [0], A → [1, 3, 5], N → [2, 4]
  • eliminate the entries with fewer than 2 occurrences (B) and add the rest to the solution (A, N)
  • append the following letter to each remaining entry (creating a new rep): AN → [1, 3], NA → [2, 4]
  • eliminate (nothing) and add to the solution (AN, NA)
  • repeat the previous two steps: ANA → [1, 3]
  • in the next pass rep becomes empty (ANAN occurs only once) and the algorithm is finished
-1