Java: what is the best way to find the first duplicate character in a string?

I wrote the code below to detect the first duplicate character in a string.

    public static int detectDuplicate(String source) {
        boolean found = false;
        int index = -1;
        final long start = System.currentTimeMillis();
        final int length = source.length();
        for (int outerIndex = 0; outerIndex < length && !found; outerIndex++) {
            boolean shiftPointer = false;
            for (int innerIndex = outerIndex + 1; innerIndex < length && !shiftPointer; innerIndex++) {
                if (source.charAt(outerIndex) == source.charAt(innerIndex)) {
                    found = true;
                    index = outerIndex;
                } else {
                    shiftPointer = true;
                }
            }
        }
        System.out.println("Time taken --> " + (System.currentTimeMillis() - start)
                + " ms. for string of length --> " + source.length());
        return index;
    }

I need help with two things:

  • What is the worst-case complexity of this algorithm? My understanding is that it is O(n).
  • Is this the best way to do this? Can someone provide a better solution (if any)?

Thanks, NN

+6
7 answers

As others have mentioned, your algorithm is O(n^2). Here is an O(n) algorithm, which relies on HashSet#add running in constant time (assuming the hash function distributes the elements well among the buckets). Note that I set the set's initial capacity to the maximum size it can need, to avoid resizing/rehashing:

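    import java.util.HashSet;
    import java.util.Set;

    public static int findDuplicate(String s) {
        char[] chars = s.toCharArray();
        // Capacity = string length with load factor 1, to avoid rehashing while filling the set
        Set<Character> uniqueChars = new HashSet<Character>(chars.length, 1);
        for (int i = 0; i < chars.length; i++) {
            // add() returns false if the character is already in the set
            if (!uniqueChars.add(chars[i])) return i;
        }
        return -1;
    }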

Note: this returns the index of the first duplicate (i.e. the index of the first character that duplicates an earlier one). To return the index of the first occurrence of that character instead, you need to store the indices in a Map<Character, Integer> (Map#put is also O(1) in this case):

    import java.util.HashMap;
    import java.util.Map;

    public static int findDuplicate(String s) {
        char[] chars = s.toCharArray();
        Map<Character, Integer> uniqueChars = new HashMap<Character, Integer>(chars.length, 1);
        for (int i = 0; i < chars.length; i++) {
            // put() returns the index previously stored for this character, or null if none
            Integer previousIndex = uniqueChars.put(chars[i], i);
            if (previousIndex != null) {
                return previousIndex;
            }
        }
        return -1;
    }
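To make the difference between the two return conventions concrete, here is a small sketch that puts both versions side by side under hypothetical names (indexOfRepeat, indexOfFirstOccurrence) so they can live in one class. For "abcb", the set version reports where the repeated b is found (3), while the map version reports where b first appeared (1):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DuplicateIndexConventions {
    // Set version: index of the first character that repeats an earlier one
    public static int indexOfRepeat(String s) {
        Set<Character> seen = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            if (!seen.add(s.charAt(i))) return i;
        }
        return -1;
    }

    // Map version: index of the earlier occurrence of that same character
    public static int indexOfFirstOccurrence(String s) {
        Map<Character, Integer> firstSeen = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            Integer previous = firstSeen.put(s.charAt(i), i);
            if (previous != null) return previous;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfRepeat("abcb"));          // prints 3
        System.out.println(indexOfFirstOccurrence("abcb")); // prints 1
    }
}
```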
+12

This is O(n^2), not O(n). Consider the string abcdefghijklmnopqrstuvwxyzz: outerIndex will run from 0 to 25 before the procedure completes, and for each value innerIndex will run from outerIndex + 1 up to 26.
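To see the quadratic cost, here is a minimal sketch of a full nested-loop scan, in which each character is compared with every later character until a match is found. The class name and the comparisons counter are hypothetical, added only for illustration. On abcdefghijklmnopqrstuvwxyzz (n = 27) it performs 26 + 25 + ... + 2 + 1 = 351 = n(n-1)/2 comparisons before returning index 25:

```java
public class NestedLoopCount {
    // Hypothetical counter, used only to illustrate the quadratic cost
    public static int comparisons = 0;

    // Full nested-loop scan: returns the index of the first character
    // that occurs again later in the string, or -1 if there is none
    public static int firstRepeatedIndex(String s) {
        comparisons = 0;
        for (int outer = 0; outer < s.length(); outer++) {
            for (int inner = outer + 1; inner < s.length(); inner++) {
                comparisons++;
                if (s.charAt(outer) == s.charAt(inner)) {
                    return outer;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int index = firstRepeatedIndex("abcdefghijklmnopqrstuvwxyzz");
        System.out.println(index + " " + comparisons); // prints "25 351"
    }
}
```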

To get to O(n), you need to make a single pass over the string and do O(1) work at each position. Since the work at each position is checking whether the character has appeared before (and, if so, where), you need a map implementation with O(1) lookups. A hash table gives you this, and so does an array indexed by character code.

assylias shows how to do this with hashing, so here's how to do it with an array (just for laughs, really):

    import java.util.Arrays;

    public static int detectDuplicate(String source) {
        // One slot per possible char value (1 << 16 = 65536); -1 means "not seen yet"
        int[] firstOccurrence = new int[1 << Character.SIZE];
        Arrays.fill(firstOccurrence, -1);
        for (int i = 0; i < source.length(); i++) {
            char ch = source.charAt(i);
            if (firstOccurrence[ch] != -1)
                return firstOccurrence[ch];
            else
                firstOccurrence[ch] = i;
        }
        return -1;
    }
+1

The complexity is roughly O(M^2), where M is the minimum of the string length and K, the size of the set of possible characters.

You can bring it down to O(M) with O(K) memory by simply remembering the position at which you first encounter each unique character.
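As a sketch of that idea, assume (purely for illustration) that the input contains only the lowercase letters a to z, so that K = 26. The 26-entry array of first positions is the O(K) memory, and by the pigeonhole principle a repeat must show up within the first K + 1 characters of any longer string, so the loop never runs more than K + 1 times. The class and method names are hypothetical:

```java
import java.util.Arrays;

public class SmallAlphabet {
    // Assumption for this sketch: the input contains only 'a'..'z', so K = 26
    private static final int K = 26;

    // Returns the index of the first occurrence of the first repeated character, or -1.
    // Uses O(K) memory; a repeat is guaranteed within the first K + 1 characters.
    public static int firstOccurrenceOfFirstRepeat(String s) {
        int[] firstSeen = new int[K];
        Arrays.fill(firstSeen, -1);
        for (int i = 0; i < s.length(); i++) {
            int slot = s.charAt(i) - 'a';
            if (slot < 0 || slot >= K) {
                throw new IllegalArgumentException("expected only 'a'..'z' at index " + i);
            }
            if (firstSeen[slot] != -1) {
                return firstSeen[slot];
            }
            firstSeen[slot] = i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstOccurrenceOfFirstRepeat("abcdb")); // prints 1
        System.out.println(firstOccurrenceOfFirstRepeat("xyz"));   // prints -1
    }
}
```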

+1

OK, I found the logic below to reduce O(N^2) to O(N).

    public static int detectDuplicate(String source) {
        int index = -1;
        boolean found = false;
        final long start = System.currentTimeMillis();
        // i < length (not <=): charAt(length) would throw StringIndexOutOfBoundsException
        // Each character is compared only with the one immediately before it
        for (int i = 1; i < source.length() && !found; i++) {
            if (source.charAt(i) == source.charAt(i - 1)) {
                index = i - 1;
                found = true;
            }
        }
        System.out.println("Time taken --> " + (System.currentTimeMillis() - start)
                + " ms. for string of length --> " + source.length());
        return index;
    }

It also shows a performance improvement over my previous algorithm, which has two nested loops: it takes 130 ms to detect the first repeated character in a string of 63 million characters where the duplicate is at the end.
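Note that this loop compares each character only with the one immediately before it, so it finds the first pair of adjacent equal characters rather than the first duplicate in general; for "abca" it returns -1 even though a repeats. A minimal sketch contrasting it with a set-based check (the class and method names are hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

public class AdjacentVsAny {
    // Same idea as the loop above: compare each character only with its predecessor
    public static int firstAdjacentPair(String s) {
        for (int i = 1; i < s.length(); i++) {
            if (s.charAt(i) == s.charAt(i - 1)) {
                return i - 1;
            }
        }
        return -1;
    }

    // General check: index of the first character already seen anywhere earlier
    public static int firstRepeatAnywhere(String s) {
        Set<Character> seen = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            if (!seen.add(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstAdjacentPair("abca"));   // prints -1
        System.out.println(firstRepeatAnywhere("abca")); // prints 3
    }
}
```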

I am not sure this is the best solution. If someone finds a better one, please share it.

Thanks,

NN

0

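I can significantly improve your algorithm. Temporarily overwrite the last character with a copy of the one before it; this sentinel guarantees that the loop stops, so it needs no bounds check on each iteration: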

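    public static int detectDuplicate(StringBuffer source) {
        if (source.length() < 2) return -1;
        char charLast = source.charAt(source.length() - 1);
        int xLastChar = source.length() - 1;
        // Sentinel: copy the second-to-last character over the last one,
        // so the loop below is guaranteed to find an equal pair and stop
        source.setCharAt(xLastChar, source.charAt(xLastChar - 1));
        int i = 1;
        while (true) {
            if (source.charAt(i) == source.charAt(i - 1)) break;
            i += 1;
        }
        // Restore the original last character
        source.setCharAt(xLastChar, charLast);
        // Stopped only at the sentinel and the real last pair differs: no duplicate
        if (i == xLastChar && source.charAt(xLastChar - 1) != charLast) return -1;
        return i;
    }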

For a large string, this algorithm is probably twice as fast as yours.

0

You may try:

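    // Returns the first character that occurs again later in s, or ' ' if none does.
    // lastIndexOf is O(n) per call, so the method is O(n^2) in the worst case.
    public static char firstRecurringChar(String s) {
        char x = ' ';
        System.out.println("STRING : " + s);
        for (int i = 0; i < s.length(); i++) {
            System.out.println("CHAR AT " + i + " = " + s.charAt(i));
            System.out.println("Last index of CHAR AT " + i + " = " + s.lastIndexOf(s.charAt(i)));
            if (s.lastIndexOf(s.charAt(i)) > i) {
                x = s.charAt(i);
                break;
            }
        }
        return x;
    }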
0

O(1) Algorithm

Your solution is O(n^2) due to the two nested loops.

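The fastest algorithm for this is O(1) (constant time), because a string of char values can contain at most Character.MAX_VALUE + 1 = 65536 distinct characters, so a repeat must appear within a bounded prefix: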

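    public static int detectDuplicate(String source) {
        // One flag per possible char value (Character.MAX_VALUE + 1 = 65536 entries)
        boolean[] foundChars = new boolean[Character.MAX_VALUE + 1];
        // Pigeonhole principle: a repeat must occur by index 65536 at the latest,
        // so the loop runs a bounded number of times regardless of the string length
        for (int i = 0; i < source.length(); i++) {
            char currentChar = source.charAt(i);
            if (foundChars[currentChar]) return i;
            foundChars[currentChar] = true;
        }
        return -1;
    }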

However, it is only fast in terms of big O: the constant hidden in O(1) is large.
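To make the constant bound concrete: a Java String can hold at most Character.MAX_VALUE + 1 = 65536 distinct char values, so a boolean-array scan must find a repeat by index 65536 at the latest. The sketch below (class name hypothetical) builds that worst case, every char value once followed by a repeat, using a boolean-array scan that relies only on this pigeonhole bound, with no extra early exit. The bound holds, but it comes with a 65536-entry array allocated on every call:

```java
public class PigeonholeBound {
    // Boolean-array scan relying only on the pigeonhole bound
    public static int detectDuplicate(String source) {
        boolean[] foundChars = new boolean[Character.MAX_VALUE + 1];
        for (int i = 0; i < source.length(); i++) {
            char currentChar = source.charAt(i);
            if (foundChars[currentChar]) return i;
            foundChars[currentChar] = true;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Worst case: each of the 65536 char values exactly once, then a repeat
        StringBuilder sb = new StringBuilder(Character.MAX_VALUE + 2);
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            sb.append((char) c);
        }
        sb.append('A');
        System.out.println(detectDuplicate(sb.toString())); // prints 65536
    }
}
```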

-1

Source: https://habr.com/ru/post/924775/

