Java: What is the best way to find the first duplicate character in a string?

Question

I wrote the code below to detect the first duplicate character in a string.

    public static int detectDuplicate(String source) {
        boolean found = false;
        int index = -1;
        final long start = System.currentTimeMillis();
        final int length = source.length();
        for (int outerIndex = 0; outerIndex < length && !found; outerIndex++) {
            boolean shiftPointer = false;
            for (int innerIndex = outerIndex + 1; innerIndex < length && !shiftPointer; innerIndex++) {
                if (source.charAt(outerIndex) == source.charAt(innerIndex)) {
                    found = true;
                    index = outerIndex;
                } else {
                    shiftPointer = true;
                }
            }
        }
        System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
        return index;
    }

I need help with two things:

What is the worst-case complexity of this algorithm? My understanding is O(n).
Is this the best way to do this? Can someone provide a better solution (if any)?

Thanks, NN

java algorithm

Niranjan Sep 06 '12 at 17:06

7 answers

assylias · Answer 1 · 2012-09-06T17:14:54+0000

As mentioned by others, your algorithm is O(n^2). Here is an O(n) algorithm; it is linear because HashSet#add runs in constant time (assuming the hash function distributes the elements properly among the buckets). Note that I size the hash set to its maximum size up front to avoid resizing/rehashing:

    import java.util.HashSet;
    import java.util.Set;

    public static int findDuplicate(String s) {
        char[] chars = s.toCharArray();
        Set<Character> uniqueChars = new HashSet<Character>(chars.length, 1);
        for (int i = 0; i < chars.length; i++) {
            // add() returns false when the character is already in the set
            if (!uniqueChars.add(chars[i])) return i;
        }
        return -1;
    }

Note: this returns the index of the first duplicate (i.e. the index of the first character that duplicates an earlier character). To return the index of the first occurrence of that character instead, you need to store the indices in a Map<Character, Integer> (Map#put is also O(1) in this case):

    import java.util.HashMap;
    import java.util.Map;

    public static int findDuplicate(String s) {
        char[] chars = s.toCharArray();
        Map<Character, Integer> uniqueChars = new HashMap<Character, Integer>(chars.length, 1);
        for (int i = 0; i < chars.length; i++) {
            // put() returns the previous index for this character, or null if unseen
            Integer previousIndex = uniqueChars.put(chars[i], i);
            if (previousIndex != null) {
                return previousIndex;
            }
        }
        return -1;
    }
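To make the difference between the two variants concrete, here is a minimal sketch that runs both on the same input. The class name FirstDuplicateDemo and the method names indexOfRepeat and indexOfFirstOccurrence are illustrative and not part of the answer; the bodies follow the two methods shown above, without the pre-sizing.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class FirstDuplicateDemo {
    // Index of the first character that repeats an earlier one, or -1.
    static int indexOfRepeat(String s) {
        Set<Character> seen = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            if (!seen.add(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    // Index of the earlier occurrence of that repeated character, or -1.
    static int indexOfFirstOccurrence(String s) {
        Map<Character, Integer> firstIndex = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            Integer previous = firstIndex.put(s.charAt(i), i);
            if (previous != null) {
                return previous;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfRepeat("abcba"));
        System.out.println(indexOfFirstOccurrence("abcba"));
    }
}
```

For "abcba" the first call prints 3 (the second 'b') and the second prints 1 (the first 'b').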

Tom Anderson · Answer 2 · 2012-09-06T17:11:42+0000

This is O(n^2), not O(n). Consider the case of abcdefghijklmnopqrstuvwxyzz. outerIndex will range from 0 to 25 before the procedure completes, and each time it is incremented, innerIndex will range from outerIndex to 26.
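The quadratic growth can be checked by counting comparisons. The sketch below assumes a full pairwise scan in which the inner loop walks the whole remainder of the string; it is not a copy of the question's method (whose inner loop exits on the first mismatch), and the names PairwiseScanCount and countComparisons are hypothetical.

```java
class PairwiseScanCount {
    // Hypothetical helper: counts the character comparisons made by a full
    // pairwise scan that stops at the first pair of equal characters.
    static long countComparisons(String source) {
        long comparisons = 0;
        int length = source.length();
        for (int outer = 0; outer < length; outer++) {
            for (int inner = outer + 1; inner < length; inner++) {
                comparisons++;
                if (source.charAt(outer) == source.charAt(inner)) {
                    return comparisons;
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // 27 characters; the only equal pair is the final "zz".
        System.out.println(countComparisons("abcdefghijklmnopqrstuvwxyzz"));
    }
}
```

For this 27-character input the count is 351, which is 27 * 26 / 2, i.e. n(n-1)/2.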

To get to O(n), you need to make one pass over the string and do O(1) work at each position. Since the work done at each position is to check whether the character has been seen before (and if so, where), you need an O(1) map implementation. A hash table gives you this; so does an array indexed by character code.

assylias shows how to do this with hashing, so here's how to do it with an array (just for laughs, really):

    import java.util.Arrays;

    public static int detectDuplicate(String source) {
        // One slot per possible char value (1 << 16 entries), -1 meaning "not seen yet"
        int[] firstOccurrence = new int[1 << Character.SIZE];
        Arrays.fill(firstOccurrence, -1);
        for (int i = 0; i < source.length(); i++) {
            char ch = source.charAt(i);
            if (firstOccurrence[ch] != -1) return firstOccurrence[ch];
            else firstOccurrence[ch] = i;
        }
        return -1;
    }

Qnan · Answer 3 · 2012-09-06T17:12:12+0000

The complexity is approximately O(M^2), where M is the minimum of the string length and the size K of the set of possible characters.

You can get it down to O(M) with O(K) memory by simply remembering the position at which you first encounter each unique character.
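A minimal sketch of that idea, under the assumption that the alphabet is 7-bit ASCII (K = 128). The class name FirstSeenTable, the method name firstRepeated, and the choice of K are illustrative only; a different alphabet needs a different table size.

```java
import java.util.Arrays;

class FirstSeenTable {
    // Assumed alphabet size K: 128 covers 7-bit ASCII only.
    static final int K = 128;

    // Returns the index of the first occurrence of the first character
    // that appears twice, or -1 if no character repeats.
    static int firstRepeated(String s) {
        int[] firstSeen = new int[K]; // O(K) memory
        Arrays.fill(firstSeen, -1);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= K) {
                throw new IllegalArgumentException(
                        "character outside the assumed alphabet: " + (int) c);
            }
            if (firstSeen[c] != -1) {
                return firstSeen[c];
            }
            firstSeen[c] = i;
        }
        return -1;
    }
}
```

Because there are only K distinct values, the loop must hit a repeat within K + 1 iterations, so the running time is bounded by the smaller of the string length and K.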

Niranjan · Answer 4 · 2012-09-08T04:01:29+0000

OK, I found the logic below to reduce O(N^2) to O(N).

    public static int detectDuplicate(String source) {
        int index = -1;
        boolean found = false;
        final long start = System.currentTimeMillis();
        // i must stay below length(); charAt(length()) throws StringIndexOutOfBoundsException
        for (int i = 1; i < source.length() && !found; i++) {
            if (source.charAt(i) == source.charAt(i - 1)) {
                index = (i - 1);
                found = true;
            }
        }
        System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
        return index;
    }

It also shows a performance improvement over my previous algorithm, which has two nested loops. It takes 130 ms to detect the first repeated character in a string of 63 million characters where the duplicate character is at the end.

I am not sure if this is the best solution. If someone finds a better one, please share it.

Thanks,

Nn

Tyler Durden · Answer 5 · 2012-10-25T17:33:55+0000

I can significantly improve your algorithm. This should be done as follows:

    StringBuffer source ...

    char charLast = source.charAt(source.length() - 1);
    int xLastChar = source.length() - 1;
    // Sentinel: make the last two characters equal so the loop always terminates
    source.setCharAt(xLastChar, source.charAt(xLastChar - 1));
    int i = 1;
    while (true) {
        if (source.charAt(i) == source.charAt(i - 1)) break;
        i += 1;
    }
    // Restore the original last character
    source.setCharAt(xLastChar, charLast);
    if (i == xLastChar && source.charAt(xLastChar - 1) != charLast) return -1;
    return i;

For a large string, this algorithm is probably twice as fast as yours.
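The fragment above uses a sentinel: by forcing a guaranteed match at the end, the loop no longer needs a bounds check on every step. Below is a self-contained sketch of the same idea, with one swapped-in detail: instead of overwriting and restoring the last character in place, it copies the string into a char array with one extra slot for the sentinel. The names SentinelScan and adjacentRepeat are hypothetical, and no speed-up is claimed for this variant.

```java
class SentinelScan {
    // Returns the index of the second character of the first adjacent
    // equal pair (matching the fragment's "return i"), or -1 if none.
    static int adjacentRepeat(String s) {
        int n = s.length();
        if (n < 2) {
            return -1; // no pair is possible
        }
        char[] buf = new char[n + 1];
        s.getChars(0, n, buf, 0);
        buf[n] = buf[n - 1]; // sentinel: guarantees the loop terminates
        int i = 1;
        while (buf[i] != buf[i - 1]) {
            i++;
        }
        // Stopping at index n means only the sentinel matched.
        return i == n ? -1 : i;
    }
}
```

Like the fragment, this only detects characters repeated in adjacent positions; for example, adjacentRepeat("abb") returns 2 and adjacentRepeat("abca") returns -1.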

Ankita Walia · Answer 6 · 2017-12-23T20:25:49+0000

You may try:

    public static char firstRecurringChar(String s) {
        char x = ' ';
        System.out.println("STRING : " + s);
        for (int i = 0; i < s.length(); i++) {
            System.out.println("CHAR AT " + i + " = " + s.charAt(i));
            System.out.println("Last index of CHAR AT " + i + " = " + s.lastIndexOf(s.charAt(i)));
            if (s.lastIndexOf(s.charAt(i)) > i) {
                x = s.charAt(i);
                break;
            }
        }
        return x;
    }

amoebe · Answer 7 · 2012-09-06T17:11:09+0000

O(1) Algorithm

Your solution is O(n^2) due to the two nested loops.

The fastest algorithm for this is O(1) (constant time):

    public static int detectDuplicate(String source) {
        boolean[] foundChars = new boolean[Character.MAX_VALUE + 1];
        for (int i = 0; i < source.length(); i++) {
            if (i >= Character.MAX_VALUE) return Character.MAX_VALUE;
            char currentChar = source.charAt(i);
            if (foundChars[currentChar]) return i;
            foundChars[currentChar] = true;
        }
        return -1;
    }

However, it is only fast in terms of big O.
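The constant bound comes from a pigeonhole argument: a Java char has only 65,536 possible values, so a scan with a seen-table must hit a repeat within 65,537 iterations, however long the input is. The sketch below counts iterations to show that bound; the names BoundedScan, iterationsUntilRepeat, and worstCase are hypothetical, and the early-return check from the method above is left out.

```java
class BoundedScan {
    // Number of loop iterations a seen-table scan performs before it
    // finds a repeated char or runs out of input.
    static int iterationsUntilRepeat(String source) {
        boolean[] seen = new boolean[Character.MAX_VALUE + 1];
        int iterations = 0;
        for (int i = 0; i < source.length(); i++) {
            iterations++;
            char c = source.charAt(i);
            if (seen[c]) {
                return iterations;
            }
            seen[c] = true;
        }
        return iterations;
    }

    // Hypothetical worst-case input: every char value exactly once,
    // followed by `padding` extra characters.
    static String worstCase(int padding) {
        StringBuilder sb = new StringBuilder();
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            sb.append((char) c);
        }
        for (int i = 0; i < padding; i++) {
            sb.append('x');
        }
        return sb.toString();
    }
}
```

With worstCase(1000) or worstCase(100000) the scan stops after 65,537 iterations either way, which is why the bound is constant but large.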
