Java array duplicate search

I have an array and I'm looking for duplicates.

duplicates = false; for(j = 0; j < zipcodeList.length; j++){ for(k = 0; k < zipcodeList.length; k++){ if (zipcodeList[k] == zipcodeList[j]){ duplicates = true; } } } 

However, this code does not work when there are no duplicates. What is it?

+45
java arrays
Oct 17 '10 at 1:15
source share
9 answers

Answer the nose.

 duplicates=false; for (j=0;j<zipcodeList.length;j++) for (k=j+1;k<zipcodeList.length;k++) if (k!=j && zipcodeList[k] == zipcodeList[j]) duplicates=true; 

Edited to switch .equals() to == , since I read somewhere you use int , which was not clear in the initial question. Also set k=j+1 to shorten the execution time, but still O (n 2 ).

The faster (within the limit) path

It uses a hash approach. You have to pay for autoboxing, but O (n) instead of O (n 2 ). The adventurous soul wanted to find a primitive set of hashes based on int (Apache or Google Collections has such a thing, understand.)

 boolean duplicates(final int[] zipcodelist) { Set<Integer> lump = new HashSet<Integer>(); for (int i : zipcodelist) { if (lump.contains(i)) return true; lump.add(i); } return false; } 

Worship HuyLe

See HuyLe's answer for a more or less O (n) solution, which, it seems to me, needs a few extra steps:

 static boolean duplicates(final int[] zipcodelist) { final int MAXZIP = 99999; boolean[] bitmap = new boolean[MAXZIP+1]; java.util.Arrays.fill(bitmap, false); for (int item : zipcodeList) if (!bitmap[item]) bitmap[item] = true; else return true; } return false; } 

Or just be compact

 static boolean duplicates(final int[] zipcodelist) { final int MAXZIP = 99999; boolean[] bitmap = new boolean[MAXZIP+1]; // Java guarantees init to false for (int item : zipcodeList) if (!(bitmap[item] ^= true)) return true; return false; } 

Does it mean that?

Ok, so I ran a little test, which, of course, is everywhere, but here is the code:

 import java.util.BitSet; class Yuk { static boolean duplicatesZero(final int[] zipcodelist) { boolean duplicates=false; for (int j=0;j<zipcodelist.length;j++) for (int k=j+1;k<zipcodelist.length;k++) if (k!=j && zipcodelist[k] == zipcodelist[j]) duplicates=true; return duplicates; } static boolean duplicatesOne(final int[] zipcodelist) { final int MAXZIP = 99999; boolean[] bitmap = new boolean[MAXZIP + 1]; java.util.Arrays.fill(bitmap, false); for (int item : zipcodelist) { if (!(bitmap[item] ^= true)) return true; } return false; } static boolean duplicatesTwo(final int[] zipcodelist) { final int MAXZIP = 99999; BitSet b = new BitSet(MAXZIP + 1); b.set(0, MAXZIP, false); for (int item : zipcodelist) { if (!b.get(item)) { b.set(item, true); } else return true; } return false; } enum ApproachT { NSQUARED, HASHSET, BITSET}; /** * @param args */ public static void main(String[] args) { ApproachT approach = ApproachT.BITSET; final int REPS = 100; final int MAXZIP = 99999; int[] sizes = new int[] { 10, 1000, 10000, 100000, 1000000 }; long[][] times = new long[sizes.length][REPS]; boolean tossme = false; for (int sizei = 0; sizei < sizes.length; sizei++) { System.err.println("Trial for zipcodelist size= "+sizes[sizei]); for (int rep = 0; rep < REPS; rep++) { int[] zipcodelist = new int[sizes[sizei]]; for (int i = 0; i < zipcodelist.length; i++) { zipcodelist[i] = (int) (Math.random() * (MAXZIP + 1)); } long begin = System.currentTimeMillis(); switch (approach) { case NSQUARED : tossme ^= (duplicatesZero(zipcodelist)); break; case HASHSET : tossme ^= (duplicatesOne(zipcodelist)); break; case BITSET : tossme ^= (duplicatesTwo(zipcodelist)); break; } long end = System.currentTimeMillis(); times[sizei][rep] = end - begin; } long avg = 0; for (int rep = 0; rep < REPS; rep++) { avg += times[sizei][rep]; } System.err.println("Size=" + sizes[sizei] + ", avg time = " + avg / (double)REPS + "ms"); } } } 

With NSQUARED:

 Trial for size= 10 Size=10, avg time = 0.0ms Trial for size= 1000 Size=1000, avg time = 0.0ms Trial for size= 10000 Size=10000, avg time = 100.0ms Trial for size= 100000 Size=100000, avg time = 9923.3ms 

With hashset

 Trial for zipcodelist size= 10 Size=10, avg time = 0.16ms Trial for zipcodelist size= 1000 Size=1000, avg time = 0.15ms Trial for zipcodelist size= 10000 Size=10000, avg time = 0.0ms Trial for zipcodelist size= 100000 Size=100000, avg time = 0.16ms Trial for zipcodelist size= 1000000 Size=1000000, avg time = 0.0ms 

With bitset

 Trial for zipcodelist size= 10 Size=10, avg time = 0.0ms Trial for zipcodelist size= 1000 Size=1000, avg time = 0.0ms Trial for zipcodelist size= 10000 Size=10000, avg time = 0.0ms Trial for zipcodelist size= 100000 Size=100000, avg time = 0.0ms Trial for zipcodelist size= 1000000 Size=1000000, avg time = 0.0ms 

BITSET Winner!

But just for the hair .... 15ms is within the error range for currentTimeMillis() , and there are some gaping holes in my test. Note that for any list longer than 100,000, you can simply return true because there will be a duplicate. In fact, if the list looks like random, you can return true WHP for a much shorter list. What is morality? In the limit, the most efficient implementation is:

  return true; 

And you will not be mistaken very often.

+114
Oct 17 '10 at 1:22
source share

See how your algorithm works:

 an array of unique values: [1, 2, 3] check 1 == 1. yes, there is duplicate, assigning duplicate to true. check 1 == 2. no, doing nothing. check 1 == 3. no, doing nothing. check 2 == 1. no, doing nothing. check 2 == 2. yes, there is duplicate, assigning duplicate to true. check 2 == 3. no, doing nothing. check 3 == 1. no, doing nothing. check 3 == 2. no, doing nothing. check 3 == 3. yes, there is duplicate, assigning duplicate to true. 

best algorithm:

 for (j=0;j<zipcodeList.length;j++) { for (k=j+1;k<zipcodeList.length;k++) { if (zipcodeList[k]==zipcodeList[j]){ // or use .equals() return true; } } } return false; 
+10
Oct 17 '10 at 1:20
source share

You can use a bitmap for better performance with a large array.

 java.util.Arrays.fill(bitmap, false); for (int item : zipcodeList) if (!bitmap[item]) bitmap[item] = true; else { duplicate = true; break; } 
+10
Oct 17 '10 at 1:52
source share

To check for duplicates, you need to compare the excellent pairs.

+4
Oct 17 '10 at 1:19
source share

Because you are comparing the first element of the array with yourself, so it detects that duplicates exist, even if they do not exist.

+2
Oct 17 '10 at 1:20
source share

Initialize k = j + 1. You will not compare items with yourself, nor will you duplicate comparisons. For example, j = 0, k = 1 and k = 0, j = 1 compare the same set of elements. This will remove the comparison k = 0, j = 1.

+2
Oct 17 '10 at 1:25
source share

Do not use == use .equals .

try this instead (IIRC, ZipCode needs to implement Comparable for this to work.

 boolean unique; Set<ZipCode> s = new TreeSet<ZipCode>(); for( ZipCode zc : zipcodelist ) unique||=s.add(zc); duplicates = !unique; 
+1
Oct 17 '10 at 1:21
source share

How about using this method?

 HashSet<Integer> zipcodeSet = new HashSet<Integer>(Arrays.asList(zipcodeList)); duplicates = zipcodeSet.size()!=zipcodeList.length; 
0
Apr 13 '16 at 5:23
source share

You can also work with Set, which does not allow duplication in Java.

  for (String name : names) { if (set.add(name) == false) { // your duplicate element } } 

using the add () method and checking the return value. If add () returns false, it means that the element is not allowed in Set, and this is your duplicate.

0
Aug 31 '16 at 13:48
source share



All Articles