Why do the following two duplicate search algorithms have different time complexity?

Question

Why do the following two duplicate search algorithms have different time complexity?

I read this question . The selected answer contains the following two algorithms. I could not understand why the first one-time complexity is O (ln (n)). In the worst case, if the array does not contain any duplicates, it will be cyclically n times, and the second. Am I mistaken or is something missing? Thanks you

1) Faster (marginal) way

It uses a hash approach. You have to pay for autoboxing, but O (ln (n)) instead of O (n2). The adventurous soul wanted to find a primitive set of hashes based on int (Apache or Google Collections has such a thing, understand.)

boolean duplicates(final int[] zipcodelist) { Set<Integer> lump = new HashSet<Integer>(); for (int i : zipcodelist) { if (lump.contains(i)) return true; lump.add(i); } return false; }

2) Worship HuyLe

See HuyLe's answer for a more or less O (n) solution, which, it seems to me, needs a few extra steps:

 static boolean duplicates(final int[] zipcodelist) { final int MAXZIP = 99999; boolean[] bitmap = new boolean[MAXZIP+1]; java.util.Arrays.fill(bitmap, false); for (int item : zipcodeList) if (!bitmap[item]) bitmap[item] = true; else return true; } return false; }

+2

algorithm time-complexity time duplicate-removal

WowBow Jun 23 2018-12-12T00:

source share

1 answer

nhahtdh · Accepted Answer · 2012-06-23 02:26

The first solution should assume O (n) complexity, since the entire list of postal codes must be passed, and the processing of each postal code is the expected O (1) time complexity.

Even though incorporating into a HashMap can cause a second hash, the complexity is still O (1) . This is a little illogical, since there is no connection between the Java HashMap and the assumption in the link, but it shows that this is possible.

From the HashSet documentation:

This class offers constant time for basic operations ( add , delete, and contains size) , assuming a hash function that correctly distributes items among buckets .

The same goes for the second solution that is parsed correctly: O (n).

(Only a non-topic note, BitSet is faster than an array, as shown in the original post, since 8 boolean packaged in 1 byte , which uses less memory).

Why do the following two duplicate search algorithms have different time complexity?

More articles: