Find three numbers appeared only once

In a sequence of length n, where n = 2k + 3, that is, there are k unique numbers that appear twice and three digits appear only once.

Question: how to find three unique numbers that appeared only once?

for example, in the sequence 1 1 2 6 3 6 5 7 7 three unique numbers: 2 3 5.

Note: 3 <= n <1e6, and the number will vary from 1 to 2e9

Memory limits: 1000 KB, which means that we cannot save the entire sequence.

The method I tried (memory limit exceeded):

I initialize the tree, and reading in the same number, I try to remove it from the tree, if remove returns false (not found), I add it to the tree. Finally, the tree has three numbers. It works, but the memory limit is exceeded.

I know how to find one or two of these numbers using bit manipulation. Therefore, I wonder if

can we find three using the same method (or some similar method)?

The search method for one / two numbers (numbers) appeared only once:

If one number appears only once, we can apply XOR to the sequence to find it.

If there are two of them, we can first apply XOR to the sequence, then divide the sequence into 2 parts into one bit of the result, which is 1, and again apply XOR to 2 parts, and we will find the answer.

+15
algorithm bit-manipulation sequence
Jun 09 2018-10-06T00:
source share
6 answers

You can do this similarly to the simpler cases of one and two different values.

We need two integers for each bit of numbers (for example, 32 bits). For each number, if this bit is zero, XOR is the first integer with it. If it is not, XOR is the second whole with him.

Also consider how many times you find 1 or 0 in each position (we only need to check if it is even or odd, so keep a logical value).

After iteration, our pairs of integers will be one of the following. The first number here is an even number, the second is odd.

0, a^b^c a^b, c a^c, b b^c, a 

For each pair, check for an even number of integers. If it is zero, then we know that the other integer is a ^ b ^ c, since none of our results will be equal. Otherwise, we found the value in an odd count integer.

 public static int[] find3(int[] list) { int[][] xors = new int[32][2]; boolean[] counts = new boolean[32]; for (int curr : list) { for (int i = 0; i < 32; i++) { xors[i][(curr & (1 << i)) >> i] ^= curr; counts[i] ^= ((curr & (1 << i)) == (1 << i)); } } // this really shouldn't take so many lines int[] ret = new int[3]; int found = 0; for (int i = 0; i < 32; i++) { int oddCount = xors[i][counts[i] ? 1 : 0]; int evenCount = xors[i][counts[i] ? 0 : 1]; if (evenCount != 0) { // avoid the 0, a^b^c case. if (found == 0) { ret[0] = oddCount;// a ret[2] = evenCount;// b^c for now found++; } else if (found == 1 && ret[0] != oddCount) { ret[1] = oddCount;// b ret[2] ^= oddCount;// (b^c)^b == c break; } } } return ret; } 
+7
Jun 09 '10 at 18:34
source share

For a more general version of this problem (without these silly restrictions):

You can do this in O (n) time and in O (1) space without any restrictions or iteration over all bits and using only O (1) time bit manipulations, such as the XOR trick that worked on 2 missing numbers .

Here is the (pseudo) code to find only one of the numbers:

 // Given an array arr with 2k+3 numbers, k of which are repeated twice // and the remaining three are distinct: a,b,c. // returns one of a,b,c. int FindUnique(int []arr) { int s = 0; // This will ultimately hold a ^ b ^ c (bitwise XOR) for (int i = 0; i < arr.Length; i++) { s ^= arr[i]; } int d = 0; // this holds diff(a,s) ^ diff(b,s) ^ diff(c,s) for (int i = 0; i < arr.Length; i++) { d ^= diff(arr[i],s); } int e = lowestBit(d); // This gives the position where one of a,b,c differs // from the others. int bucket1 = 0; int bucket2 = 0; for (int i = 0; i < arr.Length; i++) { if (arr[i] & e) { bucket1 ^= arr[i]; } else { bucket2 ^= arr[i]; } } int count1 = 0; int count2 = 0; for (int i = 0; i < arr.Length; i++) { if (arr[i] == bucket1) { count1++; } if (arr[i] == bucket2) { count2++; } } if (count1 == 1) return bucket1; return bucket2; } // return a number with the lowest bit of x ^ s set to 1 and rest 0. // ie the lowest bit position where x and s differ. int diff(int x, int s) { return lowestBit(x ^ s); } // Returns a number with only the lowest bit of y set. int lowestBit(int y) { return y & ~(y-1); } 

The idea is this:

Say the numbers that appear once: a, b, c.

Now run XOR through the array to get s = a XOR b XOR c.

Since the numbers are different, note that s cannot be a or b or c (since the other two will be equal then), so there is at least one bit (not necessarily in the same position), where each of a, b , c is different from s.

In the case of two numbers, we could see that s is non-zero and selects the bit that distinguished a and b and worked with it.

We encounter difficulties when we have three numbers, but we can still find a little to distinguish one of the numbers.

For each number x, find the least significant bit that is different from s. Consider a binary number in which only this bit is set to one, and the rest are zero. Name this number diff (x).

Now, if we calculate diff (x) for each number and XOR them together, we get d = diff (a) XOR diff (b) XOR diff (c).

Note that d cannot be null.

Now find the least significant bit of bit d. This bit position can be used to unload one of a, b, c, since not all of a, b, c can have the same bit in this position: if they did, then this bit s, which is the XOR of these three should be the same, but we guaranteed that we chose this bit s to be different from at least one of the corresponding bits in a, b, c.

So, we again XOR, differentiate on this bit and check which of the two resulting numbers appears exactly once in the array. Once we find one number, we know how to deal with two numbers.

To find diff, just use bithack: x & ~(x-1) , which is a standard bit hack and can be considered O (1) (instead of O (number of bits)).

+8
Jun 09 2018-10-06T00:
source share

This is a classic question - in fact I was asked a few weeks ago. To solve this problem, you take the number of possible different numbers that may appear, and select that many bits.

For example, if the numbers in the list should be between 1-20, you allocate 20 bits - one for each number, and you initialize each bit as 0.

Then you go through the list. Each time you see a number, flip the corresponding bit.

For example: in the list of examples from 2 6 3 6 5 7 7 we could select 7 bits (for 1 2 3 4 5 6 7). Then, going through the list, we will do the following:

  • flip 2nd bit
  • flip 6th bit
  • flip 3rd bit
  • flip 6th bit
  • etc.

Then, as soon as you finish looping through the list, you can read a bit to find three unique numbers. All of them will be represented by "1" bits, and the remaining numbers will be represented by 0s.

You read the list twice, which takes 2n time, which is O (n).




Edit: It is possible that no boundaries will be set. So one solution is to just read the list first to determine the boundaries yourself - then it's still O (n).

However, one of the problems that may arise is that the list can be very small, but some very large numbers effectively make the range too large. For example:

 1, 99999999999999999, 1, 99999999999999999, 2, 3, 4 

Solving this problem will require a lot of memory due to the large number of people on the list, because although there are very few, the range is very large, and we select the bits according to the range.

Then the solution could be adjusted to give a new solution as follows using a hash table (although I'm not sure if this is allowed if the condition is “only bit manipulation”):

  • Let L denote the original list, and C denote its copy.
  • Remove all duplicates from C (there are many ways to do this efficiently).
  • Create a hash table H and for each element in C insert the key / value pair < number , pos > in H , where number is the current element in C and pos is its position in C So, given the number that appears in L , we can now use H to find this number in C
  • Select the number of bits equal to the size of C , and initialize these bits to 0.
  • Traverse L Each time we run the accross number, we get its value from H and flip this bit in our bit list.
  • Move the list of bits - for each "1" bit, get a number from C , which is in this position - this is one of the unique numbers.
+7
Jun 09 2018-10-06T00:
source share

If a probabilistic solution is enough, you can use the Bloom Filter .

Create two Bloom filters. The first (A) contains numbers that were found at least one, and the second (B) contains numbers that were found twice.

pseudo code:

 A = empty B = empty foreach x in the list if x in A add x to B else add x to A foreach x in the list if x in A if !(x in B) print x 

If you use the full 1000K, the chance of error will be ridiculously low.

+6
Jun 09 '10 at 7:55
source share

The problem is getting harder and harder as you add more unique values, mainly because you can choose A, B, C so that A xor B xor C = 0. It is harder and harder to determine if a subset of values ​​has the same checksum because it contains all unique values ​​or because it did not indicate values ​​that occurred with xor equal to 0.

You can make 3 values ​​in constant space and time O (n * k), where k is the number of bits in the largest integer. (So ​​O (n) time for your typical case: 32-bit integers.)

It would be interesting to know if the timeline will be non-linear in N as the number of unique values ​​increases, and you continue to require constant space.

 //Special check for 0, because otherwise we don't know A xor B xor C != A xor B if items unique-contains 0 then return 0 ++ SubProblem2Unique(items - 0) //Compute A xor B xor C val x = fold xor items //Try to find a split which separates A and B from C. for i in 0..WORD_SIZE //see if the checksum splits val x1 = fold xor [e in items where e & (1<<i) == 0] val x2 = x xor x1 if x1 == x or x2 == x then continue //ith bit was the same for A and B and C //C is either x1 or x2 val C = if items unique-contains x1 then x1 else x2 return C ++ SubProblem2Unique(items - C) throw InvalidInput 
+1
Jun 09 '10 at 17:48
source share

Why not use a hashset? - If the number already exists, remove from hashset - if the number does not exist, enter the final hashset in the hashset containing only unique numbers. Time: O (n) Memory: o (k), where k is the number of different elements.

Using the hashset approach, the solution is scalable and can be used to determine any number of unique elements in any given sequence.

0
Jul 01 '10 at 20:59
source share



All Articles