Why this is the smallest amount of memory:
The proposed solution has two phases:
Phase 1: Count how many of the input integers fall into each block. This phase uses 4*(#blocks) bytes of memory.
Phase 2: Use a bit vector in which each bit represents one integer of the block being searched. This phase uses blocksize/8 = (N/blocks)/8 bytes of memory.
Setting these two equal, 4*blocks = (N/blocks)/8, gives blocks = sqrt(N/32), as you already mentioned.
This is optimal because the total memory requirement is the maximum of the two phases' requirements, and both phases must be executed. After phase 1 you can discard the counters, keeping only the index of the block to search in phase 2.
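Here is a minimal Python sketch of the two phases, assuming the input can be streamed twice (for example by reopening the file). The function name, the read_input callable, and the default of N = 2^32 are illustrative assumptions, not part of the original question.

```python
import array
import math

def find_missing_two_phase(read_input, n_values=2**32):
    # read_input() must return a fresh iterator over the input integers
    # each time it is called (the input is scanned once per phase).
    n_blocks = math.isqrt(n_values // 32)        # blocks = sqrt(N/32)
    block_size = -(-n_values // n_blocks)        # ceiling division

    # Phase 1: one 4-byte counter per block (4 * n_blocks bytes).
    counts = array.array('I', [0] * n_blocks)
    for x in read_input():
        counts[x // block_size] += 1

    # A block holding fewer values than it can contain is missing at least
    # one integer (this assumes the input integers are distinct).
    def capacity(b):
        return min(block_size, n_values - b * block_size)

    target = next(b for b in range(n_blocks) if counts[b] < capacity(b))
    lo, hi = target * block_size, target * block_size + capacity(target)

    # Phase 2: one bit per integer of the chosen block ((N/blocks)/8 bytes).
    seen = bytearray((hi - lo + 7) // 8)
    for x in read_input():
        if lo <= x < hi:
            seen[(x - lo) >> 3] |= 1 << ((x - lo) & 7)

    for v in range(lo, hi):
        if not seen[(v - lo) >> 3] & (1 << ((v - lo) & 7)):
            return v
    return None
```

With N = 2^32 this gives about 11585 blocks, so each phase stays around 46 KB of memory.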
Optimization
If your counters saturate when they reach their capacity, you don't really need 4 bytes per counter but only 3, because a counter never needs to count past the number of integers in its block.
In this case, phase 1 uses 3*blocks bytes of memory and phase 2 uses (N/blocks)/8 bytes. Setting these equal gives the optimum blocks = sqrt(N/24). If N is 4 billion, the number of blocks is approximately 12910 and the block size is about 309838 integers per block, a count that fits in 3 bytes.
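Packed, saturating 3-byte counters are easy to emulate with a bytearray; the following is a rough sketch (the helper names are mine, not part of the original answer).

```python
# Saturation value: well above any realistic block size (~310,000 here),
# so in practice a counter never saturates unless the input has duplicates.
CAP = (1 << 24) - 1

def make_counters(n_blocks):
    return bytearray(3 * n_blocks)          # 3 bytes per counter, all zero

def increment(counters, block):
    i = 3 * block
    value = int.from_bytes(counters[i:i + 3], 'little')
    if value < CAP:                         # saturate instead of wrapping around
        counters[i:i + 3] = (value + 1).to_bytes(3, 'little')

def read(counters, block):
    i = 3 * block
    return int.from_bytes(counters[i:i + 3], 'little')
```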
Caveats, and an alternative with good average-case performance
The algorithm you propose only works if all input integers are distinct. If they are not, I suggest simply going with a randomized candidate-set approach: select, say, 1000 candidate integers at random and check whether any of them is absent from the input file. If every candidate appears in the file, try another random candidate set. Although this has poor worst-case performance, it will be faster in most cases; for example, if the input covers 99% of the possible integers, then on average 10 of the 1000 candidates will not be found. You can choose the candidates pseudo-randomly so that you never repeat a candidate, and so that after a bounded number of attempts you are guaranteed to have checked every possible integer.
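A minimal sketch of the randomized candidate-set idea might look like this, again assuming a read_input callable that yields the input values on each call. The candidate count of 1000 and the attempt limit are illustrative, and this version does not implement the never-repeat pseudo-random selection mentioned above.

```python
import random

def find_missing_randomized(read_input, n_candidates=1000, max_attempts=100):
    for _ in range(max_attempts):
        # Pick a fresh set of random 32-bit candidates.
        candidates = set(random.sample(range(2**32), n_candidates))
        for x in read_input():           # one pass over the input per attempt
            candidates.discard(x)
            if not candidates:           # every candidate was present; stop early
                break
        if candidates:                   # anything left was never seen in the input
            return candidates.pop()
    return None                          # caller should fall back to the deterministic method
```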
If you check sqrt(N) candidates at a time, the worst-case performance can be as bad as N*sqrt(N), because you may have to scan all N input integers sqrt(N) times.
You can avoid this worst case by using the alternative only for the first candidate set, and switching to the proposed solution if it fails. This gives better average-case performance (it is the same strategy as a sorting algorithm that starts with quicksort and switches to heapsort if the input turns out to be a bad case).
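The fallback could be wired up roughly like this, treating the randomized and deterministic searches as interchangeable callables (for example the two sketches above); the names and the single-attempt limit are placeholders of mine.

```python
def find_missing_hybrid(read_input, randomized, two_phase):
    # One cheap randomized attempt first (assumes the signature of the
    # earlier randomized sketch); usually this already succeeds.
    result = randomized(read_input, max_attempts=1)
    if result is not None:
        return result
    # Otherwise fall back to the guaranteed two-phase search,
    # keeping the worst case bounded by two extra passes over the input.
    return two_phase(read_input)
```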