Counting how many integers in a list are at least as large as a query integer (possibly not in the list) in O(log log) time

Given an unordered list of n non-negative integers (with no guarantees about distribution or repetition), I want to take a query integer, possibly not in the list, and answer with the number of list elements that are at least as large as the query. I can afford up to n^2 preprocessing time and up to n * log(n) storage. Is this possible?

My not-so-good solution is to sort the list and binary search (log n query time, constant extra space).

Storing the answer to every possible query in a map takes too much space.

Edit: Partial solutions that require some assumptions about the input, such as its distribution or the maximum size of an integer, are also useful.

Edit: This is called the predecessor/successor problem. There is a paper by Beame and Fich in which they build a data structure that stores a set of n integers from a universe of size N in O(n) space and answers predecessor queries in O(min{ log log N / log log log N, sqrt(log n / log log n) }) time.

http://homes.cs.washington.edu/~beame/papers/stocpred.pdf

Edit (bounty): None of this morning's answers fits exactly. N is not bounded. The integers are not necessarily 32-bit. The largest element can be much larger than the number of elements. I assume no distribution on the input. Of the existing answers, I awarded the bounty to Coffin's, because it covers a relatively large class of the problem: the case where I do have a distribution.

+8
language-agnostic algorithm data-structures
7 answers

Assuming your elements are fairly evenly distributed (or at least follow some distribution reasonably closely), the obvious method is to sort the data and then use interpolation search instead of binary search.

Interpolation search usually has expected complexity O(log log n).

In case it is not obvious from the name, the basic idea of interpolation search is to use interpolation to guess the approximate location of the item you are looking for. If, for example, you are dealing with integers from 0 to 100,000 and you need to find, say, 21,000, you start about 21% of the way into the array rather than at the halfway point. Then, based on the value you find there, you interpolate again for the next guess, and so on.

That example uses linear interpolation, but the same basic idea applies to other distributions: you just need an interpolation function that matches the distribution of the data.
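
To make the approach concrete, here is a minimal sketch (my illustration, not the answerer's code) that sorts once and then uses interpolation probes to answer the question as posed, i.e. how many elements are at least as large as the query q. Expected O(log log n) probes on roughly uniform data; adversarial data can degrade it, which is why the answer stresses the distribution assumption.

    // Count of elements >= q in a sorted ascending array, using interpolation
    // probes instead of midpoints. Loop invariant: a[lo] < q <= a[hi].
    function countAtLeast(a, q) {
      if (a.length === 0 || q > a[a.length - 1]) return 0;
      if (q <= a[0]) return a.length;
      let lo = 0, hi = a.length - 1;
      while (hi - lo > 1) {
        // linear interpolation guess, clamped strictly between lo and hi
        let pos = lo + Math.floor((q - a[lo]) * (hi - lo) / (a[hi] - a[lo]));
        pos = Math.max(lo + 1, Math.min(hi - 1, pos));
        if (a[pos] < q) lo = pos; else hi = pos;
      }
      return a.length - hi;       // a[hi] is the first element >= q
    }

    const a = [1, 3, 3, 7, 10, 21000, 99000];  // must be sorted first
    console.log(countAtLeast(a, 3));           // 6
    console.log(countAtLeast(a, 8));           // 3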

+4

Depending on your parameters, you can try the van Emde Boas tree, https://en.wikipedia.org/wiki/Van_Emde_Boas_tree - a "tree data structure which implements an associative array with m-bit integer keys. It performs all operations in O(log m) time, or equivalently in O(log log M) time, where M = 2^m is the maximum number of elements that can be stored in the tree." (Note the warning at the top of the article, which says that there is an error in the pseudocode.)
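
Since the article gives mostly pseudocode (with a known error), here is a minimal sketch of the idea, under my assumptions: the universe size u is a power of two, and only insert and successor are shown. The class layout and lazy cluster creation are my choices, not the article's. Answering the OP's actual "how many are >= q" question would still need rank information (e.g. each stored value's position in sorted order) on top of successor.

    // successor(x) returns the smallest stored value strictly greater than x.
    class VEB {
      constructor(u) {
        this.u = u;
        this.min = null;          // the minimum is stored lazily, not recursed into
        this.max = null;
        if (u > 2) {
          this.lowSize = 1 << Math.ceil(Math.log2(u) / 2);  // cluster universe
          this.highSize = u / this.lowSize;                 // summary universe
          this.summary = null;    // created lazily with the first cluster
          this.clusters = {};     // cluster index -> VEB(lowSize), created lazily
        }
      }
      high(x) { return Math.floor(x / this.lowSize); }
      low(x)  { return x % this.lowSize; }

      insert(x) {
        if (this.min === null) { this.min = this.max = x; return; }
        if (x < this.min) { const t = this.min; this.min = x; x = t; } // swap in new min
        if (this.u > 2) {
          const h = this.high(x);
          if (!this.clusters[h]) {
            this.clusters[h] = new VEB(this.lowSize);
            if (!this.summary) this.summary = new VEB(this.highSize);
            this.summary.insert(h);   // recurse in the summary only for a new cluster
          }
          this.clusters[h].insert(this.low(x));
        }
        if (x > this.max) this.max = x;
      }

      successor(x) {
        if (this.min !== null && x < this.min) return this.min;
        if (this.u === 2) return (this.max !== null && x < this.max) ? this.max : null;
        const h = this.high(x), c = this.clusters[h];
        if (c && c.max !== null && this.low(x) < c.max) {
          return h * this.lowSize + c.successor(this.low(x)); // answer is in x's cluster
        }
        const nh = this.summary ? this.summary.successor(h) : null; // next nonempty cluster
        return nh === null ? null : nh * this.lowSize + this.clusters[nh].min;
      }
    }

    // Example: const v = new VEB(16); [3, 10, 12].forEach(x => v.insert(x));
    // v.successor(3) === 10, v.successor(10) === 12, v.successor(12) === null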

+3

(Note: this answer was posted before the OP removed their comment to templatetypedef: "As for integer size, we can assume it is 32-bit unsigned.")

Prepare a hash table pointing to 65,536 sorted buckets, one key for each possible configuration of the top 16 bits of the allotted 32. (Technically this is O(1) extra space, though one could say that about 5,500 items is the list size above which this fixed space falls under the stated n * log n budget.) Each bucket stores the count of elements above the current bucket, as well as the list values in its integer range, plus duplicate counts if needed.

On insertion you must update the counters of all lower buckets; technically this is O(1) update time, since the number of buckets is a constant, although it is clearly significant for smaller lists. But if the list is known in advance, as you suggested, the preprocessing can be done in O(n * log n) by "reporting" the counts downward from bucket to bucket. A query takes O(1) to locate the bucket. Searching within a bucket takes at most log m, where m, the number of elements in the bucket, is at most 65,536 - a constant independent of n.

During preprocessing, depending on the range and distribution, two or three levels of offsets could be used for further optimization.
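
A hedged sketch of this bucket scheme, assuming 32-bit unsigned values and a list known in advance (the function names are mine, not the answerer's):

    const NUM_BUCKETS = 1 << 16;              // one bucket per top-16-bit prefix

    function buildIndex(list) {
      const buckets = Array.from({ length: NUM_BUCKETS }, () => []);
      for (const v of list) buckets[v >>> 16].push(v);
      const above = new Array(NUM_BUCKETS);   // above[k] = #elements in buckets > k
      let running = 0;
      for (let k = NUM_BUCKETS - 1; k >= 0; k--) {  // "report" the counts downward
        above[k] = running;
        running += buckets[k].length;
        buckets[k].sort((x, y) => x - y);
      }
      return { buckets, above };
    }

    function countGreaterOrEqual(index, q) {  // number of list elements >= q
      const k = q >>> 16;
      const arr = index.buckets[k];
      let lo = 0, hi = arr.length;            // binary search: first element >= q
      while (lo < hi) {
        const m = (lo + hi) >> 1;
        if (arr[m] < q) lo = m + 1; else hi = m;
      }
      return (arr.length - lo) + index.above[k];
    }

    const idx = buildIndex([5, 70000, 70001, 1 << 30]);
    console.log(countGreaterOrEqual(idx, 70000));  // 3
    console.log(countGreaterOrEqual(idx, 6));      // 3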

+2

(Note: this answer was posted before the OP removed their comment to templatetypedef: "As for integer size, we can assume it is 32-bit unsigned.")

The Y-fast trie ( https://en.m.wikipedia.org/wiki/Y-fast_trie ), invented by Dan Willard, supports the kinds of operations and the time complexity you are looking for. It uses O(n) space and O(log log U) search time, where U is the largest value in the domain - for our purposes, the largest value in your list. Note that with U around 2^32, log log U is about 5, so a regular binary search is already asymptotically slower once your list has more than 32 items.

A Y-fast trie is built from n / log U binary search trees that together contain the entire sorted list as a sequence, plus one X-fast trie ( https://en.m.wikipedia.org/wiki/X-fast_trie ) that holds one representative from each binary search tree, used to find which tree to search in.

I will say a little (because I only learned a little) about how the X-fast trie finds a successor/predecessor, the operation you are interested in. The asymptotic time complexity of the search is O(log log U).

The search for the predecessor/successor of k begins with a binary search over the levels of the trie, which has height log U. We start halfway down the trie: if the prefix of k of the length corresponding to that level is not among the hashed nodes, k's lowest existing ancestor must be higher up; if it is present, the ancestor is lower down.

Once that ancestor is found, we know that one subtree of the node has leaves (the leaves are where the trie's values are stored) and the other - the one where k would go - does not. This is where the ingenious descendant pointer comes in: it points to the smallest leaf in the right subtree when the left subtree is missing, or the largest leaf in the left subtree when the right one is missing.

Now we are sitting directly at the predecessor or successor of k and can report the value the trie stores there: which binary search tree to continue in. Asymptotic space complexity: O(n + n / log U * log U) = O(n). Asymptotic time complexity: O(log log U + log log U) = O(log log U).
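
A hedged sketch of that composition, with the x-fast lookup stubbed out by a plain binary search over the block representatives (a real y-fast trie replaces that stub with an x-fast trie to reach O(log log U); all names, and the use of sorted arrays in place of balanced BSTs, are my simplifications):

    function makeYFastSketch(sortedList, U) {
      const blockSize = Math.max(1, Math.floor(Math.log2(U)));
      const blocks = [];
      for (let i = 0; i < sortedList.length; i += blockSize) {
        blocks.push(sortedList.slice(i, i + blockSize));
      }
      const reps = blocks.map(b => b[0]);   // one representative per block
      return { blocks, reps };
    }

    function predecessor(y, k) {            // largest stored value <= k
      const { blocks, reps } = y;
      let lo = 0, hi = reps.length - 1, b = -1;
      while (lo <= hi) {                    // stub for the x-fast representative lookup
        const m = (lo + hi) >> 1;
        if (reps[m] <= k) { b = m; lo = m + 1; } else hi = m - 1;
      }
      if (b < 0) return null;               // k is below everything stored
      const arr = blocks[b];
      let ans = null;
      lo = 0; hi = arr.length - 1;
      while (lo <= hi) {                    // O(log log U): the block has ~log U elements
        const m = (lo + hi) >> 1;
        if (arr[m] <= k) { ans = arr[m]; lo = m + 1; } else hi = m - 1;
      }
      return ans;
    }

    const y = makeYFastSketch([1, 4, 9, 16, 25, 36], 64);
    console.log(predecessor(y, 20));        // 16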

+2

(Note: this answer was posted before the OP removed their comment to templatetypedef: "As for integer size, we can assume it is 32-bit unsigned.")

The Y-fast trie (see my other answer) gets us O(log log U + log log U) search time, which means that if your range is in the billions, we are looking at roughly 5 + 5 = 10 iterations per search.

But there is a way to get the practical search time down to 5 iterations.

Hash all combinations of the leftmost 17 bits, directing these 131,072 keys to X-fast tries (see my other answer) of maximum height 15 and maximum space m * 15, where m is the number of elements in that particular bucket. The tries store only the rightmost 15 bits of each corresponding list element. Since these X-fast tries are bounded in size, a search takes at most 1 + log 15 ≈ 5 iterations. If your list has fewer than 32,768 items, the space is roughly 131,072 + n * 15, which is somewhat more than the requested n * log n; but since the hash size and the maximum trie height are constants, the asymptotic space complexity is actually O(n), and with a list of 32,768 elements or more, the space used is in practice less than n * log n. (A bucketed wrapper around the sketch below follows after the output.)

Here is a rough sketch of the X-fast trie in JavaScript:

    function pad(width, string, padding) {
      return (width <= string.length) ? string : pad(width, padding + string, padding);
    }

    function makeXFastTree(elems) {
      var xfast = {};
      var height = Math.floor(Math.log2(Math.max.apply(null, elems))) + 1;

      function insert(x) {
        var y = pad(height, x.toString(2), '0');
        var l = 1;
        var d = y.substr(-l, 1);

        // add element to the parent node
        if (!xfast[y.substr(0, height - l)]) {
          xfast[y.substr(0, height - l)] = [y, y];
        } else if (d == '1') {
          xfast[y.substr(0, height - l)][1] = y;
        } else {
          xfast[y.substr(0, height - l)][0] = y;
        }

        // update higher nodes
        l++;
        d = y.substr(-l, 1);
        var temp = y.substr(0, height - l);

        while (temp.length > 0) {
          if (!xfast[temp]) {
            xfast[temp] = d == 0 ? ['0', y] : [y, '1'];
          } else if (d == '0') {
            xfast[temp][0] = '0';
            if (xfast[temp][1] != '1' && y > xfast[temp][1]) {
              xfast[temp][1] = y;
            }
          } else {
            xfast[temp][1] = '1';
            if (xfast[temp][0] != '0' && y < xfast[temp][0]) {
              xfast[temp][0] = y;
            }
          }
          l++;
          d = y.substr(-l, 1);
          temp = y.substr(0, height - l);
        }
      }

      for (var i = 0; i < elems.length; i++) {
        insert(elems[i]);
      }
      return [xfast, height];
    }

    function find(T, height, x) {
      var y = pad(height, x.toString(2), '0');
      var l = d = height >> 1;
      var temp = y.substr(0, l);

      while (true) {
        // ancestor found
        if (T[temp] && !T[y.substr(0, temp.length + 1)]) {
          return T[temp];
        }
        d = Math.ceil(d / 2);
        if (T[temp]) {
          l += d;
          temp = y.substr(0, l);
        } else {
          l -= d;
          temp = y.substr(0, l);
        }
      }
    }

Output:

    var t = makeXFastTree([31, 27, 10, 5, 4, 2, 1]);

    console.log(JSON.stringify(t[0]));
    // {"0":["0","1"],"1":["11011","1"],"11":["0","1"],"110":["11011","1"],
    //  "111":["11111","1"],"1101":["11011","11011"],"1111":["11111","11111"],
    //  "0101":["01010","01010"],"010":["01010","1"],"01":["0","01010"],
    //  "0010":["00100","00101"],"001":["0","00101"],"00":["0","1"],
    //  "0001":["00010","00010"],"000":["0","1"],"0000":["00001","00001"]}

    console.log(find(t[0], t[1], 28));
    // ["11111", "1"]

    console.log(find(t[0], t[1], 3));
    // ["00010", "00010"]
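
And a hedged sketch of the 17-bit bucketing described above, reusing makeXFastTree and find from the previous snippet. The wrapper is mine, not the answerer's; it assumes 32-bit values and that each bucket holds at least one nonzero low part, since makeXFastTree derives its height from the bucket's maximum element:

    function makeBucketedIndex(elems) {
      var buckets = {};
      for (var i = 0; i < elems.length; i++) {
        var key = elems[i] >>> 15;              // leftmost 17 bits of a 32-bit value
        (buckets[key] = buckets[key] || []).push(elems[i] & 0x7FFF); // low 15 bits
      }
      var index = {};
      for (var k in buckets) {
        index[k] = makeXFastTree(buckets[k]);   // height <= 15, so ~5 search steps
      }
      return index;
    }

    var idx = makeBucketedIndex([31, 27, (1 << 20) + 5, (1 << 20) + 99]);
    var q = (1 << 20) + 50;
    var trie = idx[q >>> 15];                   // O(1) hash lookup of the bucket
    if (trie) console.log(find(trie[0], trie[1], q & 0x7FFF));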
+2

If there is enough preprocessing time and space to put the data into a tree, you can build a sorted tree in which each node stores how many leaves hang off its right (larger) side. While building the tree, this count can be incremented at every node you pass to the right of when inserting a new leaf, so it adds little time. Getting the number of values greater than or equal to a given integer is then done by locating the integer's place in the tree and summing the counts of every node where the search path goes left.

The time complexity is the usual complexity for the kind of tree used, plus a few count increments per leaf during construction; the space is the regular space plus one counter per node, whose size depends on the maximum number of leaves.

In the sample code I used a simple binary tree; you could spend the remaining preprocessing budget on balancing the tree (making sure the counts stay updated), or use some kind of self-balancing tree (but that is probably overkill here).

Example code snippet in JavaScript (inserts 100,000 random numbers, and correctly handles duplicate values and query values that are not in the tree):

    function ChopTree() {
      this.root = null;

      this.insert = function(value) {
        var branch = null, leaf = this.root, before;
        while (leaf != null) {
          branch = leaf;
          before = value <= leaf.value;
          if (before) leaf = branch.left;
          else {
            ++branch.count;
            leaf = branch.right;
          }
        }
        if (branch == null) this.root = new Leaf(value);
        else if (before) branch.left = new Leaf(value);
        else branch.right = new Leaf(value);
      }

      this.chop = function(axe) {
        var branch = this.root, count = 0;
        while (branch != null) {
          if (axe <= branch.value) {
            count += branch.count;
            branch = branch.left;
          }
          else branch = branch.right;
        }
        return count;
      }

      function Leaf(value) {
        this.value = value;
        this.left = null;
        this.right = null;
        this.count = 1;
      }
    }

    var t = new ChopTree();
    for (var i = 0; i < 100000; i++) t.insert(Math.floor(Math.random() * 4294967296));

    document.write("Inserted 100,000 random integers from 0 to 2<SUP>32</SUP><BR><BR>");
    document.write(t.chop(0) + " greater than or equal to 0<BR>");
    document.write(t.chop(2147483648) + " greater than or equal to 2<SUP>31</SUP><BR>");
    document.write(t.chop(4000000000) + " greater than or equal to 4&times;10<SUP>9</SUP><BR>");
    document.write(t.chop(4294967296) + " greater than or equal to 2<SUP>32</SUP><BR>");

Update: storing a counter per value and incrementing it can be used to handle duplicate values, if you expect many of them and space is a concern.

+1

With integers limited to 32 bits, the question does not make much sense: if N is less than 2^32, it is bounded, and asymptotic complexities are meaningless.

If N is unbounded, you can sort the values in linear time while counting their multiplicities. The number of distinct elements is then bounded, and again the asymptotic complexities lose their meaning.

There is something wrong with the problem statement.
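
For what it's worth, here is a minimal sketch of the counting idea, under my assumption that values are bounded by some known maxVal: a counting pass plus a suffix sum answers every query in O(1).

    // atLeast[q] = number of list elements >= q, for 0 <= q <= maxVal + 1
    function buildAtLeast(list, maxVal) {
      const counts = new Array(maxVal + 2).fill(0);
      for (const v of list) counts[v]++;          // multiplicity of each value
      const atLeast = new Array(maxVal + 2).fill(0);
      for (let v = maxVal; v >= 0; v--) atLeast[v] = atLeast[v + 1] + counts[v];
      return atLeast;
    }

    const atLeast = buildAtLeast([5, 3, 3, 9, 0], 10);
    console.log(atLeast[3]);  // 4 (elements 5, 3, 3, 9)
    console.log(atLeast[6]);  // 1 (element 9)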

+1
