The complexity of a randomized search algorithm

Consider the following randomized search algorithm on a sorted array a of length n (ascending order); x may be any element of the array.

 size_t randomized_search(value_t a[], size_t n, value_t x) {
     size_t l = 0;
     size_t r = n - 1;
     while (true) {
         size_t j = rand_between(l, r);
         if (a[j] == x) return j;
         if (a[j] < x) l = j + 1;
         if (a[j] > x) r = j - 1;
     }
 }
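The helper rand_between(l, r) is not defined in the question; presumably it returns a uniformly distributed index in the inclusive range [l, r]. A minimal sketch under that assumption (the small modulo bias of rand() does not matter for the analysis):

 #include <stdlib.h>

 /* Assumed helper: a uniformly distributed index in the inclusive
    range [l, r]. Uses rand(); the slight modulo bias is irrelevant
    to the asymptotic behavior discussed here. */
 size_t rand_between(size_t l, size_t r) {
     return l + (size_t)rand() % (r - l + 1);
 }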

What is the expected Big-Theta complexity (bounded both above and below) of this function when x is selected uniformly at random from a?

Although it seems to be log(n), I ran an experiment with operation counting and found that the result grows a little faster than log(n) (according to my data, even (log(n))^1.1 fits the results better).

Someone told me that this algorithm has an exact Big-Theta complexity (so, obviously, log(n)^1.1 is not the answer). Could you explain the time complexity and your approach to proving it? Thanks.
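For reference, here is a minimal, self-contained counting harness of the kind described above; the typedef for value_t, the array size, the trial count, and the probe counter are my own choices, not the asker's setup. It counts one three-way comparison per loop iteration and averages over keys x drawn uniformly from a:

 #include <stdio.h>
 #include <stdlib.h>

 typedef long value_t;

 static long long probes;  /* one three-way comparison per loop iteration */

 /* Uniform random index in [l, r] (same assumption as above). */
 static size_t rand_between(size_t l, size_t r) {
     return l + (size_t)rand() % (r - l + 1);
 }

 /* The search from the question, instrumented with a probe counter. */
 static size_t randomized_search(value_t a[], size_t n, value_t x) {
     size_t l = 0;
     size_t r = n - 1;
     while (1) {
         size_t j = rand_between(l, r);
         probes++;
         if (a[j] == x) return j;
         if (a[j] < x) l = j + 1;
         if (a[j] > x) r = j - 1;
     }
 }

 int main(void) {
     enum { N = 1 << 20, TRIALS = 100000 };
     static value_t a[N];
     for (size_t i = 0; i < N; i++) a[i] = (value_t)i;   /* sorted, distinct keys */

     for (int t = 0; t < TRIALS; t++) {
         value_t x = a[rand_between(0, N - 1)];           /* x drawn uniformly from a */
         randomized_search(a, N, x);
     }
     printf("n = %d, average probes per search = %.3f\n",
            N, (double)probes / TRIALS);
     return 0;
 }

The first answer below derives the exact expected probe count, 2 H(n + 1) - 3 + O(log n / n), which a harness like this should approximately reproduce.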


Update: data from my experiment

Fit against log(n):

[plot: measured operation counts fitted with log(n)]

Fit against log(n)^1.1:

[plot: measured operation counts fitted with log(n)^1.1]

2 answers

If you're willing to switch to counting three-way comparisons, I can tell you the exact expected complexity.

Suppose the key is at position i, and consider the probability that position j is probed. I claim that position j is probed if and only if it is the first position probed among the positions between i and j inclusive. Since each probe position is chosen uniformly at random from a range that still contains all of those positions, this happens with probability 1/(|i - j| + 1).

By linearity of expectation, the expected number of comparisons for a key at position i is sum_{j=1}^n 1/(|i - j| + 1). Averaging over i drawn uniformly from {1, ..., n}, the overall expected number of comparisons is

 sum_{i=1}^n 1/n sum_{j=1}^n 1/(|i - j| + 1)
   = 1/n sum_{i=1}^n (sum_{j=1}^i 1/(i - j + 1) + sum_{j=i+1}^n 1/(j - i + 1))
   = 1/n sum_{i=1}^n (H(i) + H(n + 1 - i) - 1)
   = 1/n sum_{i=1}^n H(i) + 1/n sum_{i=1}^n H(n + 1 - i) - 1
   = 1/n sum_{i=1}^n H(i) + 1/n sum_{k=1}^n H(k) - 1        (substituting k = n + 1 - i)
   = 2 H(n + 1) - 3 + 2 H(n + 1)/n - 2/n
   = 2 H(n + 1) - 3 + O(log n / n)
   = 2 log n + O(1)
   = Theta(log n).
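As a quick numerical sanity check (not part of the original answer), the following sketch evaluates the double sum directly and compares it against the closed form 2 H(n + 1) - 3 + 2 H(n + 1)/n - 2/n:

 #include <stdio.h>
 #include <stdlib.h>

 /* H(m): the m-th harmonic number, H(m) = 1 + 1/2 + ... + 1/m. */
 static double harmonic(int m) {
     double h = 0.0;
     for (int k = 1; k <= m; k++) h += 1.0 / k;
     return h;
 }

 int main(void) {
     for (int n = 10; n <= 1000; n *= 10) {
         /* Direct evaluation of (1/n) * sum_i sum_j 1/(|i - j| + 1). */
         double direct = 0.0;
         for (int i = 1; i <= n; i++)
             for (int j = 1; j <= n; j++)
                 direct += 1.0 / (abs(i - j) + 1);
         direct /= n;

         double h = harmonic(n + 1);
         double closed = 2 * h - 3 + 2 * h / n - 2.0 / n;
         printf("n = %4d  direct = %.6f  closed form = %.6f\n", n, direct, closed);
     }
     return 0;
 }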

(log denotes the natural logarithm here.) Note the -3 among the lower-order terms. It makes the number of comparisons look as though it grows faster than logarithmically for small n, but the leading term dictates the asymptotic behavior. Try excluding the small values of n and refitting your curves.


Assuming rand_between(l, r) samples from a uniform distribution in constant time, the expected running time of this algorithm is Θ(log n). Informal proof sketch: the expected value of rand_between(l, r) is (l + r)/2, the midpoint of the range. So each iteration is expected to discard about half of the remaining array (assuming its size is a power of two), just like one iteration of binary search.

More formally, borrowing from the analysis of quickselect: when you pick a probe position uniformly at random, half the time it lands between ¼n and ¾n, in which case neither the left nor the right subarray has more than ¾n elements. The other half of the time, neither has more than n elements (trivially). This leads to the recurrence relation

 T(n) = ½T(¾n) + ½T(n) + f(n) 

where f(n) is the work done in each iteration. Subtracting ½T(n) from both sides and then doubling both sides, we get

 ½T(n) = ½T(¾n) + f(n)
 T(n) = T(¾n) + 2f(n)

Now, since 2f(n) = Θ(1) = Θ(nᶜ log⁰ n), where c = log(1) = 0, the master theorem tells us that T(n) = Θ(n⁰ log n) = Θ(log n).
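Spelled out in the standard master-theorem form (with g used instead of f to avoid clashing with the f(n) above, and b = 4/3 coming from the ¾n shrink factor), this is:

 T(n) = a\,T(n/b) + g(n), \qquad a = 1,\; b = \tfrac{4}{3},\; g(n) = 2f(n) = \Theta\bigl(n^{\log_b a}\log^{0} n\bigr)
 \;\Longrightarrow\; T(n) = \Theta\bigl(n^{\log_b a}\log^{0+1} n\bigr) = \Theta(\log n) \qquad \text{(master theorem, case 2).}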
