Puzzle in the data structure

I came across this puzzle [ related to data structure ] question in a coding contest.

There is a planet of trees (real trees, not a tree structure!). It has billions or even trillions of trees. The king orders to find the median epoch (in years and whole) of all trees, using, for example, carbon dating. ( Method does not matter. ) Note. Median is the "average" in a sorted list of numbers.

Limitations:
1. The oldest tree, as you know, 2000 years old.
2. They have one machine that can store integers ranging from -infinity to + infinity.
3. But the number of such integers that can be stored in memory at the same time is 1 million.

therefore, as soon as you store 1 million integers to store the next, you must delete the already saved file.
So for some reason they have to keep track of the medians as they continue to count age-old trees.
How can they do this?

My approach
Use the external sort option to sort the ages in pieces and write them to a file.
Apply k-way merging [for pieces].
The problem with the above approach is that this requires two file scans.

I can think of another approach that uses the information. The oldest tree is known to be 2000 years old.
Can't we take a count array [ as range of ages of tree is fixed ]?

I want to know if there is a better approach?
Is there any method in which we do not need to store data in a file? [ where only main memory is sufficient? ]

+7
source share
2 answers

You can do this by storing the integers of 2001. Create an array of different ages

 ages[2001] // [0..2000] 

when counting a tree

 ages[thisAge]++ 

Then the calculation of the median is trivial. You seem to have attacked this decision in the second approach you mentioned, but then you say I want to know if there is a better approach?

Is there any method in which we do not need to store data in a file? [where is only the main memory enough?]

I do not understand why you are asking if there is any method in which the main memory is sufficient. Does the array contain an integer 2001 in main memory?

Using the above approach, you can fill out your array of samples, and then calculate the median, iterating through the calculations, keeping the total amount as you go. When your total reaches half the total number of trees, you have a median. This requires one pass through all the trees for counting, plus a pass through part of the count array of some number <= 2001. Thus, this is O (n). You could instead track the median with this array as you go, but that would not improve the solution.

+8
source

The approach you recommend (array from 2001) is O (n), with one quick operation on the tree, which is optimal.

Good, almost optimal. At some point during the count, the number of remaining trees will not be enough to change the result. For example, if I count half + 1 trees, and everyone is exactly 100 years old, then I have the answer: 100 years.

But if the trees are well scattered by age, then the number of trees needed will be close to the total.

+2
source

All Articles