Python implementation of the median median algorithm

I wrote this implementation of the median of the median algorithm in python, but it doesn't seem to give the correct result, and it also doesn't seem like linear complexity to me, any idea where I went next?

def select(L): if len(L) < 10: L.sort() return L[int(len(L)/2)] S = [] lIndex = 0 while lIndex+5 < len(L)-1: S.append(L[lIndex:lIndex+5]) lIndex += 5 S.append(L[lIndex:]) Meds = [] for subList in S: print(subList) Meds.append(select(subList)) L2 = select(Meds) L1 = L3 = [] for i in L: if i < L2: L1.append(i) if i > L2: L3.append(i) if len(L) < len(L1): return select(L1) elif len(L) > len(L1) + 1: return select(L3) else: return L2 

The function is called like this:

 L = list(range(100)) shuffle(L) print(select(L)) 

LE: Sorry. GetMed was a function that just sorted the list and returned the item to len (list), it should have been selected there, I fixed it now, but still I get the wrong results. Regarding the indentation, the code works without errors, and I see nothing wrong with that: - ??

LE2: I expect 50 (for the current L), it gives me outputs from 30 to 70, no more no less (for now)

LE3: Thank you very much for doing the trick that it is currently working. I'm confused though, I'm trying to make a comparison between this method and the naive one, where I just sort the array and output the results. Now, from what I have read so far, the temporary complexity of the select method should be O (n) Deterministic choice . Although I probably can't compete with python optimization developers, I expected closer results than I got, for example, if I changed the range of the list to 10,000,000, select the output in 84.10837116255952 seconds, while the sort and return method does this at 18.92556029528825. What are some good ways to make this algorithm faster?

+2
python algorithm
May 29 '12 at
source share
2 answers

1) Your code indentation is incorrect, try the following:

 def select(L): if len(L) < 10: L.sort() return L[int(len(L)/2)] S = [] lIndex = 0 while lIndex+5 < len(L)-1: S.append(L[lIndex:lIndex+5]) lIndex += 5 S.append(L[lIndex:]) Meds = [] for subList in S: print(subList) Meds.append(select(subList)) L2 = select(Meds) L1 = L3 = [] for i in L: if i < L2: L1.append(i) if i > L2: L3.append(i) if len(L) < len(L1): return select(L1) elif len(L) > len(L1) + 1: return select(L3) else: return L2 

2) The method you use does not return the median, it simply returns a number that is not so far from the median. To get the median value, you need to calculate how many numbers are larger than your pseudo-median, if most are larger, repeat the algorithm with numbers larger than the pseudo-median, otherwise repeat with other numbers.

 def select(L, j): if len(L) < 10: L.sort() return L[j] S = [] lIndex = 0 while lIndex+5 < len(L)-1: S.append(L[lIndex:lIndex+5]) lIndex += 5 S.append(L[lIndex:]) Meds = [] for subList in S: Meds.append(select(subList, int((len(subList)-1)/2))) med = select(Meds, int((len(Meds)-1)/2)) L1 = [] L2 = [] L3 = [] for i in L: if i < med: L1.append(i) elif i > med: L3.append(i) else: L2.append(i) if j < len(L1): return select(L1, j) elif j < len(L2) + len(L1): return L2[0] else: return select(L3, j-len(L1)-len(L2)) 

Warning: L = M = [] not L = [] and M = []

+2
May 29 '12 at 22:58
source share

Below is my implementation of PYTHON. For more speed, you can use PYPY instead.

For the question about SPEED: The theoretical speed for 5 numbers per column is ~ 10 N, so I use 15 numbers per column, for 2X at ~ 5 N, and the optimal speed is ~ 4 N. But I could be wrong about the optimal speed of the most modern solutions. In my own test, my program runs a little faster than using sort (). Of course, your mileage may vary.

Assuming the python program is "median.py", an example of its launch is "python./median.py 100". For a speed test, you can comment on the verification code and use PYPY.

 #!/bin/python # # TH @stackoverflow, 2016-01-20, linear time "median of medians" algorithm # import sys, random items_per_column = 15 def find_i_th_smallest( A, i ): t = len(A) if(t <= items_per_column): # if A is a small list with less than items_per_column items, then: # 1. do sort on A # 2. return the i-th smallest item of A # return sorted(A)[i] else: # 1. partition A into columns of items_per_column items each. items_per_column is odd, say 15. # 2. find the median of every column # 3. put all medians in a new list, say, B # B = [ find_i_th_smallest(k, (len(k) - 1)/2) for k in [A[j:(j + items_per_column)] for j in range(0,len(A),items_per_column)]] # 4. find M, the median of B # M = find_i_th_smallest(B, (len(B) - 1)/2) # 5. split A into 3 parts by M, { < M }, { == M }, and { > M } # 6. find which above set has A i-th smallest, recursively. # P1 = [ j for j in A if j < M ] if(i < len(P1)): return find_i_th_smallest( P1, i) P3 = [ j for j in A if j > M ] L3 = len(P3) if(i < (t - L3)): return M return find_i_th_smallest( P3, i - (t - L3)) # How many numbers should be randomly generated for testing? # number_of_numbers = int(sys.argv[1]) # create a list of random positive integers # L = [ random.randint(0, number_of_numbers) for i in range(0, number_of_numbers) ] # Show the original list # print L # This is for validation # print sorted(L)[int((len(L) - 1)/2)] # This is the result of the "median of medians" function. # Its result should be the same as the validation. # print find_i_th_smallest( L, (len(L) - 1) / 2) 
0
Jan 20 '16 at 22:04
source share



All Articles