Colin Pitrat's solution is very good, but I think you can use it when using sortedcontainers.SortedSet
I created a file with 1,000,000 random numbers (using numpy.random.randint) and checked that there is at least one number (in the middle of the file) that satisfies your condition.
Compare both approaches:
import filecmp from sortedcontainers import SortedSet from timeit import Timer def load2list(): with open('data', 'r') as f: return [int(line) for line in f] def save_result(i, fn): with open(fn, 'a') as f: print(i, file=f) def test_Colin_Pitrat(): j = 0 for i in range(0, len(l)): while l[j] < 2*l[i]: j += 1 if j == len(l): return -1 if l[j] == 2*l[i]: save_result(l[i], 'Colin_Pitrat.out') return l[i] def test_SortedSet(): for i in sorted_set: if i<<1 in sorted_set: save_result(i, 'SortedSet.out') return i return -1 if __name__=='__main__': timeit_number = 10000 l = load2list() sorted_set = SortedSet(l) print('len(l):\t\t%d' % (len(l))) print('len(sorted_set):\t%d' % (len(sorted_set))) print('Timeit results with %d executions:' %timeit_number) print('Colin_Pitrat:\t', Timer(test_Colin_Pitrat).timeit(timeit_number)) print('SortedSet:\t', Timer(test_SortedSet).timeit(timeit_number)) print("filecmp.cmp('Colin_Pitrat.out', 'SortedSet.out'):\t%s" % (filecmp.cmp('Colin_Pitrat.out', 'SortedSet.out')))
Output:
len(l): 1000001 len(sorted_set): 999504 Timeit results with 10000 executions: Colin_Pitrat: 35.94529931032006 SortedSet: 2.548847197918647 filecmp.cmp('Colin_Pitrat.out', 'SortedSet.out'): True
PS, as you can see, SortedSet is very fast.
UPDATE: (now I am testing at home, where my computer is much slower , so I will reduce the number of executions to 1.000)
As Colin Pitrat said, I am now generating data (about 100,000 numbers) for the worst case scenario - when no match is found. Now I will compare three functions: test_Colin_Pitrat, test_Colin_Pitrat2 (customized version), test_SortedSet ...
Script data generator:
import numpy as np l = np.random.randint(10**7, 10**9, 200000) l = l[ l % 2 > 0 ] np.savetxt('data', np.sort(l), fmt='%d')
code:
import filecmp from sortedcontainers import SortedSet from timeit import Timer def load2list(): with open('data', 'r') as f: return [int(line) for line in f] def save_result(i, fn): with open(fn, 'a') as f: print(i, file=f) def test_Colin_Pitrat(): j = 0 for i in range(0, len(l)): while l[j] < 2*l[i]: j += 1 if j == len(l): return -1 if l[j] == 2*l[i]: return l[i] def test_Colin_Pitrat2(): j = 0 s = len(l) for i in range(0, s): l_i = l[i] l_i2 = l_i<<1 while l[j] < l_i2: j += 1 if j == s: return -1 if l[j] == l_i2: return l[j] def test_SortedSet(): for i in sorted_set: if i<<1 in sorted_set: return i return -1 if __name__=='__main__': timeit_number = 1000 l = load2list() sorted_set = SortedSet(l) print('len(l):\t\t%d' % (len(l))) print('len(sorted_set):\t%d' % (len(sorted_set))) print('Timeit results for %d executions:' %timeit_number) print('Colin_Pitrat:\t', Timer(test_Colin_Pitrat).timeit(timeit_number)) print('Colin_Pitrat2:\t', Timer(test_Colin_Pitrat2).timeit(timeit_number)) print('SortedSet:\t', Timer(test_SortedSet).timeit(timeit_number))
Output:
len(l): 99909 len(sorted_set): 99899 Timeit results for 1000 executions: Colin_Pitrat: 153.04753258882357 Colin_Pitrat2: 103.68264272815443 SortedSet: 99.59669211136577
Conclusion: Colin_Pitrat2 is 33% faster compared to Colin_Pitrat and almost as fast as SortedSet.