I have a sequence of lines - 0000001, 0000002, 0000003.... up to 2 million. They are not adjacent. This means that there are gaps. Let's say after 0000003 the next line could be 0000006. I need to find out all these spaces. In the above case (0000004, 0000005).
This is what I have done so far -
gaps = list() total = len(curr_ids) for i in range(total): tmp_id = '%s' %(str(i).zfill(7)) if tmp_id in curr_ids: continue else: gaps.append(tmp_id) return gaps
But, you guessed it, this is slow since I am using list . If I use dict to pre-populate curr_ids, it will be faster. But what is the difficulty of populating a hash table? What is the fastest way to do this.
python
Srikar appalaraju
source share