With a slight loss of efficiency, you can binary-search a sorted text file whose records have unknown lengths: repeatedly halve a byte range, seek to its midpoint, and read ahead past the next line terminator to reach the start of a record. Here is what I do to look up a number in the first field of a CSV file with 2 header lines. Pass it an open file and the value to search for in the first field. It should be easy to adapt to your problem. A match on the very first line, at offset zero, will not be found, so that case may need special handling. In my case the first two lines are headers and are skipped.
Please excuse my unpolished Python below. I use this function and a similar one to perform GeoCity Lite latitude and longitude lookups directly from the CSV files distributed by Maxmind.
Hope this helps
==========================================
    import os

    # See if the input loc is in the file.
    # f must be opened in binary mode, e.g. open(path, 'rb').
    def look1(f, loc):
        # Compute the size of the open file sent to us
        hi = os.fstat(f.fileno()).st_size
        lo = 0
        lookfor = int(loc)
        while hi - lo > 1:
            # Find the midpoint and seek to it
            loc = (hi + lo) // 2
            f.seek(loc)
            # Skip to the beginning of the next line (stop at end of file too)
            c = f.read(1)
            while c and c != b'\n':
                c = f.read(1)
            # Now skip past lines that are headers
            row = None
            while True:
                line = f.readline()
                if not line:
                    # Ran off the end of the file: no record starts after loc
                    row = None
                    break
                # Crude csv parsing: remove quotes and split on ,
                # (latin-1 is used only because it never fails to decode)
                row = line.decode('latin-1').rstrip('\r\n').replace('"', '').split(',')
                # Make sure the 1st field is numeric
                if row[0].isdigit():
                    break
            if row is None or lookfor < int(row[0]):
                # Split into lower half
                hi = loc
            elif lookfor > int(row[0]):
                # Split into higher half
                lo = loc
            else:
                return row  # Found
        # If not found
        return False
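The first-line case mentioned above can be handled by treating offset 0 as a line start instead of always skipping forward to the next newline. What follows is only a minimal sketch of that idea, not a drop-in replacement for `look1`. The names `record_at_or_after` and `find_record` are made up for this example, and it assumes the file is opened in binary mode, is sorted ascending on a numeric first field, and that header lines never start with a number.

```python
import io


def record_at_or_after(f, pos):
    """Return the first numeric-keyed record that starts at or after byte pos."""
    if pos > 0:
        # Step back one byte: if pos is already a line start, readline()
        # consumes only the preceding newline; otherwise it discards the
        # rest of the partial line.
        f.seek(pos - 1)
        f.readline()
    else:
        f.seek(0)
    while True:
        line = f.readline()
        if not line:
            return None  # reached end of file
        # Crude CSV parsing, as in look1: drop quotes, split on commas
        row = line.decode('latin-1').rstrip('\r\n').replace('"', '').split(',')
        if row[0].isdigit():
            return row  # header lines are skipped because they are not numeric


def find_record(f, key):
    """Binary search over byte offsets; returns the matching row or None."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell()
    key = int(key)
    while lo < hi:
        mid = (lo + hi) // 2
        row = record_at_or_after(f, mid)
        if row is None or int(row[0]) > key:
            hi = mid          # target, if present, starts before mid
        elif int(row[0]) < key:
            lo = mid + 1      # target, if present, starts after mid
        else:
            return row
    return None


# Small in-memory stand-in for a CSV file with two header lines
sample = io.BytesIO(b'"id","name"\n"Copyright line"\n1,"a"\n5,"b"\n9,"c"\n12,"d"\n')
print(find_record(sample, 1))
print(find_record(sample, 7))
```

Because the search range is half-open and offset 0 is a legal probe, a record on the very first line is found like any other. The cost is one extra `seek` per probe compared with reading forward byte by byte.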
user3101161