For example, a start_row of 4 will execute in 1 second, but a start_row of 500004 will take 11 seconds.
islice is being smart here. Or lazy, depending on which term you prefer.
Thing is, files are just strings of bytes on your hard drive. They have no internal organization; `\n` is just another byte sequence somewhere in that long, long string. There is no way to get to a particular line without reading all the bytes in front of it (unless all your lines are the same length, in which case you can use `file.seek`).
Line 4? Finding line 4 is fast; your computer only has to find 3 `\n`s. Line 500004? Your computer has to read the file until it has found 500003 `\n`s. There is no way around this: if someone tells you otherwise, they either have some kind of quantum computer, or their computer is reading through the file just like every other computer in the world, just behind their back.
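To make that concrete, here is a sketch of what "finding line n" actually means under the hood (a hypothetical helper, not part of the original answer): every preceding line has to be read and thrown away.

```python
def nth_line(f, n):
    """Return line n (0-indexed) of an open text file.

    All n earlier lines must be read and discarded to get there,
    which is why the cost grows linearly with n.
    """
    for _ in range(n):
        f.readline()  # skip one of the n preceding lines
    return f.readline()
```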
What you can do about it: be smart about how you grab the lines you want to iterate over. Smart and lazy. Arrange your requests so that you only ever iterate through the file once, and close the file as soon as you have pulled the data you need. (Which is exactly what islice does, by the way.)
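For instance, a minimal single-pass sketch with itertools.islice; the 500004 offset comes from the question above, while the filename and the print call are just illustrative placeholders:

```python
from itertools import islice

# Lazily pull lines 500004..500008 (0-indexed) in one pass.
# islice still has to consume the 500004 lines in front of the
# slice, but it never holds more than one line in memory.
with open("data.txt") as f:
    for line in islice(f, 500004, 500009):
        print(line, end="")
```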
In Python:
```python
lines_I_want = [(start1, stop1), (start2, stop2), ...]  # sorted, half-open [start, stop)

with open(filename) as f:
    for i, line in enumerate(f):
        # Drop any ranges the scan has already moved past.
        while lines_I_want and i >= lines_I_want[0][1]:
            lines_I_want.pop(0)
        if not lines_I_want:  # list is empty, nothing left to grab
            break
        if i >= lines_I_want[0][0]:
            ...  # line is a line I want. Do something with it.
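```

Note the assumptions: lines_I_want is sorted by starting line, and each range is half-open, [start, stop). The loop never rewinds, touches the file exactly once, and breaks out as soon as the last requested range has been consumed, so no time is wasted reading the tail of the file.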
And if you have any control over how that file is created, make every line the same length so you can `seek`. Or use a database.
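A minimal sketch of the fixed-length-record idea; the 80-byte record length and the filename are assumptions for illustration only:

```python
RECORD_LEN = 80  # assumed fixed length of every line, '\n' included

with open("data.txt", "rb") as f:
    f.seek(500004 * RECORD_LEN)          # jump straight to line 500004
    line = f.readline().decode("ascii")  # constant time, no scanning
```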
NightShadeQueen