Major update: Switched to the correct code for reading a file into a pre-processed array (the `using_preprocessed_file()` function below), which dramatically changed the results.
To determine which method is fastest in Python (using only built-in modules and the standard library), I created a script that compares (via `timeit`) the various methods that could be used to do this. It's a bit on the long side, so to avoid distraction I'm posting only the code being tested and the related results. (If there's enough interest in the methodology, I'll post the entire script.)
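For reference, the comparison can be driven with `timeit.repeat()` roughly like this. This is only an illustrative sketch of the approach, not the actual script; the functions being timed here are placeholders:

```python
import timeit

def make_data():
    return list(range(1000))

def sum_loop():
    # Placeholder candidate: manual accumulation loop.
    total = 0
    for x in make_data():
        total += x
    return total

def sum_builtin():
    # Placeholder candidate: built-in sum().
    return sum(make_data())

# timeit.repeat() times `number` executions, `repeat` times over;
# taking min() gives the "best of N repetitions" figure reported below.
for func in (sum_loop, sum_builtin):
    best = min(timeit.repeat(func, number=10, repeat=3))
    print('{}: {:.5f} secs'.format(func.__name__, best))
```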
Here are the code snippets that were compared:
```python
@TESTCASE('Read and construct piecemeal with struct')
def read_file_piecemeal():
    structures = []
    with open(test_filenames[0], 'rb') as inp:
        size = fmt1.size
        while True:
            buffer = inp.read(size)
            if len(buffer) != size:  # EOF?
                break
            structures.append(fmt1.unpack(buffer))
    return structures

@TESTCASE('Read all-at-once, then slice and struct')
def read_entire_file():
    offset, unpack, size = 0, fmt1.unpack, fmt1.size
    structures = []
    with open(test_filenames[0], 'rb') as inp:
        buffer = inp.read()  # read entire file
        while True:
            chunk = buffer[offset: offset+size]
            if len(chunk) != size:  # EOF?
                break
            structures.append(unpack(chunk))
            offset += size
    return structures

@TESTCASE('Convert to array (@randomir part 1)')
def convert_to_array():
    data = array.array('d')
    record_size_in_bytes = 9*4 + 16*8  # 9 ints + 16 doubles (standard sizes)
    with open(test_filenames[0], 'rb') as fin:
        for record in iter(partial(fin.read, record_size_in_bytes), b''):
            values = struct.unpack("<2i5d2idi3d2i3didi3d", record)
            data.extend(values)
    return data

@TESTCASE('Read array file (@randomir part 2)', setup='create_preprocessed_file')
def using_preprocessed_file():
    data = array.array('d')
    with open(test_filenames[1], 'rb') as fin:
        n = os.fstat(fin.fileno()).st_size // 8
        data.fromfile(fin, n)
    return data

def create_preprocessed_file():
    """ Save array created by convert_to_array() into a separate test file. """
    test_filename = test_filenames[1]
    if not os.path.isfile(test_filename):  # doesn't already exist?
        data = convert_to_array()
        with open(test_filename, 'wb') as file:
            data.tofile(file)
```
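The snippets above rely on a few globals from the omitted script (`fmt1`, `test_filenames`, the `TESTCASE` decorator). A minimal sketch of that scaffolding, inferred from the record layout in `convert_to_array()`, could look like the following. The file names and the bare-bones `TESTCASE` are my assumptions, not the original script:

```python
import struct

# Little-endian record layout matching convert_to_array():
# 9 ints + 16 doubles interleaved -> 9*4 + 16*8 = 164 bytes per record.
fmt1 = struct.Struct("<2i5d2idi3d2i3didi3d")

# [0] = raw binary test file, [1] = pre-processed array file.
# (Hypothetical names; the original script's names are not shown.)
test_filenames = ['test_raw.dat', 'test_array.dat']

def TESTCASE(title, setup=None):
    """Bare-bones stand-in: tag a benchmark function with a display
    title and an optional setup-function name."""
    def decorator(func):
        func.title = title
        func.setup = setup
        return func
    return decorator
```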
And here are the results of running them on my system:
```
Fastest to slowest execution speeds using Python 3.6.1
(10 executions, best of 3 repetitions)

Size of structure: 164
Number of structures in test file: 40,000
file size: 6,560,000 bytes

      Read array file (@randomir part 2): 0.06430 secs, relative  1.00x (   0.00% slower)
 Read all-at-once, then slice and struct: 0.39634 secs, relative  6.16x ( 516.36% slower)
Read and construct piecemeal with struct: 0.43283 secs, relative  6.73x ( 573.09% slower)
     Convert to array (@randomir part 1): 1.38310 secs, relative 21.51x (2050.87% slower)
```
Interestingly, most of the snippets are actually faster in Python 2...
```
Fastest to slowest execution speeds using Python 2.7.13
(10 executions, best of 3 repetitions)

Size of structure: 164
Number of structures in test file: 40,000
file size: 6,560,000 bytes

      Read array file (@randomir part 2): 0.03586 secs, relative  1.00x (   0.00% slower)
 Read all-at-once, then slice and struct: 0.27871 secs, relative  7.77x ( 677.17% slower)
Read and construct piecemeal with struct: 0.40804 secs, relative 11.38x (1037.81% slower)
     Convert to array (@randomir part 1): 1.45830 secs, relative 40.66x (3966.41% slower)
```