NumPy's genfromtxt and loadtxt would be quite awkward to use for the whole file, since your data has a rather special structure (which varies depending on which section of the file you are in). Therefore, I would suggest the following strategy:
Read the file line by line, and for each line try to determine which section of the file you are in.
If you are in a section with only a little data (for example, one where you have to read alternating lines, so the data cannot be read as one contiguous block), read it line by line and process the lines individually.
When you get to a section with a lot of data (for example, the block with the "real data"), use NumPy's np.fromfile to read it in, for example:
mydata = np.fromfile(fp, sep=" ", dtype=int, count=number_of_elements)
mydata.shape = (100000, 3)  # number_of_elements must equal 100000 * 3 for this reshape
This way you combine the flexibility of line-by-line processing with the ability to read and convert large chunks of data quickly.
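To illustrate, here is a minimal sketch of such a mixed reader, assuming the big blocks are announced by a header line; the "BLOCK nrows ncols" marker is hypothetical and stands in for whatever your format actually uses:

import numpy as np

blocks = []
fp = open("test.dat", "r")
line = fp.readline()
while line:
    fields = line.split()
    # Hypothetical marker line announcing a large block: "BLOCK nrows ncols"
    if fields and fields[0] == "BLOCK":
        nrows, ncols = int(fields[1]), int(fields[2])
        data = np.fromfile(fp, sep=" ", dtype=float, count=nrows * ncols)
        blocks.append(data.reshape(nrows, ncols))
    else:
        pass  # small/irregular sections: parse these lines individually
    line = fp.readline()
fp.close()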
UPDATE: The point is that you open the file, read it line by line, and when you reach a part with a lot of data, you pass the open file object to np.fromfile.
Below is a simplified example:
import numpy as np

fp = open("test.dat", "r")
line = fp.readline()
ndata = int(line.strip())
data = np.fromfile(fp, count=ndata, sep=" ", dtype=int)
fp.close()
This will read the data from a test.dat file with contents such as:
10
1 2 3 4 5 6 7 8 9 10
The first line is read explicitly with fp.readline() and processed (to determine the number of integers to read), and then np.fromfile() reads the corresponding chunk of data and stores it in the 1-D data array.
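With the sample file above, ndata ends up as 10 and data becomes:

>>> data
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])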
UPDATE 2: Alternatively, you can read the whole file into a string buffer, then determine the start and end positions of the large block of data and convert it directly with np.fromstring:
fp = open("test.dat", "r") txt = fp.read() fp.close()
Or, if the structure is easy to express as a single regular expression, you can use np.fromregex() directly on the file.
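For example, assuming the data rows consist of three whitespace-separated integers (just a sketch; the regular expression has to match your actual rows), np.fromregex reads and parses the whole file in one call and returns a structured array:

import numpy as np

# Every line matching "three integers" becomes one record; all other lines are skipped.
pattern = r"(\d+)\s+(\d+)\s+(\d+)"
rows = np.fromregex("test.dat", pattern, dtype=[("a", int), ("b", int), ("c", int)])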