This would be much faster if you made sure that each block had the same number of lines (even if the extra lines are just filled with zeros). That way you could read the whole file with a single loadtxt call and simply reshape it, as in the sketch below.
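For illustration, here is a minimal sketch of that fast path. It assumes the padded file has no header lines and that the block count and padded block length (the hypothetical nx and ny below) are known:

    import numpy as np

    nx, ny = 3, 5  # hypothetical: 3 blocks, each padded to 5 lines

    # with equal-length blocks the value column reshapes directly, no loop needed
    vals = np.loadtxt("myfile", usecols=(1,))
    my_mat = vals.reshape(nx, ny)

But given the file as it is, here is an example that might still be a little faster: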
    import numpy as np

    data = np.loadtxt("myfile", usecols=(0, 1), unpack=True)

    nx = np.sum(data[0] == 0)   # number of blocks: one zero in the first column per block
    ny = int(np.max(data[0]))   # longest block: the largest counter value

    my_mat = np.empty((nx, ny), dtype='d')
    my_mat[:] = np.nan  # if you really want to populate it with NaNs for missing entries

    # block lengths: the counter value just before each reset, plus the very last one
    tr_ind = data[0, list(np.nonzero(np.diff(data[0]) < 0)[0]) + [-1]].astype('i')
    # the values, with the rows where the counter is zero stripped out
    buf = np.squeeze(data[1, np.nonzero(data[0])])

    idx = 0
    for i in range(nx):
        my_mat[i, :tr_ind[i]] = buf[idx : idx + tr_ind[i]]
        idx += tr_ind[i]
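The trickiest line is the one computing tr_ind: np.diff(data[0]) < 0 flags the positions where the first-column counter resets, and the counter value just before each reset (plus the very last one) gives each block's length. A small demonstration on a made-up counter column:

    >>> col0 = np.array([0, 1, 2, 3, 0, 1, 2])  # two blocks, of lengths 3 and 2
    >>> np.nonzero(np.diff(col0) < 0)[0]
    array([3])
    >>> col0[[3, -1]]
    array([3, 2])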
And you can check the result:
    >>> my_mat.T
    array([[ 0.5,  0.2,  0.7],
           [ 0.9,  0.2,  0.6],
           [ 0.4,  0.4,  0.9],
           [ 0.1,  0.9,  0.2],
           [ nan,  nan,  0.7]])
UPDATE: As TheodrosZelleke pointed out, the above solution fails if x2 (the first column) is nonzero; I did not notice this at first. Here is an update that gets around it:
    # this will give a conversion warning, because the column count varies between lines
    blk_sizes = np.genfromtxt("myfile", invalid_raise=False, usecols=(-2,))
    blk_sizes = blk_sizes.astype('i')  # the sizes are needed as integers for slicing

    nx = blk_sizes.size
    ny = np.max(blk_sizes)

    data = np.loadtxt("myfile", usecols=(1,))

    my_mat = np.empty((nx, ny), dtype='d')
    my_mat[:] = np.nan

    idx = 1  # skip the first header line
    for i in range(nx):
        my_mat[i, :blk_sizes[i]] = data[idx : idx + blk_sizes[i]]
        idx += blk_sizes[i] + 1  # jump over this block's values and the next header line
(And then take my_mat.T.)
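Incidentally, if the conversion warning from genfromtxt is noisy, it can be silenced with the standard warnings machinery; this is just a sketch wrapping the same call, nothing about the read itself changes:

    import warnings

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # drop the warning about the skipped lines
        blk_sizes = np.genfromtxt("myfile", invalid_raise=False, usecols=(-2,)).astype('i')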