The proposed method has a Python loop, so with large values of n_i it will be slow. However, unless n_i is large, you should not worry too much.
I created a sample input with the following code:
```python
import numpy as np

def sample_data(n_i, n_j, z_shape):
    # x: each row is sorted and pinned to the interval [0, 1000]
    x = np.random.rand(n_i, n_j) * 1000
    x.sort()
    x[:, 0] = 0
    x[:, -1] = 1000
    y = np.random.rand(n_i, n_j)
    # z: the points to interpolate at, also in [0, 1000]
    z = np.random.rand(*z_shape) * 1000
    return x, y, z
```
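For instance, calling it like this (the sizes are arbitrary, picked only to show the shapes involved) gives row-wise sorted x values between 0 and 1000, matching y values, and a handful of z points to interpolate at:

```python
x, y, z = sample_data(4, 6, (5,))
print(x.shape, y.shape, z.shape)   # (4, 6) (4, 6) (5,)
print(x[0])                        # sorted, starts at 0, ends at 1000
```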
And I tested two versions of linear interpolation with it:
```python
from scipy.interpolate import interp1d

def interp_1(x, y, z):
    # Loop over the rows, building an interp1d object for each one
    rows, cols = x.shape
    out = np.empty((rows,) + z.shape, dtype=y.dtype)
    for j in range(rows):
        out[j] = interp1d(x[j], y[j], kind='linear', copy=False)(z)
    return out

def interp_2(x, y, z):
    # Fully vectorized: locate the bin holding each z in every row, then
    # apply the two-point linear interpolation formula with fancy indexing
    rows, cols = x.shape
    row_idx = np.arange(rows).reshape((rows,) + (1,) * z.ndim)
    col_idx = np.argmax(x.reshape(x.shape + (1,) * z.ndim) > z, axis=1) - 1
    ret = y[row_idx, col_idx + 1] - y[row_idx, col_idx]
    ret /= x[row_idx, col_idx + 1] - x[row_idx, col_idx]
    ret *= z - x[row_idx, col_idx]
    ret += y[row_idx, col_idx]
    return ret
```
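As a quick sanity check (not part of the original timings, and assuming the definitions above are in scope), both versions should agree to floating-point precision on any sample input:

```python
x, y, z = sample_data(10, 50, (5,))
print(np.allclose(interp_1(x, y, z), interp_2(x, y, z)))  # True
```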
interp_1 is an optimized version of your code, following Dave's answer. interp_2 is a fully vectorized implementation of linear interpolation that avoids any Python loop. Writing code like this requires a good understanding of broadcasting and fancy indexing in numpy, and some parts will be less optimized than what interp1d does. A telling example is the search for the bin in which to interpolate a value: interp1d can probably break out of its loop as soon as it finds the bin, whereas the vectorized function above compares every value against all the bin edges.
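To make that bin-search step concrete, here is a tiny standalone illustration (purely for exposition, not part of the benchmark) of what the argmax trick does on a single row, and how it relates to np.searchsorted, which performs the same lookup with a binary search per value:

```python
import numpy as np

x_row = np.array([0., 10., 20., 50., 100.])
z = np.array([5., 60., 99.])

# Broadcast comparison: every z is compared against every breakpoint;
# argmax picks the first breakpoint strictly greater than z, and the
# -1 turns that into the index of the left edge of the containing bin.
col_idx = np.argmax(x_row[:, None] > z, axis=0) - 1
print(col_idx)                                        # [0 3 3]

# The same indices via binary search
print(np.searchsorted(x_row, z, side='right') - 1)    # [0 3 3]
```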
So the outcome will depend very much on what n_i and n_j are, and even on the length of the array z of values to interpolate. If n_j is small and n_i is large, you should expect an advantage for interp_2, and for interp_1 if it is the other way around. A smaller z should favor interp_2, a longer one interp_1.
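If you just want a quick read on your own sizes before running a full sweep, a one-off comparison like this is enough (the sizes here are arbitrary, and it assumes sample_data, interp_1 and interp_2 from above):

```python
import timeit

x, y, z = sample_data(1000, 100, (5,))
t1 = min(timeit.repeat(lambda: interp_1(x, y, z), repeat=5, number=1))
t2 = min(timeit.repeat(lambda: interp_2(x, y, z), repeat=5, number=1))
print('interp_1: %.4fs  interp_2: %.4fs' % (t1, t2))
```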
I actually ran both approaches for varying n_i and n_j, with z of shapes (5,) and (50,); here are the graphs:
[contour plots of the timing difference between interp_1 and interp_2 over n_i and n_j, for z.shape = (5,) and z.shape = (50,)]
So for z of shape (5,) you should go with interp_2 whenever n_j < 1000, and with interp_1 elsewhere. Not surprisingly, the threshold is different for z of shape (50,): now it is around n_j < 100. It seems tempting to conclude that you should stick with your code if n_j * len(z) > 5000 and switch to something like interp_2 above if not, but there is a good deal of extrapolation in that statement! If you want to experiment further, here is the code I used to produce the graphs.
```python
import timeit
import numpy as np
import matplotlib.pyplot as plt

# Sweep n_i and n_j on a log grid, time both versions, and
# contour-plot the difference between the two timings
n_s = np.logspace(1, 3.3, 25)
int_1 = np.empty((len(n_s),) * 2)
int_2 = np.empty((len(n_s),) * 2)
z_shape = (5,)

for i, n_i in enumerate(n_s):
    print(int(n_i))
    for j, n_j in enumerate(n_s):
        x, y, z = sample_data(int(n_i), int(n_j), z_shape)
        int_1[i, j] = min(timeit.repeat('interp_1(x, y, z)',
                                        'from __main__ import interp_1, x, y, z',
                                        repeat=10, number=1))
        int_2[i, j] = min(timeit.repeat('interp_2(x, y, z)',
                                        'from __main__ import interp_2, x, y, z',
                                        repeat=10, number=1))

cs = plt.contour(n_s, n_s, np.transpose(int_1 - int_2))
plt.clabel(cs, inline=1, fontsize=10)
plt.xlabel('n_i')
plt.ylabel('n_j')
plt.title('timeit(interp_2) - timeit(interp_1), z.shape=' + str(z_shape))
plt.show()
```
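And if you do decide to act on the rough n_j * len(z) > 5000 rule of thumb, a hypothetical dispatcher could be as simple as the sketch below; the name interp_auto and the default threshold are my own invention, extrapolated only from these particular runs, so measure on your own data before trusting it:

```python
def interp_auto(x, y, z, threshold=5000):
    # Crude heuristic: loop over rows with interp1d when each row does
    # a lot of work, otherwise use the fully vectorized version.
    if x.shape[1] * z.size > threshold:
        return interp_1(x, y, z)
    return interp_2(x, y, z)
```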