Make 2D Numpy Array From Coordinates

Question

I have data points that are the coordinates for a 2D matrix. The points are snapped to a regular grid, but some grid positions have no data point.

For example, consider some XYZ data that fit a regular mesh with 0.1 spacing and the shape (3, 4). Because of the gaps, there are 5 points, not 12:

    import numpy as np

    X = np.array([0.4, 0.5, 0.4, 0.4, 0.7])
    Y = np.array([1.0, 1.0, 1.1, 1.2, 1.2])
    Z = np.array([3.3, 2.5, 3.6, 3.8, 1.8])

    # Evaluate the regular grid dimension values
    # (the point count is cast to int because np.linspace needs an integer num)
    Xr = np.linspace(X.min(), X.max(), int(np.round((X.max() - X.min()) / np.diff(np.unique(X)).min())) + 1)
    Yr = np.linspace(Y.min(), Y.max(), int(np.round((Y.max() - Y.min()) / np.diff(np.unique(Y)).min())) + 1)
    print('Xr={0}; Yr={1}'.format(Xr, Yr))
    # Xr=[ 0.4 0.5 0.6 0.7]; Yr=[ 1. 1.1 1.2]

What I would like to see is shown in this image (backgrounds: black = base index 0, gray = coordinate value, color = matrix value, white = missing).

Here is my intuitive approach with a for loop:

    ar = np.ma.array(np.zeros((len(Yr), len(Xr)), dtype=Z.dtype), mask=True)
    for x, y, z in zip(X, Y, Z):
        j = (np.abs(Xr - x)).argmin()
        i = (np.abs(Yr - y)).argmin()
        ar[i, j] = z
    print(ar)
    # [[3.3 2.5 -- --]
    #  [3.6 -- -- --]
    #  [3.8 -- -- 1.8]]

Is there a more NumPythonic way to vectorize this approach and return the 2D array ar? Or is a for loop required?

python arrays vectorization numpy

Mike t Aug 3 '15 at 20:19

4 answers

dermen · Answer 1 · 2015-08-03T20:58:19+0000

You can do this in one line with np.histogram2d:

    data = np.histogram2d(Y, X, bins=[len(Yr), len(Xr)], weights=Z)
    print(data[0])
    # [[ 3.3  2.5  0.   0. ]
    #  [ 3.6  0.   0.   0. ]
    #  [ 3.8  0.   0.   1.8]]
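Note that the histogram output uses 0 for empty cells, whereas the question asks for a masked array. One possible way to recover the mask, offered here as a sketch rather than as part of the original answer, is to compute a second, unweighted histogram and mask the bins whose count is zero. The grid shape (3, 4) is taken from the question; the names `sums` and `counts` are illustrative.

```python
import numpy as np

X = np.array([0.4, 0.5, 0.4, 0.4, 0.7])
Y = np.array([1.0, 1.0, 1.1, 1.2, 1.2])
Z = np.array([3.3, 2.5, 3.6, 3.8, 1.8])

bins = [3, 4]  # (len(Yr), len(Xr)) from the question

# Weighted histogram: each bin holds the sum of the Z values that fall in it.
sums, _, _ = np.histogram2d(Y, X, bins=bins, weights=Z)
# Unweighted histogram: each bin holds the number of points that fall in it.
counts, _, _ = np.histogram2d(Y, X, bins=bins)

# Mask every bin that received no point.
ar = np.ma.masked_where(counts == 0, sums)
print(ar)
```

If several points land in the same bin, `sums` holds their total, so dividing by `counts` would be needed to get a mean instead.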

Divakar · Answer 2 · 2015-08-04T06:30:55+0000

You can use X and Y to create XY coordinates on a 0.1-spaced grid extending from the min to the max of X and from the min to the max of Y, and then insert the Z values at those positions. This avoids using linspace to get Xr and Yr and, as such, should be quite efficient. Here's the implementation:

    def indexing_based(X,Y,Z):
        # Convert X and Y to indices on a 0.1 spaced grid
        X_int = np.round((X*10)).astype(int)
        Y_int = np.round((Y*10)).astype(int)
        X_idx = X_int - X_int.min()
        Y_idx = Y_int - Y_int.min()

        # Setup output array and index it with X_idx & Y_idx to set those as Z
        out = np.zeros((Y_idx.max()+1,X_idx.max()+1))
        out[Y_idx,X_idx] = Z
        return out
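The function above fills missing cells with 0. As a hedged variation (not from the original answer), the same index arithmetic can write into a fully masked array so that missing cells stay masked, as in the question. The function name `indexing_masked` and the `step` parameter are hypothetical; `step` is assumed to be the known grid spacing.

```python
import numpy as np

def indexing_masked(X, Y, Z, step=0.1):
    # Convert coordinates to integer grid indices; `step` is the assumed spacing.
    X_idx = np.round(X / step).astype(int)
    Y_idx = np.round(Y / step).astype(int)
    # Shift so that the smallest coordinate maps to index 0.
    X_idx -= X_idx.min()
    Y_idx -= Y_idx.min()

    # Start from a fully masked array; assignment unmasks the written cells.
    out = np.ma.masked_all((Y_idx.max() + 1, X_idx.max() + 1), dtype=Z.dtype)
    out[Y_idx, X_idx] = Z
    return out

X = np.array([0.4, 0.5, 0.4, 0.4, 0.7])
Y = np.array([1.0, 1.0, 1.1, 1.2, 1.2])
Z = np.array([3.3, 2.5, 3.6, 3.8, 1.8])

ar = indexing_masked(X, Y, Z)
print(ar)
```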

Runtime Tests -

This section compares the indexing-based approach with the other np.histogram2d-based solution:

    In [132]: # Create unique couples XY (as needed to work with histogram2d)
         ...: data = np.random.randint(0,1000,(5000,2))
         ...: data1 = data[np.lexsort(data.T),:]
         ...: mask = ~np.all(np.diff(data1,axis=0)==0,axis=1)
         ...: data2 = data1[np.append([True],mask)]
         ...:
         ...: X = (data2[:,0]).astype(float)/10
         ...: Y = (data2[:,1]).astype(float)/10
         ...: Z = np.random.randint(0,1000,(X.size))
         ...:

    In [133]: def histogram_based(X,Y,Z): # From other np.histogram2d based solution
         ...:     # The point count is cast to int because np.linspace needs an integer num
         ...:     Xr = np.linspace(X.min(), X.max(), int(np.round((X.max() - X.min()) / np.diff(np.unique(X)).min())) + 1)
         ...:     Yr = np.linspace(Y.min(), Y.max(), int(np.round((Y.max() - Y.min()) / np.diff(np.unique(Y)).min())) + 1)
         ...:     data = np.histogram2d(Y, X, bins=[len(Yr),len(Xr)], weights=Z)
         ...:     return data[0]
         ...:

    In [134]: %timeit histogram_based(X,Y,Z)
    10 loops, best of 3: 22.8 ms per loop

    In [135]: %timeit indexing_based(X,Y,Z)
    100 loops, best of 3: 2.11 ms per loop

user2539336 · Answer 3 · 2015-08-03T20:59:56+0000

You can use SciPy's coo_matrix. It lets you build a sparse matrix from coordinates and data. See the Examples section at the link below.

http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.sparse.coo_matrix.html

Hope this helps.

hpaulj · Answer 4 · 2015-08-03T23:38:55+0000

The sparse matrix is the first solution that came to mind, but since X and Y are floats, it's a little messy:

    In [624]: I=((X-.4)*10).round().astype(int)

    In [625]: J=((Y-1)*10).round().astype(int)

    In [626]: I,J
    Out[626]: (array([0, 1, 0, 0, 3]), array([0, 0, 1, 2, 2]))

    In [627]: sparse.coo_matrix((Z,(J,I))).A
    Out[627]:
    array([[ 3.3,  2.5,  0. ,  0. ],
           [ 3.6,  0. ,  0. ,  0. ],
           [ 3.8,  0. ,  0. ,  1.8]])

One way or another, it is still necessary to map these coordinates to the indices [0, 1, 2, ...]. My quick trick was simply to scale the values linearly. Even so, I had to take care when converting the floats to ints.
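If the scale factor is not known in advance, the coordinate-to-index mapping can instead reuse the nearest-neighbor search from the question's loop, vectorized with broadcasting. This is a sketch under the assumption that the grid vectors Xr and Yr are already available (here they are written out for the question's data); it builds temporary arrays of shape (number of points, grid length), so it trades memory for the absence of a Python loop.

```python
import numpy as np

X = np.array([0.4, 0.5, 0.4, 0.4, 0.7])
Y = np.array([1.0, 1.0, 1.1, 1.2, 1.2])
Z = np.array([3.3, 2.5, 3.6, 3.8, 1.8])

# Grid coordinates, as computed in the question.
Xr = np.linspace(0.4, 0.7, 4)
Yr = np.linspace(1.0, 1.2, 3)

# For every point, find the index of the nearest grid coordinate on each axis.
j = np.abs(X[:, None] - Xr[None, :]).argmin(axis=1)
i = np.abs(Y[:, None] - Yr[None, :]).argmin(axis=1)

# Fill a fully masked array; cells without a point stay masked.
ar = np.ma.masked_all((len(Yr), len(Xr)), dtype=Z.dtype)
ar[i, j] = Z
print(ar)
```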

sparse.coo_matrix works because the natural way to define a sparse matrix is with (i, j, data) tuples, which of course can be translated into I, J, Data lists or arrays.

I rather like the histogram solution, although I have not had occasion to use it.
