Crop all-NaN rows and columns of a matrix, but keep it square

I have a square matrix with more than 1000 rows and columns. Many fields near the border are NaN, for example:

```python
grid = [[nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan,   1, nan, nan],
        [nan,   2,   3,   2, nan],
        [  1,   2,   2,   1, nan]]
```

Now I want to delete all rows and columns that contain only NaN. Here those are the first and second rows and the last column. But I also want the result to stay square, so the number of rows removed must equal the number of columns removed. In this example, I want to get the following:

```python
grid = [[nan, nan, nan, nan],
        [nan, nan,   1, nan],
        [nan,   2,   3,   2],
        [  1,   2,   2,   1]]
```

I'm sure I can solve this with a loop: check each row and column for whether it contains only NaN, and at the end use `numpy.delete` to delete the rows and columns found (but only as many rows as columns, so the result stays square). But I hope someone can point me to a better solution or a good library.
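The loop-based approach described in the question might look roughly like this. It is a minimal sketch, assuming the input can be converted to a square NumPy float array and that the all-NaN rows and columns sit at the border; the function name `crop_square_loop` is hypothetical. It removes the first k all-NaN rows and the first k all-NaN columns, where k is the smaller of the two counts.

```python
import numpy as np


def crop_square_loop(grid):
    """Remove equally many all-NaN rows and columns so the result stays square."""
    grid = np.asarray(grid, dtype=float)
    nan_rows = []
    nan_cols = []
    # Check each row and each column for containing only NaN
    for i in range(grid.shape[0]):
        if np.isnan(grid[i, :]).all():
            nan_rows.append(i)
    for j in range(grid.shape[1]):
        if np.isnan(grid[:, j]).all():
            nan_cols.append(j)
    # Delete only as many rows as columns (the smaller of the two counts)
    k = min(len(nan_rows), len(nan_cols))
    grid = np.delete(grid, nan_rows[:k], axis=0)
    grid = np.delete(grid, nan_cols[:k], axis=1)
    return grid


nan = np.nan
grid = [[nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, 1, nan, nan],
        [nan, 2, 3, 2, nan],
        [1, 2, 2, 1, nan]]
result = crop_square_loop(grid)
print(result.shape)  # (4, 4)
```

For the example grid this removes row 0 and column 4 and keeps row 1, which gives the expected 4x4 result.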

3 answers

This works. The key is zipping the row and column indices: `zip` stops at the shorter of the two index lists, so the same number of rows and columns is always removed, which keeps the matrix square.

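```python
import numpy as np

grid = np.asarray(grid, dtype=float)
nans_in_grid = np.isnan(grid)
# axis=1 flags rows that are entirely NaN, axis=0 flags columns
nan_rows = np.all(nans_in_grid, axis=1)
nan_cols = np.all(nans_in_grid, axis=0)
# zip() stops at the shorter sequence, so equally many rows and columns are removed
indices_to_remove = list(zip(np.nonzero(nan_rows)[0], np.nonzero(nan_cols)[0]))
if indices_to_remove:  # zip(*[]) cannot be unpacked into two names
    row_indices_to_remove, col_indices_to_remove = zip(*indices_to_remove)
    grid = np.delete(grid, list(row_indices_to_remove), axis=0)
    grid = np.delete(grid, list(col_indices_to_remove), axis=1)
```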
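To see why zipping keeps the matrix square: `zip` stops at the end of its shortest input, so for the example grid (all-NaN rows 0 and 1, all-NaN column 4) only one (row, column) pair is produced. A minimal illustration, with plain lists standing in for the `np.nonzero` results:

```python
# Indices of the all-NaN rows and columns in the example grid from the question
nan_row_indices = [0, 1]
nan_col_indices = [4]

# zip() stops at the shortest input, so it pairs up only as many rows as columns
pairs = list(zip(nan_row_indices, nan_col_indices))
print(pairs)  # [(0, 4)]

# Unzip back into separate row and column index tuples
rows_to_remove, cols_to_remove = zip(*pairs)
print(rows_to_remove, cols_to_remove)  # (0,) (4,)
```

If either list is empty, `pairs` is empty and `zip(*pairs)` yields nothing to unpack, so the two-name assignment raises `ValueError`; that case needs a guard.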

Building on another user's solution, i.e. removing every all-NaN row and column and then padding the result back to a square, also works.

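```python
import numpy as np


def pad_to_square(a, pad_value=np.nan):
    """Pad a 2-D array with pad_value on the bottom/right until it is square."""
    m = a.reshape((a.shape[0], -1))
    padded = np.full(2 * [max(m.shape)], pad_value, dtype=m.dtype)
    padded[0:m.shape[0], 0:m.shape[1]] = m
    return padded


grid = np.asarray(grid, dtype=float)
g = np.isnan(grid)
# Drop every all-NaN column, then every all-NaN row, then pad back to a square.
# Note: the padding goes at the bottom/right, so for the example the NaN row ends up last.
grid = pad_to_square(grid[:, ~np.all(g, axis=0)][~np.all(g, axis=1)])
```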
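The padding step can also be written with `np.pad`, which takes per-axis (before, after) widths and a `constant_values` argument. This is a sketch assuming a 2-D float array (NaN cannot be stored in an integer array); `pad_to_square_np` is a hypothetical name, not part of the answer above.

```python
import numpy as np


def pad_to_square_np(a, pad_value=np.nan):
    """Pad a 2-D float array on the bottom/right with pad_value until it is square."""
    n = max(a.shape)
    # (before, after) padding for rows, then for columns
    pad_width = ((0, n - a.shape[0]), (0, n - a.shape[1]))
    return np.pad(a, pad_width, mode="constant", constant_values=pad_value)


nan = np.nan
grid = np.array([[nan, nan, nan, nan, nan],
                 [nan, nan, nan, nan, nan],
                 [nan, nan, 1, nan, nan],
                 [nan, 2, 3, 2, nan],
                 [1, 2, 2, 1, nan]])
g = np.isnan(grid)
# Remove all all-NaN columns, then all all-NaN rows
cropped = grid[:, ~np.all(g, axis=0)][~np.all(g, axis=1)]
print(cropped.shape)  # (3, 4)
padded = pad_to_square_np(cropped)
print(padded.shape)  # (4, 4)
```

With `pad_width=((n - a.shape[0], 0), (n - a.shape[1], 0))` the padding would go at the top/left instead, which for this example puts the all-NaN row first.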

Another solution based on a different answer. Significantly faster for large matrices.

```python
import numpy as np

grid = np.asarray(grid, dtype=float)
shape = grid.shape[0]
# Index of the first/last column and row that contain at least one finite value;
# the generators stop at the first match, so only the border region is scanned
first_col = next(i for i, col in enumerate(grid.T) if np.isfinite(col).any())
last_col = next(shape - i - 1 for i, col in enumerate(grid.T[::-1]) if np.isfinite(col).any())
first_row = next(i for i, row in enumerate(grid) if np.isfinite(row).any())
last_row = next(shape - i - 1 for i, row in enumerate(grid[::-1]) if np.isfinite(row).any())

row_len = last_row - first_row
col_len = last_col - first_col
delta_len = row_len - col_len
if delta_len < 0:
    # Fewer rows than columns: extend the row range upwards, then downwards if needed
    first_row -= abs(delta_len)
    if first_row < 0:
        delta_len = first_row
        first_row = 0
        last_row += abs(delta_len)
elif delta_len > 0:
    # Fewer columns than rows: extend the column range to the left, then right if needed
    first_col -= abs(delta_len)
    if first_col < 0:
        delta_len = first_col
        first_col = 0
        last_col += abs(delta_len)
grid = grid[first_row:last_row + 1, first_col:last_col + 1]
```
```python
import numpy as np

nan = np.nan
grid = [[nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, 1, nan, nan],
        [nan, 2, 3, 2, nan],
        [1, 2, 2, 1, nan]]
g = np.array(grid)
# Boolean masks: True where the whole column / row is NaN
cols = np.isnan(g).all(axis=0)
rows = np.isnan(g).all(axis=1)
# First and last column/row that contain at least one non-NaN value
first_col = np.where(~cols)[0][0]
last_col = len(cols) - np.where(~cols[::-1])[0][0] - 1
first_row = np.where(~rows)[0][0]
last_row = len(rows) - np.where(~rows[::-1])[0][0] - 1
row_len = last_row - first_row
col_len = last_col - first_col
delta_len = row_len - col_len
if delta_len < 0:
    # Fewer rows than columns: extend the row range upwards, then downwards if needed
    first_row -= abs(delta_len)
    if first_row < 0:
        delta_len = first_row
        first_row = 0
        last_row += abs(delta_len)
elif delta_len > 0:
    # Fewer columns than rows: extend the column range to the left, then right if needed
    first_col -= abs(delta_len)
    if first_col < 0:
        delta_len = first_col
        first_col = 0
        last_col += abs(delta_len)
print(g[first_row:last_row + 1, first_col:last_col + 1])
```

Output:

```
[[ nan  nan  nan  nan]
 [ nan  nan   1.  nan]
 [ nan   2.   3.   2.]
 [  1.   2.   2.   1.]]
```

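Here is a shorter way. It pairs row k with column k (the main diagonal) and removes each leading or trailing pair in which both the row and the column are entirely NaN. It then flips the matrix vertically and does the same again, which pairs rows with columns along the secondary (anti-)diagonal: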

```python
import numpy as np

nan = np.nan
grid = [[nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, 1, nan, nan],
        [nan, 2, 3, 2, nan],
        [1, 2, 2, 1, nan]]
g = np.array(grid)
for i in [1, 2]:
    cols = np.isnan(g).all(axis=0)
    rows = np.isnan(g).all(axis=1)
    # Keep index k unless both row k and column k are entirely NaN
    main_diagonal = np.logical_not(cols & rows)
    ind = np.nonzero(main_diagonal)[0]
    main_diagonal[ind[0]:ind[-1] + 1] = True  # do not remove inner row/col
    removed_main_diag = g[main_diagonal][:, main_diagonal]
    # Flip vertically so the next pass works along the anti-diagonal;
    # the second flip restores the original orientation
    g = removed_main_diag[::-1]
print(g)
```

Output:

```
[[ nan  nan  nan  nan]
 [ nan  nan   1.  nan]
 [ nan   2.   3.   2.]
 [  1.   2.   2.   1.]]
```
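To illustrate the two passes of the diagonal approach on the example grid: `cols & rows` flags index k only when row k and column k are both all-NaN. On the original grid no such pair exists, but after a vertical flip (`g[::-1]`) the all-NaN row 0 moves to index 4 and lines up with the all-NaN column 4. A small sketch; the helper name `removable_pairs` is hypothetical:

```python
import numpy as np

nan = np.nan
g = np.array([[nan, nan, nan, nan, nan],
              [nan, nan, nan, nan, nan],
              [nan, nan, 1, nan, nan],
              [nan, 2, 3, 2, nan],
              [1, 2, 2, 1, nan]])


def removable_pairs(a):
    """Flag index k where row k and column k are both entirely NaN."""
    nan_mask = np.isnan(a)
    return nan_mask.all(axis=1) & nan_mask.all(axis=0)


# Pass 1: row k paired with column k (main diagonal)
print(removable_pairs(g).tolist())        # [False, False, False, False, False]
# Pass 2: after a vertical flip, original row n-1-k is paired with column k (anti-diagonal)
print(removable_pairs(g[::-1]).tolist())  # [False, False, False, False, True]
```

Between the two passes the four corner combinations (top/bottom rows with left/right columns) are all covered, since each corner lies on either the main diagonal or the anti-diagonal.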
