How to split a diagonal matrix into an equal number of elements on one axis?

I have a very large triangular matrix that I need to split up for parallel computation. Because of data-locality concerns, it makes no sense to iterate over the matrix and hand every nth element to one of n threads. Currently I split the k×k matrix as follows, but this yields sections with unequal amounts of work (the smallest section takes several times longer to compute than the largest).

    def split_matrix(k, n):
        split_points = [round(i * k / n) for i in range(n + 1)]
        split_ranges = [(split_points[i], split_points[i + 1]) for i in range(len(split_points) - 1)]
        return split_ranges

    import numpy as np

    k = 100
    arr = np.zeros((k, k))
    idx = 0
    for i in range(k):
        for j in range(i + 1, k):
            arr[i, j] = idx
            idx += 1

    def parallel_calc(array, k, si, endi):
        for i in range(si, endi):
            for j in range(k):
                pass  # do some expensive calculations

    for start_i, stop_i in split_matrix(k, cpu_cnt):  # cpu_cnt: number of workers
        parallel_calc(arr, k, start_i, stop_i)
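A quick way to see the imbalance is to count how many upper-triangle elements land in each row slice (a standalone sketch reusing the split_matrix logic from the question):

```python
def split_matrix(k, n):
    # same even row split as in the question
    split_points = [round(i * k / n) for i in range(n + 1)]
    return [(split_points[i], split_points[i + 1]) for i in range(n)]

k, n = 100, 4
# row i contributes k - i - 1 upper-triangle elements (j runs from i + 1 to k - 1)
counts = [sum(k - i - 1 for i in range(si, ei)) for si, ei in split_matrix(k, n)]
print(counts)  # [2175, 1550, 925, 300] -- the first slice has ~7x the work of the last
```

Equal row counts give very unequal element counts because the rows near the top of the triangle are much longer.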

Do you have any suggestions for a better implementation, or a library function that does this?

2 answers

After some geometric calculations on paper, I arrived at the following partition, which gives approximately the same number of matrix elements in each of the vertical (or horizontal, if you prefer) sections.

    import math

    def offsets_for_equal_no_elems_diag_matrix(matrix_dims, num_of_partitions):
        if len(matrix_dims) == 2 and matrix_dims[0] == matrix_dims[1]:  # square
            k = matrix_dims[0]
            # isosceles right triangles have area side**2 / 2; setting each piece's area
            # to 1/num_of_partitions * 1/2 * matrix_dims[0]**2 gives the formula below
            # the k - ... comes from flipping the axis (the calculation is easier
            # starting from the smallest triangular piece)
            div_points = [0] + [round(k * math.sqrt((i + 1) / num_of_partitions))
                                for i in range(num_of_partitions)]
            pairs = [(k - div_points[i + 1], k - div_points[i])
                     for i in range(num_of_partitions - 1, -1, -1)]
            return pairs
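As a sanity check (a self-contained sketch of the function above, with the math import it needs), counting the upper-triangle elements per slice shows the pieces come out nearly balanced:

```python
import math

def offsets_for_equal_no_elems_diag_matrix(matrix_dims, num_of_partitions):
    if len(matrix_dims) == 2 and matrix_dims[0] == matrix_dims[1]:  # square only
        k = matrix_dims[0]
        div_points = [0] + [round(k * math.sqrt((i + 1) / num_of_partitions))
                            for i in range(num_of_partitions)]
        return [(k - div_points[i + 1], k - div_points[i])
                for i in range(num_of_partitions - 1, -1, -1)]

k, n = 100, 4
pairs = offsets_for_equal_no_elems_diag_matrix((k, k), n)
# row i holds k - i - 1 strictly-upper-triangle elements
counts = [sum(k - i - 1 for i in range(si, ei)) for si, ei in pairs]
print(pairs)   # [(0, 13), (13, 29), (29, 50), (50, 100)]
print(counts)  # [1209, 1256, 1260, 1225] -- within a few percent of each other
```

The early slices get many rows of the short triangle tip and the last slice gets a few long rows, which is exactly what the square-root spacing achieves.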

I would also suggest updating your split_matrix method, since it returns one split range fewer than you want (with cpu_cnt=4 it returns only 3 tuples, not 4):

    def split_matrix(k, n):
        split_points = [round(i * k / n) for i in range(n + 1)]
        return [(split_points[i], split_points[i + 1]) for i in range(len(split_points) - 1)]

Edit: if your data-locality requirement is not that strict, you could try this: create a task queue into which you add all the indexes/entries for which the calculation should be performed. Then initialize your parallel workers (for example, using multiprocessing) and start them. Each worker takes an element from the queue, computes the result, stores it (for example, in another queue), and moves on to the next element, and so on.

If that approach does not work for your data either, I do not think you can improve on it much.

