Sort resource access schedules for multiple threads to minimize write conflicts

Scenario:

Given the resource set R: set of resources

Given a set of T threads that will work in parallel: set of threads

Each thread must access a list of n resources. Each list is a sample of R, which means that each resource is unique in each list: threads access random samples of resources

But since access lists are selectively sorted, there may be conflicts: conflicting access

Random resource lists will be initialized once at the beginning. After that, each thread will perform an atomicAdd operation for each resource in the list, subsequently. The access order of resources in each list does not matter.

Question:

Is there an algorithm that sorts planning lists so that the number of write conflicts is minimized? Thus, the end result will look like this: resolved conflicts

My understanding so far:

  • Random sampling is important for the context of the algorithm, so there is no way to initialize lists in another way (only their order can be changed).
  • The general schedule can be considered as a matrix S with | T | rows and n columns, where each entry is an element of R.
  • If | T | <= | R |, a solution is possible without any conflicts.
  • If | T | == | R |, the columns of the optimized planning matrix S are permutations of R.
  • If | T | > | R |, the average number of conversions to conversions in the optimized planning matrix should be | T | / | R |

Possible approaches:

I am looking for an analytical solution to this problem. Maybe np-complete? If so, I am thinking of developing a genetic algorithm to solve this problem.

Edit 1 : added charts.

+8
sorting algorithm parallel-processing
source share
4 answers

I think the question asks "can we sort the lists to reduce conflicts."

I think that the optimal NP solution is complete, but I would look for the number of events in the set for each resource.

Step 1

The most commonly used resource is the most difficult to plan. Thus, I would put this resource in each of the flows at positions 1, 2, 3, 4, ... Before a conflict occurs (which may be inevitable), for example. 1,2,3,4, ..., n, 1, 2, ....

These are the "big stones." They must be placed first.

Step 2

Then use the next most commonly used resource. This should determine which slots (1 => n) were used recently and search this list for a slice that is unallocated and not used recently.

Whatever slot is selected, it moves to the top of the recently used queue, which should be avoided for a while.

This contributes to the distribution of the resource, but allows duplication. These duplicates will be used recently and will no longer be used for planning until valid options are available.

Finally

Step 2 is repeated for each of the resources in the order they appear.

+4
source share

Formalization problem

At first it looks like an OSSP option. We need to schedule R resources on top of T processors. Some planning time 0 , some 1 .

However, we need to complete the entire sequence in n timesteps, and there is exactly n*T non-zero scheduling time.

So, we are looking for a schedule at n time without TT conflicts (since no thread can work on two resources at the same time) and with a minimum number of RR conflicts. I assume the objective function to minimize:

enter image description here

Where count is the number of threads using resource j at time i .

Building a task graph

We construct a graph G=(V,E) with a vertex for each flow (first part) and each resource (second part). For each non-zero planning time, we have an edge from flow to resource. This graph is obviously bipartite.

Each vertex of the stream has degree n .

Our goal is to color this graph with colors n so that:

  • No vertex of the thread has two adjacent edges with the same color.

  • The number of adjacent edges with the same color is minimum

No Conflict Solution

If there is no resource vertex with degree d > n , then the graph has the correct coloring of the edges with colors no more than n . And the correct coloring, of course, is the best coloring from the point of view of the objective function - there are no conflicts at all.

Bilateral coloring of the edges of the graph can be done in O(n * T * R) time .

Conflict Resolution

Now suppose we have a resource vertex with degree d > n . This means that there is no correct coloring of the edges with colors n , and we will have a conflict in our schedule.

It is related to the number of conflicts.

We have several V_conflict vertices with degree d > n . Then the number of conflicts is q : sum max (0, dn)

This cannot be less, since each conflicting color in red coloring is a conflict in our schedule, and for each vertex with degree d > n we have at least d - n conflicting colors.

Now we want to build a solution with exact conflicts q . Remove any set of edges from each vertex in V_conflict to reduce their degree to n . We removed exactly q edges. Now we have a conflict-free solution (as the correct coloring of the edges in the colors n ).

Now insert the previously deleted edges q , assigning a color that has not yet been assigned to any edge of their corresponding vertex of the stream. Since each added edge can introduce only one conflict, now we have exactly q , which is proved by the lower boundary.

All this step with conflicts can be done in:

  • O(R) to define V_conflict

  • O(R*T) to remove conflicting edges

  • O(n * T * R) to solve the shortened version without conflicts.

  • O(n * q) to add edges back to the graph

Thus, a complete solution can be reached in O(n * T * R) time.

+4
source share

An approach

  • A combined list called CountedResourceList is created in the resource list of each thread, which stores <resource-id, total occurrence count of that id> in descending order of quantity. It takes O (rt * log (rt)) time. Why? this is explained in the example run below. Also in the same iteration, we create a HashMap so that we can find in O (1) the time at which this resource identifier is indicated in the list of threads (based on the idea of โ€‹โ€‹an Inverted pointer for those that may be relevant).
  • Then we create a matrix - SortedListMatrix - from t rows and r columns. In this matrix, we start by placing the resource identifiers from the CountedResourceList in this stream line, which originally contained this maximum occurring resource identifier. Thus, resource identifiers come out in the order most often found in the least encountered.
  • To help with the insertion of resource identifiers, we also create an int[t] FreeColumnCount that counts the number of free columns for each stream. Since a stream with fewer free slots has less ability to move the resource identifier in case of conflict, the resource identifier will first be filled in the line with the smallest free slots for streams containing this resource identifier.
  • Starting from the 0th column of the first stream that contains the highest resource identifier, we select the location for the next id using the formulas row = GetRowForResource(id) in the HashMap, and column = (column+1)%r . If the final column in the row is busy, we take the next unallocated column value by iterating the column value using the same formula without changing the row. Therefore, the columns are subject to wrapping, and the rows are determined from the HashMap, so the resource identifier is included in the correct list and in the least conflicting place .
  • Once this matrix is โ€‹โ€‹full, start at row 0 and assign rows to each thread as your last least controversial list of resources.

Attn: I periodically mentioned the complexity of several steps, but will update soon with overall complexity at runtime. At the moment, I'm trying to find the right algorithm.


Run example

Initial Assumptions and Definitions:

  • The number of threads = t , starting from thread 0 to t-1
  • The number of resources for each thread = r
  • Total resources for preparing a list of resources for flows will be > = r

We start with the case t = 4. T0 = {4, 3, 5}, T1 = {1, 2, 6}, T2 = {3, 1, 2}, T4 = {2, 7, 1}. Now I know that there are no more conflicts in this list. There should be a preprocessing step in which it turns out whether there should be some kind of permutation in the first place. For example, if the lists already have minimal conflicts, perhaps, or if there are no conflicts, the lists should be returned as they are. However, look at the pseudo-code in action.

First we read all the resources in all the lists and put them in separate buckets of resource identifiers. Using Counting Sort , this will be a linear step considering O (tr + constant) . However, since this simply gives the number of resource identifiers, we will need to use Merge Sort to sort the identifiers in descending order. This step will take O ((rt) log (rt)) . The result will be a list or identifiers sorted in descending order -

CountedResourceList = <3 times, id 1>, <3 times, id 2>, <2 times, id 3>, <1 time each identifier 4, 5, 6, 7>

During the same iteration, we also create an array of HashMap and FreeColumnCount .

Then we create a SortedListMatrix , which is populated starting from line = first stream to contain the max identifier of the resource (by calling GetRowForResource (id)) , column = 0. The matrix will be empty first and then populated as follows:

 Initially: {_, _, _} {_, _, _} {_, _, _} {_, _, _} Taking 1st element '1' at (1, 0): {_, _, _} {1, _, _} {_, _, _} {_, _, _} Taking 2nd element '1' at (2, 1): {_, _, _} {1, _, _} {_, 1, _} {_, _, _} Taking 3rd element '1' at (3, 2): {_, _, _} {1, _, _} {_, 1, _} {_, _, 1} Taking 4th element '2' at (1, 1) as (1, 0) is already occupied: {_, _, _} {1, 2, _} {_, 1, _} {_, _, 1} Taking 5th element '2' at (2, 2): {_, _, _} {1, 2, _} {_, 1, 2} {_, _, 1} Taking 6th element '2' at (3, 0): {_, _, _} {1, 2, _} {_, 1, 2} {2, _, 1} Taking 7th element '3' at (0, 0) because row 2 has least free slots: {_, _, _} {1, 2, _} {3, 1, 2} {2, _, 1} Taking 8th element '3' at (0, 1): {_, 3, _} {1, 2, _} {3, 1, 2} {2, _, 1} Taking 9th element '4' at (0, 2): {_, 3, 4} {1, 2, _} {3, 1, 2} {2, _, 1} Taking 10th element '5' at (0, 0): {5, 3, 4} {1, 2, _} {3, 1, 2} {2, _, 1} Taking 11th element '6' at (1, 2): {5, 3, 4} {1, 2, 6} {3, 1, 2} {2, _, 1} Taking 12th element '7' at (3, 1): {5, 3, 4} {1, 2, 6} {3, 1, 2} {2, 7, 1} 

The complexity of the above step will be O (rt) .

When the matrix is โ€‹โ€‹completely filled, T0 will be assigned row 0 as its list of resources, T1 as row 1 ...

+3
source share

I do not know any algorithms. One approach with the understanding that it will normally be reordered by a sequence is the locks that represent each resource.

When accessing a resource, the stream gets the corresponding lock for this resource. If another thread wants to access the same resource, it redistributes access from the next. For example, T1 can access R1. If T2 also requires access to R1, then T2 can instead redistribute (swap) access for R1, say, for R2, and then take R1, assuming that T1 runs with it.

0
source share

All Articles