Need a pairing algorithm - based on Hungarian?

The Hungarian or Kuna-Munkra algorithm (a good description here ) is a pair of objects from two sets (of n and m objects, respectively, n> = m), so that the total โ€œdifferenceโ€ (or โ€œcostโ€ of assignment) between the paired objects is minimal. However, one feature of the algorithm does not suit me: it does only exhaustive pairing in the sense that it will connect all m objects with some of n objects. Instead, I would like to be able to create an arbitrary number of k pairs (k <= m) with a total minimum cost. For example, there is a 50x30 entry cost matrix; Kuhn-Munkres will optimally create, but all 30 pairs. Although I only need 20 pairs to be created this way optimally.

Could there be any modification of the Hungarian algorithm that allows this, or maybe a completely different algo? I appreciate your answers.

+4
source share
2 answers

Here are a few ideas to think about:

1) Suppose you write your cost matrix with n columns and m rows. If n is greater than m, you add paddings of constant high value additions to make them square. The minimal costing of rows and columns will now discard some columns by matching them with fill rows. Suppose you are now adding a very low cost add-on column for regular rows and a constant high cost for fill columns. Now the solution will match one of the correct rows in this column to take advantage of the very low cost. This reduces the number of lines that correspond to something reasonable. I think that if you add mk of such columns, you will get a minimum cost match that really only assigns k rows.

Here is an example of pairing 3 with 3 in 5x5, assuming ? marks problem-specific values > 0 but < 100 (you may need more extreme values than 0 and 100 to force the sort of solution you want depending on what your data values are). ? ? ? ? ? 0 0 ? ? ? ? ? 0 0 ? ? ? ? ? 0 0 ? ? ? ? ? 0 0 ? ? ? ? ? 0 0 100 100 100 100 100 100 100 100 100 100 100 100 100 100 

I expect that the optimal solution will use two 0s on the right and two 100s on the bottom rows. Are the remaining cells 3 x 3 matches squared? s

OK - here is the proof that adding columns and then rows as indicated above gives the desired type of match:

Suppose you take a cost matrix with values โ€‹โ€‹0 <x <100 and add a border of columns and rows of 0s and 100s as described above, and then solve it as an assignment problem. Draw two lines at the border of 0s and 100s, expanding them to cut the square into four areas, where the area in the upper left corner is the original matrix. If the assignment algorithm did not select any of the cells in the lower right pane, then it selected the s cells in the upper right pane (to select the rightmost s columns), therefore s rows in the orgininal cost matrix in the upper left pane paired with cells in the zero column . Other rows in the upper region should be connected to a nonzero column, so you have a match in the original region, in which there are s rows, and therefore the s-columns are unpaired (that is, paired with a zero cell).

Is it possible that the assimilation solution has any cells in the right sxs area selected below? Consider any such assignment. To prove that at least one cell in the upper left corner must be selected, suppose that none are selected. Then we must somehow select a cell from each of the top n rows, presumably by selecting cells from the upper right pane. Each such cell should be in a separate column, but in the upper right case there are only s columns, which is not enough, because we need only one column for each match that we want to skip, and we used one column in this area to already fill the cell in lower right area. Therefore, suppose that the solution selects at least one cell in the original upper left region and at least one cell in the lower right region. Select the other two cells that do this at the four corners of the square. These cells cannot be selected. If we select these cells instead of the two currently selected, we get a different solution. Two new cells are cell 0 from the top right and 100 cells from the bottom left. They would replace the 100th cell from the bottom right and the cell with a value greater than zero in the main matrix. Thus, this will improve our proposed solution, so any solution containing a cell in the lower right area is not the best solution, and the assignment algorithm will not return it to us.

Thus, this trick of adding columns from 0s, and then rows with large values โ€‹โ€‹will lead to the creation of a solution to the assignment algorithm, which allows you to exclude one match from the original solution for each (row, column).

2) The assignment task is a special case of http://en.wikipedia.org/wiki/Minimum-cost_flow_problem . I think you need a minimal cost stream that transfers k units from rows to columns, so you can try to solve them like this.

3) The problem of the minimum flow of costs is a special case of linear programming. I think you could write a linear program for assigning numbers in the range [0,1] of the matrix cells so that each row and each column are added up to no more than 1, and the total sum of all cells is k. The objective function is the amount in each cell when it is standing.

+1
source

Your approach may be wrong, but the Hungarian algorithm is only for a bipartite chart. For a general (non-bipartite) graph (i.e. Weighted Match), see here http://en.wikipedia.org/wiki/Edmonds%27s_matching_algorithm . Or do you want to cheat, and you give only the top ten pairs with maximum value?

0
source

All Articles