Divide the list of numbers into a smaller list with a "sum" of approximately the same

I run about 2000 tests on the grid, each test is performed as a separate task on the grid. Tests have quite a long run time. Overall execution takes 500 hours, ends in less than 10 hours at 60 node SunGridEngine. The duration of the tests varies from 5 minutes to 90 minutes. The combination of tests without much intelligence led to some performance improvements. I would like to create β€œtasks” that are roughly equal in size. How can i do this?

(what we are doing now: sort all the tests and add until the total runtime is about 5 hours. Look for something better)

+5
source share
5 answers

Doing this is optimally NP-complete. This is a variation of the partition problem, which is a special case of the problem of the sum of a subset , which in itself is a special case of a problem with a backpack .

In your case, you probably don't need the exact solution, so you can probably use some heuristics to get something β€œgood enough” in a reasonable amount of time. See the Methods section of the problems page of the section for a description of some approaches.

+11
source

What you are looking for is a partition problem for k sets.

There is some literature on k = 3 called the 3-section problem. This is NP complete in the strong sense.

, .

: http://en.wikipedia.org/wiki/Partition_problem

, .

+3

NP-. .

+3

. , . , . , , , .

+1

Looking at the links published by Lawrence, I thought I would try to whip something. The algorithm should assign the longest test to the shortest list of tasks (repeat until all tests are assigned). Using your examples and random test times, the std deviation was pretty low, less than 2 minutes when you ran it several times (C # code, but nothing that would not be trivial to convert):

    private static void BuildJobs()
    {
        PriorityQueue<Task> tasks = new PriorityQueue<Task>();

        //create a task list for each node
        for (int i = 0; i < 60; i++)
        {
            Task t = new Task();
            tasks.Enqueue(t);
        }

        //get the list of tests, in order from longest to shortest
        int[] testList = new int[2000];

        for (int i = 0; i < testList.Length; i++)
        {
            testList[i] = random.Next(5, 90);
        }

        Array.Sort<int>(testList);
        Array.Reverse(testList);

        // add the longest running test to the current shortest task list
        foreach (int time in testList)
        {
            Task t = tasks.Dequeue();
            t.addTest(time);
            tasks.Enqueue(t);
        }

        Debug.WriteLine(CalculateStdDev(tasks));

    }
0
source

All Articles