Algorithm for finding the "most common elements" across different arrays

I have, for example, four arrays containing some elements (numbers):

1, 4 , 8,10
1,2,3, 4 , 11,15
2, 4 , 20,21
2 , 30

I need to find the most common elements across these arrays, and each chosen element must run for as long as it can (see the example below). In this example the answer is the highlighted combination 4, 4, 4, 2 (or the same one with “30” at the end, which counts as "the same"), because it contains the fewest distinct elements (only two: 4 and 2/30).

The combination below is not good: once I pick, for example, "4", it has to "go" to its end (until the next array does not contain "4" at all). So each chosen element must run all the way to its end.

1, 4 , 8,10
1, 2 , 3,4,11,15
2 , 4,20,21
2 , 30

EDIT2: OR

1, 4 , 8,10
1,2,3, 4 , 11,15
2 , 4,20,21
2 , 30

Neither of these, nor anything similar, is OK.

Is there any algorithm to speed up this thing (if I have thousands of arrays with hundreds of elements in each)?

: , ( ) - - . 4,4,4,2 , 4,2,2,2, 4 , 2.

: . , . , ,

1,2,3
1,4,5
4,5,6

1,1,4 1,1,5 1,1,6 2,5,5, 1 ( ), 2 ( ).

.

EDIT3: : (

EDIT4: @spintheblack 1,1,1,2,4 - , , (, 1), ( 1). , ""? , ( ), - , , .

+5

, " " , . , [1], [2], [1], [1, 2, 1]. , 3 .

In Python:

def find_best_run(first_array, *argv):
    # Initialize data structures: for each value x, keep the best run
    # ending at x as (distinct_count, run lengths, run elements).
    this_array_best_run = {}
    for x in first_array:
        this_array_best_run[x] = (1, (1,), (x,))

    for this_array in argv:
        # Find the best runs ending at each value in this_array.
        last_array_best_run = this_array_best_run
        this_array_best_run = {}

        for x in this_array:
            for (y, pattern) in last_array_best_run.items():
                (distinct_count, lengths, elements) = pattern
                if x == y:
                    # Same value as before: extend the last run.
                    lengths = lengths[:-1] + (lengths[-1] + 1,)
                else:
                    # Different value: start a new run.
                    distinct_count += 1
                    lengths = lengths + (1,)
                    elements = elements + (x,)

                if x not in this_array_best_run:
                    this_array_best_run[x] = (distinct_count, lengths, elements)
                else:
                    (prev_count, prev_lengths, prev_elements) = this_array_best_run[x]
                    # Fewer distinct runs wins; on a tie, prefer longer early runs
                    # (the same rule as in the final selection below).
                    if distinct_count < prev_count or (
                            distinct_count == prev_count and prev_lengths < lengths):
                        this_array_best_run[x] = (distinct_count, lengths, elements)

    # Find the best overall run.
    best_count = len(argv) + 10 # Needs to be bigger than any possible answer.
    for (distinct_count, lengths, elements) in this_array_best_run.values():
        if distinct_count < best_count:
            best_count = distinct_count
            best_lengths = lengths
            best_elements = elements
        elif distinct_count == best_count and best_lengths < lengths:
            best_count = distinct_count
            best_lengths = lengths
            best_elements = elements

    # Convert it into a more normal representation.
    answer = []
    for (length, element) in zip(best_lengths, best_elements):
        answer.extend([element] * length)

    return answer

# example
print(find_best_run(
    [1,4,8,10],
    [1,2,3,4,11,15],
    [2,4,20,21],
    [2,30])) # prints [4, 4, 4, 2] (ending in 30 would be equally good)

. ...this_run , , , (distinct_count, lengths, elements). unique_count, ( - , ) . , , , . , , .

With N arrays of M elements each, this is O(N*M*M).
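
If only the number of distinct runs matters, the O(N*M*M) inner loop can probably be avoided. The following is a minimal sketch, not the code above: for each value it either continues the run that already ends in that value or switches from the cheapest previous value, which is O(N*M) overall. The names fewest_runs and count_runs are made up for illustration, every array is assumed to be non-empty, and ties are broken arbitrarily rather than by run length.

```python
def count_runs(picks):
    """Number of maximal runs of equal consecutive values in picks."""
    return sum(1 for i, p in enumerate(picks) if i == 0 or p != picks[i - 1])


def fewest_runs(arrays):
    """Pick one value per array so that the number of runs is minimal.

    Assumes at least one array and that every array is non-empty.
    """
    layers = []  # layers[i][x] = (runs, previous value) for picks ending in x
    prev = {}
    for values in arrays:
        cur = {}
        if prev:
            # Cheapest way to arrive from the previous array, found once per array.
            best_prev = min(prev, key=lambda v: prev[v][0])
            best_runs = prev[best_prev][0]
        for x in values:
            if not prev:
                cur[x] = (1, None)              # first array: one run so far
            elif x in prev and prev[x][0] <= best_runs + 1:
                cur[x] = (prev[x][0], x)        # continue the run ending in x
            else:
                cur[x] = (best_runs + 1, best_prev)  # switch to a new value
        layers.append(cur)
        prev = cur

    # Walk the back pointers from the cheapest final value.
    x = min(prev, key=lambda v: prev[v][0])
    picks = []
    for layer in reversed(layers):
        picks.append(x)
        x = layer[x][1]
    return picks[::-1]


arrays = [[1, 4, 8, 10], [1, 2, 3, 4, 11, 15], [2, 4, 20, 21], [2, 30]]
picks = fewest_runs(arrays)
print(picks, count_runs(picks))
```

For the example arrays this finds a selection with two runs, but not necessarily 4, 4, 4, 2, because the "longest runs first" preference is not modelled here.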

+1

, , arrays - , .

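  1. i = 0
  2. current = arrays[i]
  3. Loop i from i+1 to len(arrays)-1
  4. new = current & arrays[i] (set intersection, finds the common elements)
  5. If new contains any elements, go to step 6, otherwise go to step 7
  6. current = new, return to step 3 (continue the loop)
  7. Print current, set current = arrays[i], return to step 3 (continue the loop)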

Python:

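def mce(arrays):
  count = 1
  current = set(arrays[0])
  for i in range(1, len(arrays)):
    # keep only the values that are also present in the next array
    new = current & set(arrays[i])
    if new:
      count += 1
      current = new
    else:
      # the run ends here: print any surviving value, once per array it covered
      print(" ".join([str(current.pop())] * count), end=" ")
      count = 1
      current = set(arrays[i])
  print(" ".join([str(current.pop())] * count))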

>>> mce([[1, 4, 8, 10], [1, 2, 3, 4, 11, 15], [2, 4, 20, 21], [2, 30]])
4 4 4 2
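
As a variation on the function above, here is a sketch that returns the runs instead of printing them, so the caller can see every value that could fill each run (for instance that the last position may be 2 or 30). The name greedy_runs and the return shape are assumptions made for illustration; the greedy intersection logic is the same.

```python
def greedy_runs(arrays):
    """Greedy intersection; returns a list of (candidate_values, run_length)."""
    runs = []
    current = set(arrays[0])
    count = 1
    for values in arrays[1:]:
        common = current & set(values)
        if common:
            # at least one value survives, so the run continues
            current = common
            count += 1
        else:
            # nothing survives: close the run and start a new one here
            runs.append((sorted(current), count))
            current = set(values)
            count = 1
    runs.append((sorted(current), count))
    return runs


print(greedy_runs([[1, 4, 8, 10], [1, 2, 3, 4, 11, 15], [2, 4, 20, 21], [2, 30]]))
# prints [([4], 3), ([2, 30], 1)]
```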
+3

, ,,

  • .
  • "" , . 1 .
  • 2
+2

.

, /.

. :

1,4,8,10           <-- stop A
1,2,3,4,11,15      <-- stop B
2,4,20,21          <-- stop C
2,30               <-- stop D, destination

, , , , , 10 A, 10 B.

, , :

             A     B     C     D
line 1  -----X-----X-----------------
line 2  -----------X-----X-----X-----
line 3  -----------X-----------------
line 4  -----X-----X-----X-----------
line 8  -----X-----------------------
line 10 -----X-----------------------
line 11 -----------X-----------------
line 15 -----------X-----------------
line 20 -----------------X-----------
line 21 -----------------X-----------
line 30 -----------------------X-----

, , :

             A     B     C     D
line 1  -----X=====X-----------------
line 2  -----------X=====X=====X-----
line 3  -----------X-----------------
line 4  -----X=====X=====X-----------
line 8  -----X-----------------------
line 10 -----X-----------------------
line 11 -----------X-----------------
line 15 -----------X-----------------
line 20 -----------------X-----------
line 21 -----------------X-----------
line 30 -----------------------X-----

, A D .

, , :

  • take line 4 from A to C, then line 2 from C to D

:

stops = [
    [1, 4, 8, 10],
    [1,2,3,4,11,15],
    [2,4,20,21],
    [2,30],
]

def calculate_possible_exit_lines(stops):
    """
    only return lines that are available at both exit
    and arrival stops, discard the rest.
    """

    result = []
    for index in range(0, len(stops) - 1):
        lines = []
        for value in stops[index]:
            if value in stops[index + 1]:
                lines.append(value)
        result.append(lines)
    return result

def all_combinations(lines):
    """
    produce all combinations which travel from one end
    of the journey to the other, across available lines.
    """

    if not lines:
        yield []
    else:
        for line in lines[0]:
            for rest_combination in all_combinations(lines[1:]):
                yield [line] + rest_combination

def reduce(combination):
    """
    reduce a combination by returning the number of
    times each value appear consecutively, ie.
    [1,1,4,4,3] would return [2,2,1] since
    the 1 appear twice, the 4 appear twice, and
    the 3 only appear once.
    """

    result = []
    while combination:
        count = 1
        value = combination[0]
        combination = combination[1:]
        while combination and combination[0] == value:
            combination = combination[1:]
            count += 1
        result.append(count)
    return tuple(result)

def calculate_best_choice(lines):
    """
    find the best choice by reducing each available
    combination down to the number of stops you can
    sit on a single line before having to switch,
    and then picking the one that has the most stops
    first, and then so on.
    """

    available = []
    for combination in all_combinations(lines):
        count_stops = reduce(combination)
        available.append((count_stops, combination))
    # the largest (count_stops, combination) tuple is the best choice
    return max(available)[1]

possible_lines = calculate_possible_exit_lines(stops)
print("possible lines: %s" % (str(possible_lines), ))
best_choice = calculate_best_choice(possible_lines)
print("best choice: %s" % (str(best_choice), ))

Output:

possible lines: [[1, 4], [2, 4], [2]]
best choice: [4, 4, 2]
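
For comparison, the same brute-force search can be sketched with the standard library: itertools.product plays the role of all_combinations and itertools.groupby the role of reduce. The names run_lengths and best_choice are made up for this sketch. It still enumerates every combination, so the number of candidates grows exponentially with the number of stops.

```python
from itertools import groupby, product


def run_lengths(combination):
    """[1, 1, 4, 4, 3] -> (2, 2, 1): length of each run of equal values."""
    return tuple(len(list(group)) for _, group in groupby(combination))


def best_choice(lines):
    """Combination whose run lengths are lexicographically largest."""
    # ties on run lengths are broken by the combination itself,
    # like the (count_stops, combination) tuples in the code above
    best = max(product(*lines), key=lambda c: (run_lengths(c), c))
    return list(best)


print(best_choice([[1, 4], [2, 4], [2]]))  # prints [4, 4, 2]
```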

, , , , , .

, :

  • line 4 from A, through B, to C
  • line 2 from C to D

, , , .

. , , / , .. . , .

+2

, , , , .

N , " " , . : 1) . 2) ( ). , 4 t 1 p 3 x 2 y

, - [[1,4], [1,2], [1,2], [2], [3,4]] - [1,1,1,2,4] (3 ) [4,2,2,2,4] ( )

, .

: ; , - ,

EDIT 2. For anyone who is interested, the problem as I misunderstood it can be formulated as an instance of the hitting set problem, see http://en.wikipedia.org/wiki/Vertex_cover#Hitting_set_and_set_cover . Basically, the left side of the bipartite graph is the arrays and the right side is the numbers, with an edge between each array and every number it contains. Unfortunately, this is NP-complete, but the greedy solutions described above are essentially the best approximation.
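
To make the hitting set reading concrete, here is a minimal greedy sketch: repeatedly pick the number that occurs in the most arrays not yet hit. The name greedy_hitting_set and the tie-break (smallest number) are assumptions for illustration, every array is assumed to be non-empty, and the order of the arrays is ignored, so this does not solve the original question about consecutive runs.

```python
def greedy_hitting_set(arrays):
    """Greedy approximation: numbers such that every array contains one of them."""
    remaining = [set(a) for a in arrays]  # arrays not hit yet
    chosen = []
    while remaining:
        # count in how many remaining arrays each number occurs
        counts = {}
        for values in remaining:
            for x in values:
                counts[x] = counts.get(x, 0) + 1
        # most frequent number first; ties go to the smallest number
        best = max(counts, key=lambda x: (counts[x], -x))
        chosen.append(best)
        remaining = [values for values in remaining if best not in values]
    return chosen


print(greedy_hitting_set([[1, 4, 8, 10], [1, 2, 3, 4, 11, 15], [2, 4, 20, 21], [2, 30]]))
# prints [2, 1]
```

Note that [2, 1] hits all four arrays with two numbers, just as 4 and 2 do, but it says nothing about which number to use at which position.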

+1