Organizing a List of Tuples

I have a list of tuples that I create dynamically.

The list is as follows:

List = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)] 

Each set (a, b) list represents a range of indices from a specific table.

The ranges (a, b) and (b, d) in my situation are the same as (a, d)

I want to combine tuples where the second element matches the first of the others.

So, in the above example, I want to combine (8, 10), (10,13) to get (8,13) and remove (8, 10), (10,13)

(19,25) and (25,30) merge should give (19, 30)

I have no clue where to start. Tuples do not overlap.

Edit: I was trying to just avoid any looping cycle, since I have a pretty large list

+6
source share
7 answers

non-recursive approach using sorting (I added more nodes to handle the difficult case):

 l = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30), (30,34), (38,40)] l = sorted(l) r=[] idx=0 while idx<len(l): local=idx+1 previous_value = l[idx][1] # search longest string while local<len(l): if l[local][0]!=previous_value: break previous_value = l[local][1] local+=1 # store tuple r.append((l[idx][0],l[local-1][1])) idx = local print(r) 

result:

 [(1, 4), (8, 13), (14, 16), (19, 34), (38, 40)] 

The only drawback is that the original sort order is not preserved. I do not know if this is a problem.

+2
source

If you need to take things into account, such as the skovorodkin example in a comment,

 [(1, 4), (4, 8), (8, 10)] 

(or even more complex examples), then one of the ways to use them effectively is to use charts.

Let's say you create a digraph (possibly using networkx ), where each pair is a node, and there is an edge from (a, b) to node (c, d) if b == c. Now do a topological sort , iterate in order, and merge accordingly. You must take care to handle nodes with two (or more) outgoing edges properly.


I understand that your question says that you want to avoid loops due to the large size of the list. Conversely, for long lists, I doubt that you will even find an effective linear time solution using list comprehension (or something like that). Note that you cannot sort the list in linear time, for example.


Here is a possible implementation:

Let's say we start with

 l = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)] 

To remove duplicates, the following is simplified:

 l = list(set(l)) 

Now, to build a digraph:

 import networkx as nx import collections g = nx.DiGraph() 

Peaks are just pairs:

 g.add_nodes_from(l) 

To build edges, we need a dictionary:

 froms = collections.defaultdict(list) for p in l: froms[p[0]].append(p) 

Now we can add the edges:

 for p in l: for from_p in froms[p[1]]: g.add_edge(p, from_p) 

The following two lines are not needed - they are just here to show what the graph looks like at this stage:

 >>> g.nodes() [(25, 30), (14, 16), (10, 13), (8, 10), (1, 4), (19, 25)] >>> g.edges() [((8, 10), (10, 13)), ((19, 25), (25, 30))] 

Now, sort the pairs by topological view:

 l = nx.topological_sort(g) 

Finally, the hard part here. The result is a DAG. We have to go recursively, but remember that we have already visited.

Let's create a request for what we visited:

 visited = {p: False for p in l} 

Now the recursive function that sets the node returns the maximum edge of the range from any node available from it:

 def visit(p): neighbs = g.neighbors(p) if visited[p] or not neighbs: visited[p] = True return p[1] mx = max([visit(neighb_p) for neighb_p in neighbs]) visited[p] = True return mx 

We are all ready. Let me create a list for the last pairs:

 final_l = [] 

and visit all nodes:

 for p in l: if visited[p]: continue final_l.append((p[0], visit(p))) 

Here's the end result:

 >>> final_l [(1, 4), (8, 13), (14, 16)] 
+6
source

If they do not overlap, you can sort them and then just connect the adjacent ones.

Here's a generator that gives new tuples:

 def combine_ranges(L): L = sorted(L) # Make a copy as we're going to remove items! while L: start, end = L.pop(0) # Get the first item while L and L[0][0] == end: # While the first of the rest connects to it, adjust # the end and remove the first of the rest _, end = L.pop(0) yield (start, end) print(list(combine_ranges(List))) 

If speed is important, use collections.deque instead of a list, so .pop(0) operations can be at a constant speed.

+5
source

Here is one optimized recursion method:

 In [44]: def find_intersection(m_list): for i, (v1, v2) in enumerate(m_list): for j, (k1, k2) in enumerate(m_list[i + 1:], i + 1): if v2 == k1: m_list[i] = (v1, m_list.pop(j)[1]) return find_intersection(m_list) return m_list 

Demo:

 In [45]: lst = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)] In [46]: find_intersection(lst) Out[46]: [(1, 4), (8, 13), (19, 30), (14, 16)] 
+2
source

You can use the dictionary to match different end indexes with a range ending with that index; then just list the list sorted by start index, and merge the segments accordingly:

 def join_lists(lst): ending = {} # will map end position to range for start, end in sorted(lst): # iterate in sorted order if start in ending: ending[end] = (ending[start][0], end) # merge del ending[start] # remove old value else: ending[end] = (start, end) return list(ending.values()) # return remaining values from dict 

Alternatively, as noted in Tomer W in the comments , you can do without sorting by repeating the list twice, forcing this decision to accept only linear time (O (n)) wrt the length of the list.

 def join_lists(lst): ending = {} # will map end position to range # first pass: add to dictionary for start, end in lst: ending[end] = (start, end) # second pass: lookup and merge for start, end in lst: if start in ending: ending[end] = (ending[start][0], end) del ending[start] # return remaining values from dict return list(ending.values()) 

Output examples for both cases:

 >>> join_lists([(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)]) [(1, 4), (8, 13), (14, 16), (19, 30)] >>> join_lists(lst = [(1, 4), (4, 8), (8, 10)]) [(1, 10)] 
+2
source

The list is sorted first, and adjacent pairs (min1, max1), (min2, max2) merge together if they overlap.

 MIN=0 MAX=1 def normalize(intervals): isort = sorted(intervals) for i in range(len(isort) - 1): if isort[i][MAX] >= isort[i + 1][MIN]: vmin = isort[i][MIN] vmax = max(isort[i][MAX], isort[i + 1][MAX]) isort[i] = None isort[i + 1] = (vmin, vmax) return [r for r in isort if r is not None] List1 = [(1,4), (8,10), (19,25), (10,13), (14,16), (25,30)] List2 = [(1, 4), (4, 8), (8, 10)] print(normalize(List1)) print(normalize(List2)) #[(1, 4), (8, 13), (14, 16), (19, 30)] #[(1, 10)] 
+1
source

The following should work. It breaks the tuples into separate numbers, then finds a tuple tied to each cluster. This should work even with complex ceilings, for example [(4, 10), (9, 12)]

This is a very simple fix.

 # First turn your list of tuples into a list of numbers: my_list = [] for item in List: my_list = my_list + [i for i in range(item[0], item[1]+1)] # Then create tuple pairs: output = [] a = False for x in range(max(my_list)+1): if (not a) and (x in my_list): a = x if (a) and (x+1 not in my_list): output.append((a, x)) a = False print output 
+1
source

All Articles