Consolidate IPs in ranges in python

Suppose I have a list of IP ranges (only the last term) that may or may not overlap:

('1.1.1.1-7', '2.2.2.2-10', '3.3.3.3-3.3.3.3', '1.1.1.4-25', '2.2.2.4-6') 

I am looking for a way to identify any overlapping ranges and combine them into separate ranges.

 ('1.1.1.1-25', '2.2.2.2-10', '3.3.3.3-3') 

The current thought for the algorithm is to expand all ranges into a list of all IP addresses, eliminate duplicates, sort and consolidate any consecutive IP addresses.

Any other suggestions of the python-esque algorithm?

+7
source share
5 answers

Here is my version as a module. My algorithm is identical to the one lunixbochs mentioned in his answer, and converting from a range string to integers and back is well modular.

 import socket, struct def ip2long(ip): packed = socket.inet_aton(ip) return struct.unpack("!L", packed)[0] def long2ip(n): unpacked = struct.pack('!L', n) return socket.inet_ntoa(unpacked) def expandrange(rng): # expand '1.1.1.1-7' to ['1.1.1.1', '1.1.1.7'] start, end = [ip.split('.') for ip in rng.split('-')] return map('.'.join, (start, start[:len(start) - len(end)] + end)) def compressrange((start, end)): # compress ['1.1.1.1', '1.1.1.7'] to '1.1.1.1-7' start, end = start.split('.'), end.split('.') return '-'.join(map('.'.join, (start, end[next((i for i in range(4) if start[i] != end[i]), 3):]))) def strings_to_ints(ranges): # turn range strings into list of lists of ints return [map(ip2long, rng) for rng in map(expandrange, ranges)] def ints_to_strings(ranges): # turn lists of lists of ints into range strings return [compressrange(map(long2ip, rng)) for rng in ranges] def consolodate(ranges): # join overlapping ranges in a sorted iterable iranges = iter(ranges) startmin, startmax = next(iranges) for endmin, endmax in iranges: # leave out the '+ 1' if you want to join overlapping ranges # but not consecutive ranges. if endmin <= (startmax + 1): startmax = max(startmax, endmax) else: yield startmin, startmax startmin, startmax = endmin, endmax yield startmin, startmax def convert_consolodate(ranges): # convert a list of possibly overlapping ip range strings # to a sorted, consolodated list of non-overlapping ip range strings return list(ints_to_strings(consolodate(sorted(strings_to_ints(ranges))))) if __name__ == '__main__': ranges = ('1.1.1.1-7', '2.2.2.2-10', '3.3.3.3-3.3.3.3', '1.1.1.4-25', '2.2.2.4-6') print convert_consolodate(ranges) # prints ['1.1.1.1-25', '2.2.2.2-10', '3.3.3.3-3'] 
+5
source

I once encountered the same problem. The only difference was that I had to effectively support line segments in the list. This was for Monte Carlo simulation. And new randomly generated line segments had to be added to existing sorted and merged line segments.

I adapted the algorithm to your problem using lunixbochs answer to convert IP addresses to integers.

This solution allows you to add a new IP range to the existing list of already merged ranges (while other solutions rely on sorting the list of ranges to merge and do not allow you to add a new range to the already merged range list). This is done in add_range using the bisect module to find the place where you want to insert a new IP range, and then remove the redundant IP ranges and insert a new range with adjusted boundaries so that the new range covers all deleted ranges.

 import socket import struct import bisect def ip2long(ip): '''IP to integer''' packed = socket.inet_aton(ip) return struct.unpack("!L", packed)[0] def long2ip(n): '''integer to IP''' unpacked = struct.pack('!L', n) return socket.inet_ntoa(unpacked) def get_ips(s): '''Convert string IP interval to tuple with integer representations of boundary IPs '1.1.1.1-7' -> (a,b)''' s1,s2 = s.split('-') if s2.isdigit(): s2 = s1[:-1] + s2 return (ip2long(s1),ip2long(s2)) def add_range(iv,R): '''add new Range to already merged ranges inplace''' left,right = get_ips(R) #left,right are left and right boundaries of the Range respectively #If this is the very first Range just add it to the list if not iv: iv.append((left,right)) return #Searching the first interval with left_boundary < left range side p = bisect.bisect_right(iv, (left,right)) #place after the needed interval p -= 1 #calculating the number of interval basing on the position where the insertion is needed #Interval: |----X----| (delete) #Range: <--<--|----------| (extend) #Detect if the left Range side is inside the found interval if p >=0: #if p==-1 then there was no interval found if iv[p][1]>= right: #Detect if the Range is completely inside the interval return #drop the Range; I think it will be a very common case if iv[p][1] >= left-1: left = iv[p][0] #extending the left Range interval del iv[p] #deleting the interval from the interval list p -= 1 #correcting index to keep the invariant #Intervals: |----X----| |---X---| (delete) #Range: |-----------------------------| #Deleting all the intervals which are inside the Range interval while True: p += 1 if p >= len(iv) or iv[p][0] >= right or iv[p][1] > right: 'Stopping searching for the intervals which is inside the Range interval' #there are no more intervals or #the interval is to the right of the right Range side # it the next case (right Range side is inside the interval) break del iv[p] #delete the now redundant interval from the interval list p -= 1 #correcting index to keep the invariant #Interval: |--------X--------| (delete) #Range: |-----------|-->--> (extend) #Working the case when the right Range side is inside the interval if p < len(iv) and iv[p][0] <= right-1: #there is no condition for right interval side since #this case would have already been worked in the previous block right = iv[p][1] #extending the right Range side del iv[p] #delete the now redundant interval from the interval list #No p -= 1, so that p is no pointing to the beginning of the next interval #which is the position of insertion #Inserting the new interval to the list iv.insert(p, (left,right)) def merge_ranges(ranges): '''Merge the ranges''' iv = [] for R in ranges: add_range(iv,R) return ['-'.join((long2ip(left),long2ip(right))) for left,right in iv] ranges = ('1.1.1.1-7', '2.2.2.2-10', '3.3.3.3-3.3.3.3', '1.1.1.4-25', '2.2.2.4-6') print(merge_ranges(ranges)) 

Output:

 ['1.1.1.1-1.1.1.25', '2.2.2.2-2.2.2.10', '3.3.3.3-3.3.3.3'] 

I had a lot of fun programming! Thanks for this:)

+1
source

Convert ranges to pairs of numbers. These functions convert individual IP addresses to and from integer values.

 def ip2long(ip): packed = socket.inet_aton(ip) return struct.unpack("!L", packed)[0] def long2ip(n): unpacked = struct.pack('!L', n) return socket.inet_ntoa(unpacked) 

Now you can sort / combine the edges of each range as numbers, and then convert back to IP addresses to get a good idea. This question about merge time ranges has a good algorithm.


  • Divide the lines 1.1.1.1-1.1.1.2 and 1.1.1.1-2 into a pair of numbers. For the latter format, you can:

     x = '1.1.1.1-2' first, add = x.split('-') second = first.rsplit('.', 1)[0] + '.' + add pair = ip2long(first), ip2long(second) 
  • Combine overlap ranges using simple number comparisons.

  • Convert back to string representation (still accepts the latest format):

     first, second = pair first = long2ip(first) + '-' + long2ip(second).rsplit('.', 1)[1] 
+1
source

Unify the format of your ips, change the range to a couple of integers.

Now the task is much simpler - to "consolidate" the integer range. I believe that there are many existing efficient algorithms for this, below just my naive attempt:

 >>> orig_ranges = [(1,5), (7,12), (2,3), (13,13), (13,17)] # should result in (1,5), (7,12), (13,17) >>> temp_ranges = {} >>> for r in orig_ranges: temp_ranges.setdefault(r[0], []).append('+') temp_ranges.setdefault(r[1], []).append('-') >>> start_count = end_count = 0 >>> start = None >>> for key in temp_ranges: if start is None: start = key start_count += temp_ranges[key].count('+') end_count += temp_ranges[key].count('-') if start_count == end_count: print start, key start = None start_count = end_count = 0 1 5 7 12 13 17 

The general idea is as follows: after we put ranges on temp_ranges other (in temp_ranges dict), we can find new compiled ranges simply by counting the initial and final values โ€‹โ€‹of the original ranges; as soon as we got equality, we found a single range.

0
source

I had these lying in case you need em, using socket / struct is probably the best way to go, though

 def ip_str_to_int(address): """Convert IP address in form XXXX to an int. >>> ip_str_to_int('74.125.229.64') 1249764672 """ parts = address.split('.') parts.reverse() return sum(int(v) * 256 ** i for i, v in enumerate(parts)) def ip_int_to_str(address): """Convert IP address int into the form XXXX >>> ip_int_to_str(1249764672) '74.125.229.64' """ parts = [(address & 255 << 8 * i) >> 8 * i for i in range(4)] parts.reverse() return '.'.join(str(x) for x in parts) 
0
source

All Articles