Regular Expression Transformation List

Question

I have a list whose elements look like this. The entries may change, but their format stays similar:

["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]

I would like to convert it to the list below. Repeated entries with the same prefix, such as Eth, are collapsed into a single entry in the new list, and the numbers are replaced with the general placeholders X and Y:

 ["RadioX","TetherX","SerialX/Y","EthX/Y","vlanX","modemX"]

I have been messing around with different regexes, and my method is pretty convoluted, so I would be interested in any elegant solutions you can think of.

Here is my code, which could be improved. Note also that set does not preserve order, so that needs fixing too:

    import re

    a = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth0/2","Eth1/0","vlanX","modem0","modem1","modem2","modem3","modem6"]
    c = []
    for i in a:
        b = re.split("[0-9]", i)      # b[0] is the text before the first digit
        if "/" in i:
            c.append(b[0] + "X/Y")
        elif len(b) > 1:
            c.append(b[0] + "X")
        else:
            c.append(b[0])            # no digits: keep the name unchanged
    print(set(c))
    # set(['modemX', 'TetherX', 'RadioX', 'vlanX', 'SerialX/Y', 'EthX/Y'])

A possible improvement that preserves order:

    unique = []
    for item in c:
        if item not in unique:
            unique.append(item)
    print(unique)
    # ['RadioX', 'TetherX', 'SerialX/Y', 'EthX/Y', 'vlanX', 'modemX']
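As an alternative to the approach above, order-preserving de-duplication can be sketched with `dict.fromkeys`. This assumes Python 3.7 or later, where dictionaries keep insertion order; the list `c` here is a hypothetical stand-in for the already-transformed names.

```python
# Hypothetical stand-in for the transformed list `c` from the question.
c = ["RadioX", "TetherX", "SerialX/Y", "EthX/Y", "EthX/Y", "EthX/Y",
     "EthX/Y", "vlanX", "modemX", "modemX"]

# Dictionary keys are unique and (in Python 3.7+) keep first-insertion order,
# so round-tripping through a dict drops repeats without reordering.
unique = list(dict.fromkeys(c))
print(unique)
```

Unlike the `item not in unique` check, which rescans the list for every element, this does a single pass with hash lookups.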

+7

python regex

Paul Aug 10 '16 at 13:56

5 answers

The following code should be generic enough to handle up to three numbers per element, and you can extend the repl string to allow more.

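    import re

    elements = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]
    repl = "XYZ"
    # On pass i, replace the first remaining number in each element with repl[i].
    # "[0-9]+" (rather than "[0-9]") treats a multi-digit number as one placeholder.
    for i in range(len(repl)):
        elements = [re.sub("[0-9]+", repl[i], element, 1) for element in elements]
    result = set(elements)  # note: a set does not preserve order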

+2

David hoksza Aug 10 '16 at 15:01

I used re.finditer to find and replace all numbers:

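    import re

    def repl(string):
        # use regex to find all numbers
        numbers = re.finditer(r'\d+', string)
        # pair the numbers with letters. zip will stop when the sequence of
        # numbers OR letters runs out (add more characters if necessary).
        pairs = list(zip(numbers, 'XYZ'))
        # replace from right to left so that earlier match offsets stay
        # valid when a multi-digit number shrinks to a single letter
        for match, char in reversed(pairs):
            string = string[:match.start()] + char + string[match.end():]
        return string

    # input list from the question
    l = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]
    s = set()    # set to keep track of duplicates
    result = []  # list to maintain order
    for string in l:
        string = repl(string)
        if string in s:  # ignore if duplicate
            continue
        # otherwise add to result list
        s.add(string)
        result.append(string)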

It can replace up to three numbers using X, Y, or Z, and can easily be changed to support more.
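The same idea can also be sketched with a callable replacement passed to `re.sub`, which avoids manual slicing. This is only one possible variant: the function name `generalize` is hypothetical, and repeating the last letter when a name has more numbers than letters is an assumption, not something the question specifies.

```python
import re
from itertools import chain, repeat

def generalize(name, letters="XYZ"):
    # One placeholder per number, in order. If a name has more numbers than
    # letters, the last letter is reused (an assumption for this sketch).
    placeholders = chain(letters, repeat(letters[-1]))
    # re.sub calls the function once per match, left to right, so each
    # number receives the next placeholder.
    return re.sub(r"\d+", lambda match: next(placeholders), name)

names = ["Radio0", "Serial0/0", "Eth10/1", "vlanX"]
print([generalize(n) for n in names])
# ['RadioX', 'SerialX/Y', 'EthX/Y', 'vlanX']
```

Because `\d+` matches a whole run of digits, a multi-digit number such as the `10` in `Eth10/1` becomes a single placeholder.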

+1

Aran-fey Aug 10 '16 at 15:04

You can go for:

    import re

    rx = r'\d+'
    incoming = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]
    outgoing = []
    for item in incoming:
        t = re.sub(rx, 'X', item)
        if t not in outgoing:
            outgoing.append(t)
    print(outgoing)
    # ['RadioX', 'TetherX', 'SerialX/X', 'EthX/X', 'vlanX', 'modemX']

Or, as another syntax example, using a Python list comprehension:

    import re

    rx = re.compile(r'\d+')
    incoming = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]

    def cleanitem(item):
        return rx.sub('X', item)

    outgoing = []
    [outgoing.append(item) \
        for item in (cleanitem(x) for x in incoming) \
        if item not in outgoing]
    print(outgoing)

See a working demo at ideone.com.

+1

Jan Aug 10 '16 at 15:04

    import re
    import functools

    lst = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]

    def process_str(s, letters='XY'):
        return functools.reduce(lambda txt, letter: re.sub(r'\d+', letter, txt, 1), letters, s)

    r = set(map(process_str, lst))
    print(r)

+1

Israel Unterman Aug 10 '16 at 16:05

Bpl · Accepted Answer · 2016-08-10T14:55:15+0000

    import re

    def particular_case(string):
        # handle "N/M" pairs first, then any remaining single numbers
        return re.sub(r"\d+", "X", re.sub(r"\d+/\d+", "X/Y", string))

    def generic_case(string, letters=['X', 'Y', 'Z']):
        len_letters = len(letters)
        list_matches = list(re.finditer(r'\d+', string))
        result, last_index = "", 0

        if len(list_matches) == 0:
            return string

        # copy the text before each number, then append that number's letter
        for index, match in enumerate(list_matches):
            result += string[last_index: match.start(0)] + letters[index % len_letters]
            last_index = match.end(0)

        return result

    if __name__ == "__main__":
        words = ["Radio0", "Tether0", "Serial0/0", "Eth0/0", "Eth0/1", "Eth1/0", "Eth1/1", "vlanX", "modem0", "modem1", "modem2", "modem3", "modem6"]
        result = []
        result2 = []

        for w in words:
            new_value = particular_case(w)
            if new_value not in result:
                result.append(new_value)

            new_value = generic_case(w)
            if new_value not in result2:
                result2.append(new_value)

        print(result)
        print(result2)
