How to replace token pairs in a string?

New to Python, competent in a few other languages, but I can't see a neat way to do the following. I'm sure it is screaming out for a regex, but any solution I can come up with (using regex groups and whatnot) gets really crazy really quickly.

So, I have a line with html-like tags that I want to replace with actual html tags.

For instance:

Hello, my name is /bJane/b. 

It should become:

 Hello, my name is <b>Jane</b>. 

This can be combined with [i]talic and [u]nderline:

 /iHello/i, my /uname/u is /b/i/uJane/b/i/u. 

It should become:

 <i>Hello</i>, my <u>name</u> is <b><i><u>Jane</b></i></u>. 

Obviously, a straight str.replace will not work, because every second occurrence of a token has to become a closing tag, with a forward slash inside it.

To be clear, when tokens are combined, it is always first opened, first closed.

Many thanks!

PS: Before anyone starts to worry, I know that this kind of thing should be done with CSS, blah, blah, blah, but I did not write the software, I am just transforming its output!
python regex token
3 answers

Perhaps something like this might help:

    import re


    def text2html(text):
        """Convert a text in a certain format to html.

        Examples:
        >>> text2html('Hello, my name is /bJane/b')
        'Hello, my name is <b>Jane</b>'
        >>> text2html('/iHello/i, my /uname/u is /b/i/uJane/u/i/b')
        '<i>Hello</i>, my <u>name</u> is <b><i><u>Jane</u></i></b>'
        """
        elem = []  # tokens that are currently open

        def to_tag(match_obj):
            match = match_obj.group(0)
            if match in elem:
                # Second occurrence of the token: emit the closing tag.
                elem.remove(match)
                return "</{0}>".format(match[1])
            # First occurrence: remember it and emit the opening tag.
            elem.append(match)
            return "<{0}>".format(match[1])

        # Note: '/.' matches a slash followed by any character.
        return re.sub(r'/.', to_tag, text)


    if __name__ == "__main__":
        import doctest
        doctest.testmod()

With sed, using extended regular expressions (sed -E):

    s/\/([biu])([^/]+)\/\1/<\1>\2<\/\1>/g
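Since the question is about Python, here is a rough equivalent of that expression using re.sub with a backreference. This is only a sketch: the function name replace_simple_pairs is made up for illustration. Like the sed version, it only handles a single token wrapped around text that contains no slash, so the combined case from the question (/b/i/uJane/b/i/u) is left unchanged.

```python
import re

# Same idea as the sed expression: /x ... /x, with x one of b, i, u,
# becomes <x> ... </x>. The backreference \1 requires the closing
# token to use the same letter as the opening one.
PAIR = re.compile(r'/([biu])([^/]+)/\1')


def replace_simple_pairs(text):
    """Replace simple, non-combined token pairs (illustrative name)."""
    return PAIR.sub(r'<\1>\2</\1>', text)


print(replace_simple_pairs('Hello, my name is /bJane/b.'))
# Hello, my name is <b>Jane</b>.

# Combined tokens do not match, because [^/]+ needs at least one
# non-slash character right after the opening token.
print(replace_simple_pairs('/b/i/uJane/b/i/u'))
# /b/i/uJane/b/i/u
```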

A very simple solution would be to split the string on the source token '/b' and join the resulting list of substrings with the new target tag '<b>', as follows:

    s = "Hello, my name is /bJane/b."
    s = '<b>'.join(s.split('/b'))
    print(s)
    # Hello, my name is <b>Jane<b>.
    # Note: both occurrences become <b>; no closing tag is produced.
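The split idea can be pushed a little further so that every second occurrence becomes a closing tag, which is what the question asks for. The following is a minimal sketch, not a tested library routine; the names toggle_token and text_to_html are hypothetical. It assumes the tokens are exactly /b, /i and /u and that they always come in pairs.

```python
def toggle_token(text, token, tag):
    """Replace alternating occurrences of token with <tag> and </tag>."""
    parts = text.split(token)
    out = [parts[0]]
    for i, part in enumerate(parts[1:]):
        # Boundaries 0, 2, 4, ... open the tag; 1, 3, 5, ... close it.
        out.append(('<%s>' if i % 2 == 0 else '</%s>') % tag)
        out.append(part)
    return ''.join(out)


def text_to_html(text):
    """Apply the toggle to each of the three tokens in turn."""
    for letter in 'biu':
        text = toggle_token(text, '/' + letter, letter)
    return text


print(text_to_html('/iHello/i, my /uname/u is /b/i/uJane/b/i/u.'))
# <i>Hello</i>, my <u>name</u> is <b><i><u>Jane</b></i></u>.
```

Each token is handled independently, so the output keeps whatever order the tokens had in the input, including the "first opened, first closed" order from the question. The closing tags inserted in one pass (for example </b>) never contain a token that a later pass splits on.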
