How to replace token pairs in a string?

Question

How to replace token pairs in a string?

New to python, competent in several languages, but not able to see a "dull" way to do the following. I'm sure he screams about regex, but any solution I can come up with (using regex groups and what not) gets really crazy.

So, I have a line with html-like tags that I want to replace with actual html tags.

For instance:

Hello, my name is /bJane/b.

It should become:

 Hello, my name is <b>Jane</b>.

It can be combined with [i] talic and [u] nderline:

 /iHello/i, my /uname/u is /b/i/uJane/b/i/u.

It should become:

 <i>Hello</i>, my <u>name</u> is <b><i><u>Jane</b></i></u>.

Obviously, the direct str.replace will not work, because every second token must be exceeded in advance.

For clarity, if the markers are combos, it always first opens, first closes.

Many thanks!

PS: Before anyone starts to worry, I know that this kind of thing needs to be done with CSS, blah, blah, blah, but I did not write software, I just change its output!

+6

python regex token

Bridgey Mar 16 '11 at 20:45

source share

3 answers

with sed:

 s/\/([biu])([^/]\+)\/\1/<\1>\2<\/\1>/g

0

Phil h Mar 16 '11 at 20:50

source share

A very simple solution would be to break the string using the original '/ b' tag and reattach the substring array to the new destination tag '' as follows:

 s = "Hello, my name is /bJane/b." '<b>'.join(s.split('/b')) print s 'Hello, my name is <b>Jane<b>.'

0

correndo Mar 16 '11 at 23:32

source share

mouad · Accepted Answer · 2011-03-16T21:03:30+0000

Perhaps something like this might help:

 import re def text2html(text): """ Convert a text in a certain format to html. Examples: >>> text2html('Hello, my name is /bJane/b') 'Hello, my name is <b>Jane</b>' >>> text2html('/iHello/i, my /uname/u is /b/i/uJane/u/i/b') '<i>Hello</i>, my <u>name</u> is <b><i><u>Jane</u></i></b>' """ elem = [] def to_tag(match_obj): match = match_obj.group(0) if match in elem: elem.pop(elem.index(match)) return "</{0}>".format(match[1]) else: elem.append(match) return "<{0}>".format(match[1]) return re.sub(r'/.', to_tag, text) if __name__ == "__main__": import doctest doctest.testmod()

How to replace token pairs in a string?

More articles: