Python - change a backreference. Can it be done?

New to Python, please forgive my ignorance. I am trying to modify a backreference inside a regex replacement string.

Example:

    >>> a_string
    'fsa fad fdsa dsafasdf u.s.a U.S.A u.s.a fdas adfs.f fdsa f.afda'
    >>> re.sub(r'(?<=\s)(([a-zA-Z]\.)+[a-zA-Z]\.{0,1})(?=\s)', '<acronym>'+re.sub(r'\.',r'',(r'\1').upper())+'</acronym>', a_string)
    'fsa fad fdsa dsafasdf <acronym>u.s.a</acronym> <acronym>U.S.A</acronym> <acronym>u.s.a</acronym> fdas adfs.f fdsa f.afda'

Instead of that output, I want:

    'fsa fad fdsa dsafasdf <acronym>USA</acronym> <acronym>USA</acronym> <acronym>USA</acronym> fdas adfs.f fdsa f.afda'

Thank you for your help.

3 answers

From the docs:

If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.

Then look at the example that follows this passage in the re.sub() documentation.
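A minimal sketch along the lines of that documentation example (the function name `dashrepl` and the input string are illustrative): `re.sub()` calls the function once per match of `-{1,2}` and uses whatever it returns as the replacement text.

```python
import re

def dashrepl(matchobj):
    # Called once per non-overlapping match; the return value replaces it.
    if matchobj.group(0) == '-':
        return ' '   # a single dash becomes a space
    return '-'       # a double dash collapses to one dash

result = re.sub('-{1,2}', dashrepl, 'pro----gram-files')
print(result)  # pro--gram files
```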


As Ignacio Vasquez-Abrams suggested, you can solve your problem by passing a function as the replacement argument to re.sub(). I thought a code example would explain this best, so here you go:

    import re

    s = "fsa fad fdsa dsafasdf u.s.a U.S.A u.s.a fdas adfs.f fdsa f.afda"
    s_pat = r'(?<=\s)(([a-zA-Z]\.)+[a-zA-Z]\.{0,1})(?=\s)'
    pat = re.compile(s_pat)

    def add_acronym_tag(match_object):
        # group(0) is the whole match, e.g. "u.s.a"
        s = match_object.group(0)
        s = s.replace('.', '').upper()
        return "<acronym>%s</acronym>" % s

    s = re.sub(pat, add_acronym_tag, s)
    print(s)

The above prints:

    fsa fad fdsa dsafasdf <acronym>USA</acronym> <acronym>USA</acronym> <acronym>USA</acronym> fdas adfs.f fdsa f.afda

This way you are not actually changing the backreference (you couldn't anyway, since strings are immutable). But it's just as good: you can write a function that does whatever processing you need and returns whatever you want, and re.sub() will insert that return value into the final result.

Note that you can use regular expressions inside your function; I just used the .replace() string method because you only want to remove a single character, and that does not need the full power of regular expressions.
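For comparison, a sketch of the same callback using a regex instead of `.replace()` to strip the dots. The names `ACRONYM` and `DOTS` are illustrative, and `\.?` is written in place of the equivalent `\.{0,1}`; the output is the same either way.

```python
import re

# Same pattern as in the question; \.? is equivalent to \.{0,1}.
ACRONYM = re.compile(r'(?<=\s)(([a-zA-Z]\.)+[a-zA-Z]\.?)(?=\s)')
DOTS = re.compile(r'\.')

def add_acronym_tag(match):
    # Remove the dots with a regex rather than str.replace().
    letters = DOTS.sub('', match.group(1))
    return '<acronym>%s</acronym>' % letters.upper()

s = "fsa fad fdsa dsafasdf u.s.a U.S.A u.s.a fdas adfs.f fdsa f.afda"
result = ACRONYM.sub(add_acronym_tag, s)
print(result)
```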


"Changing the backreference" is not quite the right way to put it; you seem to be confusing two different things.

A replacement backreference is a special sequence of characters in the replacement string that tells the regex engine to insert the value of a specific capture group (also called a submatch) obtained during the matching operation.
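A small sketch of ordinary replacement backreferences (the input strings are made up for illustration): `\1` and `\2` insert capture groups 1 and 2 of each match, and `\g<1>` is the unambiguous spelling of `\1`, needed when a digit follows it.

```python
import re

# \2 and \1 in the replacement insert capture groups 2 and 1 of each match.
swapped = re.sub(r'(\w+)@(\w+)', r'\2 at \1', 'alice@example bob@test')
print(swapped)   # example at alice test at bob

# r'\10' would mean group 10; \g<1> followed by a literal 0 avoids that.
suffixed = re.sub(r'(\d+)', r'\g<1>0', 'price 5 and 7')
print(suffixed)  # price 50 and 70
```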

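When you write r'\1'.upper(), you are uppercasing the literal two-character string \1. It contains no letters, so you get \1 back unchanged, and the outer re.sub(r'\.', r'', ...) likewise finds no dots to remove. All of this is evaluated once, before the outer re.sub() is even called, so the unchanged \1 simply becomes part of the replacement pattern.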
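A quick way to see this is to evaluate the question's replacement expression on its own (the demo input `'x u.s.a y'` is made up): it collapses to the plain template `<acronym>\1</acronym>`, so each match is wrapped with its dots and case intact.

```python
import re

pattern = r'(?<=\s)(([a-zA-Z]\.)+[a-zA-Z]\.{0,1})(?=\s)'

# Evaluated once, before re.sub() runs: uppercasing a backslash and a digit
# changes nothing, and there are no dots for the inner re.sub() to remove.
upper_ref = r'\1'.upper()
repl = '<acronym>' + re.sub(r'\.', r'', r'\1'.upper()) + '</acronym>'
print(repl)  # <acronym>\1</acronym>

# re.sub() therefore just wraps each original match.
demo = re.sub(pattern, repl, 'x u.s.a y')
print(demo)  # x <acronym>u.s.a</acronym> y
```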

This is why you cannot change the value of a capture group in this way.

That is why you should pass a callable as the replacement argument (see Ignacio's answer): re.sub() calls it with the match object for every match, which lets you manipulate the submatches before they are inserted. (You can, of course, edit the characters of a backreference string itself, say r'\g<12>'.replace('2', '1') to turn it into the \g<11> backreference, but that operation does not make much sense.)
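A sketch of that idea using a lambda and the match object's `expand()` method, which performs the same backslash substitution re.sub() would apply to a template. Here `m.expand(r'\1')` returns the same value as `m.group(1)`; it is used only to show the backreference being expanded first and transformed afterwards. The short input string is made up.

```python
import re

pattern = re.compile(r'(?<=\s)(([a-zA-Z]\.)+[a-zA-Z]\.{0,1})(?=\s)')
text = "fsa u.s.a U.S.A fdas"

# Expand \1 for this match, then post-process it before wrapping in the tag.
result = pattern.sub(
    lambda m: '<acronym>%s</acronym>' % m.expand(r'\1').replace('.', '').upper(),
    text)
print(result)  # fsa <acronym>USA</acronym> <acronym>USA</acronym> fdas

# Editing the characters of a backreference string itself, as mentioned above.
renamed = r'\g<12>'.replace('2', '1')
print(renamed)  # \g<11>
```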

