Replace all occurrences of certain words

Suppose I have the following sentence:

bean likes to sell his beans 

and I want to replace all occurrences of certain words with other words: for example, bean with robert and beans with cars.

I can't just use str.replace, because it would also change beans to roberts:

 >>> "bean likes to sell his beans".replace("bean", "robert")
 'robert likes to sell his roberts'

I need to replace whole words only, not occurrences of a word inside another word. I think I can achieve this with regular expressions, but I don't know how to do it correctly.

+8
python regex
4 answers

If you use a regular expression, you can specify word boundaries with \b:

 import re
 sentence = 'bean likes to sell his beans'
 sentence = re.sub(r'\bbean\b', 'robert', sentence)
 # 'robert likes to sell his beans'

Here, 'beans' is not changed (to 'roberts'), because there is no word boundary between the 'n' and the trailing 's': \b matches the empty string, but only at the beginning or end of a word.
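To make the behaviour of \b concrete, here is a small sketch. The sample text and the variable names are made up for illustration; re.sub and re.IGNORECASE are standard library.

```python
import re

# Hypothetical sample text with punctuation, longer words and a capitalised variant.
text = "bean, beans and a beanbag; Bean!"

# \b is zero-width: it matches between a word character (letter, digit,
# underscore) and a non-word character or the edge of the string.
exact = re.sub(r'\bbean\b', 'robert', text)

# Same pattern, but ignoring case, so "Bean" is replaced as well.
any_case = re.sub(r'\bbean\b', 'robert', text, flags=re.IGNORECASE)

print(exact)     # robert, beans and a beanbag; Bean!
print(any_case)  # robert, beans and a beanbag; robert!
```

Punctuation next to the word does not get in the way, while "beans" and "beanbag" are left alone.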

The second replacement, for completeness:

 sentence = re.sub(r'\bbeans\b', 'cars', sentence)
 # 'robert likes to sell his cars'
+15

If you replace the words one at a time, a word that has already been replaced can be replaced again by a later substitution, so you may not get what you want. To avoid this, do everything in a single pass with a function or lambda as the replacement:

 import re
 d = {'bean': 'robert', 'beans': 'cars'}
 str_in = 'bean likes to sell his beans'
 str_out = re.sub(r'\b(\w+)\b', lambda m: d.get(m.group(1), m.group(1)), str_in)
 # 'robert likes to sell his cars'

This way, once bean has been replaced by robert, it will not be changed again (even if robert is also a key in your replacement dictionary).
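A quick sketch of the difference. The mapping below is hypothetical and chosen so that one replacement is itself a key. It assumes Python 3.7+, where dicts keep insertion order.

```python
import re

# Hypothetical mapping: the replacement for 'bean' is also a key.
d = {'bean': 'robert', 'robert': 'alice'}
str_in = 'bean met robert'

# One re.sub call per word: the second call also rewrites
# the 'robert' that the first call has just produced.
sequential = str_in
for old, new in d.items():
    sequential = re.sub(r'\b%s\b' % re.escape(old), new, sequential)

# Single pass with a callback: every word of the input is looked up once.
single_pass = re.sub(r'\b(\w+)\b', lambda m: d.get(m.group(1), m.group(1)), str_in)

print(sequential)   # alice met alice
print(single_pass)  # robert met alice
```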

As suggested by georg, I edited this answer to use dict.get(key, default_value). An alternative solution (also suggested by georg):

 # re.escape guards against keys that contain regex metacharacters
 str_out = re.sub(r'\b(%s)\b' % '|'.join(map(re.escape, d)), lambda m: d.get(m.group(1), m.group(1)), str_in)
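If you need this in more than one place, the alternative can be wrapped in a small helper. This is only a sketch: replace_words is a made-up name, and it assumes every key starts and ends with a word character, since \b behaves differently otherwise.

```python
import re

def replace_words(text, mapping):
    """Replace whole words in text according to mapping, in a single pass."""
    if not mapping:
        return text
    # Escape each key so regex metacharacters are matched literally.
    alternatives = '|'.join(re.escape(word) for word in mapping)
    pattern = re.compile(r'\b(?:%s)\b' % alternatives)
    # Only keys can match, so a plain lookup is enough here.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

result = replace_words('bean likes to sell his beans',
                       {'bean': 'robert', 'beans': 'cars'})
print(result)  # robert likes to sell his cars
```

Unlike the \w+ version, the callback here is only invoked for words that are actually in the dictionary.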
+3
 "bean likes to sell his beans".replace("beans", "cars").replace("bean", "robert") 

This replaces all instances of "beans" with "cars" and then all instances of "bean" with "robert". It works because .replace() returns a new, modified copy of the string, so the calls can be chained. The longer word has to be replaced first, otherwise "beans" would already have become "roberts". You can think of it in stages:

 >>> first_string = "bean likes to sell his beans"
 >>> second_string = first_string.replace("beans", "cars")
 >>> third_string = second_string.replace("bean", "robert")
 >>> first_string, second_string, third_string
 ('bean likes to sell his beans', 'bean likes to sell his cars', 'robert likes to sell his cars')
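Keep in mind that str.replace works on substrings, not on whole words. A short sketch with a made-up sentence shows where chaining stops being enough:

```python
# Hypothetical sentence containing a longer word that starts with "bean".
sentence = "bean keeps his beans in a beanbag"

# Longer word first, as above: "beans" is handled, but "beanbag" is hit too.
chained = sentence.replace("beans", "cars").replace("bean", "robert")

# Shorter word first: "beans" has already become "roberts" by the second call.
wrong_order = sentence.replace("bean", "robert").replace("beans", "cars")

print(chained)      # robert keeps his cars in a robertbag
print(wrong_order)  # robert keeps his roberts in a robertbag
```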
-1

I know this is an old question, but doesn't this look much more elegant?

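 import re
 from functools import reduce  # reduce is not a builtin in Python 3
 reduce(lambda x, y: re.sub(r'\b(' + y[0] + r')\b', y[1], x), [("bean", "robert"), ("beans", "cars")], "bean likes to sell his beans")
 # 'robert likes to sell his cars'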
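For readers less used to reduce, the same thing can be sketched as a plain loop. The variable names are mine, and re.escape is added in case a word contains regex metacharacters. Like any one-word-at-a-time approach, a later pair can still rewrite the result of an earlier one.

```python
import re

replacements = [("bean", "robert"), ("beans", "cars")]
sentence = "bean likes to sell his beans"

# Apply one whole-word substitution per (old, new) pair, in order.
for old, new in replacements:
    sentence = re.sub(r'\b' + re.escape(old) + r'\b', new, sentence)

print(sentence)  # robert likes to sell his cars
```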
-1