Split by suffix with Python regex

I want to split a string on a suffix only. For example, I would like to split the string "dord word" into ['dor', 'wor'].

I thought \wd would match words ending with d. However, this does not give the expected result.

 >>> import re
 >>> re.split(r'\wd', "dord word")
 ['do', ' wo', '']

How can I split on a suffix?

+5
4 answers
 import re
 x = 'dord word'
 print(re.split(r"d\b", x))  # ['dor', ' wor', '']

or

 print([i for i in re.split(r"d\b", x) if i])  # ['dor', ' wor'], if you don't want empty strings

Try it.
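
Note that splitting on d\b leaves the space between the words attached to the next piece. A minimal sketch, assuming you want clean stems without that whitespace (the names pieces and stems are only illustrative):

```python
import re

x = 'dord word'

# Split wherever a "d" sits at the end of a word.
pieces = re.split(r"d\b", x)

# Strip surrounding whitespace and drop the empty strings.
stems = [p.strip() for p in pieces if p.strip()]
print(stems)  # ['dor', 'wor']
```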

+4

A better way is to use re.findall with r'\b(\w+)d\b' as the regular expression, which captures the part of each word that precedes the final d:

 >>> import re
 >>> s = 'dord word'
 >>> re.findall(r'\b(\w+)d\b', s)
 ['dor', 'wor']
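
The same idea can probably be generalized to any suffix. The sketch below assumes a hypothetical helper, stems_with_suffix, which is not part of the re module; it builds the pattern from the suffix and uses re.escape so that regex metacharacters in the suffix are treated literally:

```python
import re

def stems_with_suffix(text, suffix):
    """Return the part before `suffix` for every word ending in `suffix`."""
    # re.escape keeps any regex metacharacters in the suffix literal.
    pattern = r'\b(\w+)' + re.escape(suffix) + r'\b'
    return re.findall(pattern, text)

print(stems_with_suffix('dord word', 'd'))  # ['dor', 'wor']
```

Words that do not end in the suffix are simply skipped, so stems_with_suffix('walking talking walk', 'ing') returns only the stems of the first two words.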
+3

Since \w also matches digits and underscores, I would define a word as consisting of letters only, using the character class [a-zA-Z]:

 import re
 print([x.group(1) for x in re.finditer(r"\b([a-zA-Z]+)d\b", "dord word")])  # ['dor', 'wor']
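
To see where the two patterns differ, here is a small comparison on a made-up input; the token id_42d is an assumption chosen only because it mixes letters, digits, and an underscore:

```python
import re

text = "dord word id_42d"

# \w also matches digits and underscores, so "id_42d" yields the stem "id_42".
with_w = re.findall(r"\b(\w+)d\b", text)
print(with_w)  # ['dor', 'wor', 'id_42']

# Letters only: "id_42d" is not a run of letters ending in "d", so it is skipped.
letters_only = re.findall(r"\b([a-zA-Z]+)d\b", text)
print(letters_only)  # ['dor', 'wor']
```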


+2

If you're wondering why your original approach doesn't work, look at what this call does:

 re.split(r'\wd',"dord word") 

It finds every occurrence of a letter, digit, or underscore followed by "d" and splits the string at each match. So it did this:

do [rd] wo [rd]

and split the string at the bracketed parts, discarding them.

Also note that this pattern can match in the middle of a word, so:

 re.split(r'\wd', "said tendentious") 

would split the second word in two.
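
A short sketch that makes this visible by listing what \wd actually matches, and contrasting it with a pattern anchored to the end of a word (the variable names are illustrative):

```python
import re

text = "said tendentious"

# Every word character followed by "d" is a split point, even mid-word.
matches = [m.group() for m in re.finditer(r'\wd', text)]
print(matches)  # ['id', 'nd']

parts = re.split(r'\wd', text)
print(parts)  # ['sa', ' te', 'entious']

# Requiring a word boundary after "d" leaves "tendentious" intact.
anchored = re.split(r'd\b', text)
print(anchored)  # ['sai', ' tendentious']
```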

+1
