Split by suffix with Python regex

Question

I want to split a string on suffixes only. For example, I would like to split "dord word" into ['dor', 'wor'].

I thought \wd would match words ending in d. However, it does not give the expected result.

 >>> import re
 >>> re.split(r'\wd', "dord word")
 ['do', ' wo', '']

How can I split on suffixes?

+5

kilojoules Jul 12 '15 at 19:50

4 answers

A better approach is re.findall with the pattern r'\b(\w+)d\b', which captures the part of each word that precedes a final d:

 >>> s = "dord word"
 >>> re.findall(r'\b(\w+)d\b', s)
 ['dor', 'wor']
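As a minimal sketch of how this pattern behaves on words that do not end in d, the snippet below wraps the call in a helper. The name strip_d_suffix and the extra word "cat" are made up for illustration and are not from the question.

```python
import re

def strip_d_suffix(text):
    # Hypothetical helper: capture the part of each word before a final "d".
    # Words that do not end in "d" produce no match and are skipped.
    return re.findall(r'\b(\w+)d\b', text)

print(strip_d_suffix("dord word"))      # ['dor', 'wor']
print(strip_d_suffix("dord word cat"))  # ['dor', 'wor']
```

Because \w+ is greedy, it first consumes the whole word and then backtracks by one character so that the trailing d can match.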

+3

Kasramvd Jul 12 '15 at 19:53

Since \w also matches digits and underscores, I would define a word as plain letters only, using the character class [a-zA-Z]:

 print([x.group(1) for x in re.finditer(r"\b([a-zA-Z]+)d\b", "dord word")])
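To see where the two character classes differ, here is a small comparison. The token "x_1d" is an invented test input, chosen only because it contains an underscore and a digit.

```python
import re

text = "dord x_1d word"  # "x_1d" is an invented token for illustration

# \w accepts letters, digits and underscores, so "x_1" is captured too.
with_w = re.findall(r'\b(\w+)d\b', text)

# [a-zA-Z] accepts ASCII letters only, so "x_1d" yields no match.
letters_only = [m.group(1) for m in re.finditer(r'\b([a-zA-Z]+)d\b', text)]

print(with_w)        # ['dor', 'x_1', 'wor']
print(letters_only)  # ['dor', 'wor']
```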


+2

Wiktor Stribiżew Jul 12 '15 at 19:56

If you're wondering why your original approach doesn't work,

 re.split(r'\wd',"dord word")

It matches every letter, digit, or underscore that is immediately followed by "d", and splits the string on each match. So it matched the bracketed parts here:

do[rd] wo[rd]

and split the string at those parts, removing them from the result.

Also note that this pattern can split in the middle of a word, so:

 re.split(r'\wd', "said tendentious")

would split the second word in two.
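Running both calls side by side makes the behavior concrete. This is only a sketch that reuses the two strings from this answer; the variable names are arbitrary.

```python
import re

# "rd" is matched twice and removed, leaving a trailing empty string.
first = re.split(r'\wd', "dord word")

# "id" in "said" and "nd" inside "tendentious" are both matched,
# so the second word is cut in the middle.
second = re.split(r'\wd', "said tendentious")

print(first)   # ['do', ' wo', '']
print(second)  # ['sa', ' te', 'entious']
```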

+1

twasbrillig Jul 12 '15 at 21:15

vks · Accepted Answer · 2015-07-12T19:54:08+0000

 import re
 x = 'dord word'
 print(re.split(r"d\b", x))

or

 print([i for i in re.split(r"d\b", x) if i])  # if you don't want empty strings

Try it.
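One thing to watch for: splitting on d\b leaves the space between the words attached to the second piece. The sketch below shows the raw result and one possible cleanup with str.strip; the cleanup step is an addition for illustration, not part of the answer above.

```python
import re

x = 'dord word'

# Split on every "d" that sits at the end of a word.
parts = re.split(r"d\b", x)

# Assumed cleanup step: trim whitespace and drop empty pieces.
cleaned = [p.strip() for p in parts if p.strip()]

print(parts)    # ['dor', ' wor', '']
print(cleaned)  # ['dor', 'wor']
```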