Python Regular Expression Matching Last Word

Question

I am trying to split a line into its words. A typical line looks like this: HelloWorldToYou. Note that each word starts with a capital letter and is followed directly by the next word, with no separator. I want to build a list of the words, so the expected result is:

['Hello','World','To','You']

In Python, I tried the following:

import re
mystr = 'HelloWorldToYou'
pat = re.compile(r'([A-Z](.*?))(?=[A-Z]+)')
[x[0] for x in pat.findall(mystr)]
# result: ['Hello', 'World', 'To']

However, I cannot capture the last word, "You". Is there any way to handle this? Thanks in advance.

python list regex

broccoli Jun 22 '15 at 17:36

1 answer

Wiktor Stribiżew · Accepted Answer · 2015-06-22T17:39:00+0000

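Match an uppercase letter followed by lowercase letters, or keep your lazy pattern and add an alternation with $ to the lookahead: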

import re
mystr = 'HelloWorldToYou'
pat = re.compile(r'([A-Z][a-z]*)')
# or your version with `.*?`: pat = re.compile(r'([A-Z].*?)(?=[A-Z]+|$)')
print(pat.findall(mystr))

See the IDEONE demo

Output:

['Hello', 'World', 'To', 'You']
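
For this input both patterns return the same list, but they are not interchangeable in general. As a minimal sketch (the input 'Hello2World' is a made-up example, not from the question), a digit between words is dropped by [a-z]* but kept by the lazy version:

```python
import re

# Hypothetical input containing a non-letter character.
sample = 'Hello2World'

# Uppercase letter followed by lowercase letters only.
letters_only = re.compile(r'[A-Z][a-z]*')
# Uppercase letter, then anything (lazily) up to the next capital or the end.
lazy_lookahead = re.compile(r'[A-Z].*?(?=[A-Z]|$)')

letters_result = letters_only.findall(sample)
lazy_result = lazy_lookahead.findall(sample)

print(letters_result)  # ['Hello', 'World']
print(lazy_result)     # ['Hello2', 'World']
```

Which one you want depends on whether such characters should count as part of a word.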

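Regex explanation:

([A-Z][a-z]*) - a capturing group that matches:
- [A-Z] - an uppercase ASCII letter, followed by
- [a-z]* - zero or more lowercase ASCII letters
  -OR-
- .*? - any characters other than a newline, as few as possible (lazy)

If you use .*? instead of [a-z]*, the lookahead is required, because a lazy .*? on its own would match nothing after the capital letter:

(?=[A-Z]+|$) - a positive lookahead that requires the match to be followed by an uppercase letter (the + is redundant here) or by the end of the string ($).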
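
To see why the original pattern dropped the last word, here is a small sketch comparing the lookahead with and without the $ alternative (the variable names are only illustrative):

```python
import re

mystr = 'HelloWorldToYou'

# Lookahead demands an uppercase letter after each word, so the final
# word, which is followed by nothing, can never match.
without_end = re.compile(r'[A-Z].*?(?=[A-Z])')
# Adding |$ lets the end of the string terminate the final word too.
with_end = re.compile(r'[A-Z].*?(?=[A-Z]|$)')

missing_last = without_end.findall(mystr)
complete = with_end.findall(mystr)

print(missing_last)  # ['Hello', 'World', 'To']
print(complete)      # ['Hello', 'World', 'To', 'You']
```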

If you do not need the lookahead version, you can drop the capturing group and use finditer:

import re
mystr = 'HelloWorldToYou'
pat = re.compile(r'[A-Z][a-z]*')
print([x.group() for x in pat.finditer(mystr)])
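
Because this pattern has no capturing group, findall already returns the full matches, so both calls give the same list. finditer is mainly useful when you also want the match positions. A small sketch (variable names are illustrative):

```python
import re

mystr = 'HelloWorldToYou'
pat = re.compile(r'[A-Z][a-z]*')

# With no groups in the pattern, findall returns whole matches as strings.
via_findall = pat.findall(mystr)
# finditer yields match objects; group() gives the same text.
via_finditer = [m.group() for m in pat.finditer(mystr)]
# Match objects also expose the (start, end) offsets of each word.
spans = [m.span() for m in pat.finditer(mystr)]

print(via_findall == via_finditer)  # True
print(spans)  # [(0, 5), (5, 10), (10, 12), (12, 15)]
```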
