Python string splitting

I have an input line like this: a1b2c30d40, and I want to tokenize it into: a, 1, b, 2, c, 30, d, 40.

I know that I can read the characters one by one and look at the previous character to decide whether to start a new token (two digits in a row belong to the same token and should not be split), but is there a more Pythonic way to do it?

1 answer
    >>> import re
    >>> re.split(r'(\d+)', 'a1b2c30d40')
    ['a', '1', 'b', '2', 'c', '30', 'd', '40', '']

About the pattern: as stated in the comment, \d means "match one digit" and + is a quantifier that means "match one or more", so \d+ means "match as many consecutive digits as possible". This is placed in a group (), so in the context of re.split the whole pattern means "split this string using runs of digits as the separator, and also include the matched separators in the result." If you omit the group, you get ['a', 'b', 'c', 'd', '']. The trailing '' appears in both cases because the input ends with a separator, which leaves an empty string after the last match.
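If the empty string at the end is unwanted, one option is to filter it out after splitting. The sketch below is only an illustration: the helper name `tokenize` is made up here and is not part of the original answer.

```python
import re

def tokenize(line):
    """Split a string into alternating non-digit and digit runs.

    Hypothetical helper for illustration; not from the original answer.
    """
    # The capturing group keeps the digit runs in the result list.
    # The `if tok` filter drops the empty strings that re.split emits
    # when the input starts or ends with a run of digits.
    return [tok for tok in re.split(r'(\d+)', line) if tok]

print(tokenize('a1b2c30d40'))
# ['a', '1', 'b', '2', 'c', '30', 'd', '40']
```

The same filter also handles input that begins with digits, where re.split would otherwise put an empty string at the front of the list.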
