Python string splitting

Question

I have an input line like this: a1b2c30d40, and I want to tokenize it into: a, 1, b, 2, c, 30, d, 40.

I know that I can read the characters one by one and compare each with the previous character to decide whether to start a new token (two digits in a row belong to the same token and should not be split), but is there a more Pythonic way to do it?

python string split

Hery Jan 30 '11 at 16:00

1 answer

Cat plus plus · Accepted Answer · 2011-01-30T16:04:43+0000

 >>> import re
 >>> re.split(r'(\d+)', 'a1b2c30d40')
 ['a', '1', 'b', '2', 'c', '30', 'd', '40', '']

About the pattern, as requested in the comments: \d means "match one digit", and + is a quantifier that means "match one or more", so \d+ means "match as many consecutive digits as possible". This is placed in a group (), so the whole pattern in the context of re.split means "split this string using runs of digits as the separator, and also include the matched separators in the result." If you omit the group, you will get ['a', 'b', 'c', 'd', '']. In both cases the trailing '' appears because the input ends with a separator.
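If the trailing empty string is unwanted, one possible way to drop it is to filter the result. The sketch below is not part of the original answer; the helper name tokenize is hypothetical, and it assumes empty strings are never meaningful tokens.

```python
import re

def tokenize(line):
    # Hypothetical helper: split on runs of digits; the capturing group
    # keeps the digit runs themselves in the result.
    parts = re.split(r'(\d+)', line)
    # Drop the empty strings that re.split produces when the line
    # starts or ends with digits.
    return [p for p in parts if p]

print(tokenize('a1b2c30d40'))
# ['a', '1', 'b', '2', 'c', '30', 'd', '40']
```

The same filter also removes the leading '' that would appear if the input started with a digit, for example '12ab'.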
