Pull out the string parts and the number of string parts

Question

I may have lines that look something like this:

ABC DEF-123 456 789GH-IJK-0

And I'm trying to figure out a regex that will group it into strings and numbers, for example:

 (ABC) (DEF-)(123) (456) (789)(GH-IJK-)(0)

My first thought was to use (\D*|\d*) as the pattern, but the numbers are not returned.

+4

regex

Jimmy mattsson Jun 14 '11 at 11:51

3 answers

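Use + instead of * in both alternatives. With *, \D* can succeed with an empty match in front of a digit, so the \d* alternative may never get a chance: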

 (\D+|\d+)
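A minimal Python sketch of this pattern, for illustration only (the thread does not say which regex engine the asker uses, and the helper name split_parts is made up here). Since \D also matches spaces, the sketch splits the line on whitespace first so that each word is broken into its digit and non-digit runs:

```python
import re

# Pattern from the answer: a run of non-digits or a run of digits.
TOKEN = re.compile(r"(\D+|\d+)")


def split_parts(line):
    """Return the digit and non-digit runs of each whitespace-separated word.

    split_parts is a hypothetical helper name, not something from the thread.
    """
    parts = []
    for word in line.split():
        # findall returns the text of group 1 for every match in the word.
        parts.extend(TOKEN.findall(word))
    return parts


if __name__ == "__main__":
    print(split_parts("ABC DEF-123 456 789GH-IJK-0"))
```

Applying the pattern to the whole line without splitting would fold each space into the neighbouring non-digit run, which is probably not what the grouping in the question asks for.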

+2

lnmx Jun 14 '11 at 12:09

This seems to work, but it is rather ugly (backslash plague). Instead of one regular expression, it uses two: one to handle the letters and one to handle the numbers.

 $ sed 's/\([a-zA-Z-]\+\)/(\1)/g ; s/\([0-9]\+\)/(\1)/g' input
 (BC) (DEF-)(123) (456) (789)(GH-IJK-)(0)
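For comparison, the same wrapping can be sketched in Python with a single substitution. This is not the answerer's command; it assumes the same character classes as the sed version (letters and hyphens, or digits), and the name wrap_parts is invented for the example:

```python
import re

# Same character classes as the two sed expressions, combined into one pattern.
WRAP = re.compile(r"([A-Za-z-]+|[0-9]+)")


def wrap_parts(line):
    """Wrap every letter/hyphen run and every digit run in parentheses."""
    # \1 in the replacement refers to the text captured by group 1.
    return WRAP.sub(r"(\1)", line)


if __name__ == "__main__":
    print(wrap_parts("ABC DEF-123 456 789GH-IJK-0"))
```

Because spaces belong to neither class, they are left alone between the wrapped groups.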

0

Fredrik pihl Jun 14 '11 at 12:25

Andrew White · Accepted Answer · 2011-06-14T11:57:04+0000

How about using inner non-capturing subgroups ...

 ((?:\D+)|(?:\d+))

Example output from perl ...

 cat input | perl -ane 'chomp; print "looking at $_\n"; while(/((?:\D+)|(?:\d+))/g) {print "Found $1\n";}'
 looking at BC
 Found BC
 looking at DEF-123
 Found DEF-
 Found 123
 looking at 456
 Found 456
 looking at 789GH-IJK-0
 Found 789
 Found GH-IJK-
 Found 0
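A rough Python counterpart of the Perl loop, offered as a sketch rather than a translation the answerer provided. It assumes one word per call, as in the output above, and found_parts is a hypothetical name. The optional pattern argument makes it easy to check that the inner non-capturing groups give the same parts as the plain (\D+|\d+) form:

```python
import re

# Pattern from the accepted answer, with inner non-capturing groups.
NON_CAPTURING = re.compile(r"((?:\D+)|(?:\d+))")
# Plain alternation, for comparison.
PLAIN = re.compile(r"(\D+|\d+)")


def found_parts(word, pattern=NON_CAPTURING):
    """Yield group 1 of each successive match, like $1 in the Perl loop."""
    for match in pattern.finditer(word):
        yield match.group(1)


if __name__ == "__main__":
    for word in ["DEF-123", "789GH-IJK-0"]:
        print(f"looking at {word}")
        for part in found_parts(word):
            print(f"Found {part}")
```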
