Pull out the string parts and the number parts

I may have lines that look something like this:

 ABC
 DEF-123
 456
 789GH-IJK-0

And I'm trying to figure out a regex that will split each line into its string (non-digit) parts and number parts, for example:

 (ABC)
 (DEF-)(123)
 (456)
 (789)(GH-IJK-)(0)

My first thought was to use (\D*|\d*) as the pattern, but the numbers are not returned.

3 answers

How about using inner non-capturing subgroups:

 ((?:\D+)|(?:\d+)) 

Example output from Perl:

 cat input | perl -ane 'chomp; print "looking at $_\n"; while(/((?:\D+)|(?:\d+))/g) {print "Found $1\n";}'
 looking at BC
 Found BC
 looking at DEF-123
 Found DEF-
 Found 123
 looking at 456
 Found 456
 looking at 789GH-IJK-0
 Found 789
 Found GH-IJK-
 Found 0
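For readers not using Perl, here is a rough Python sketch of the same loop, using the same pattern with re.finditer. It assumes the lines are already in a list instead of being read from the file named input; the helper name split_runs and the sample list are hypothetical, for illustration only.

```python
import re

# Same pattern as the Perl answer: two non-capturing alternatives
# wrapped in a single capturing group.
PATTERN = re.compile(r'((?:\D+)|(?:\d+))')

def split_runs(line: str) -> list[str]:
    """Return the alternating non-digit / digit runs of one line (hypothetical helper)."""
    return [m.group(1) for m in PATTERN.finditer(line)]

# Hypothetical sample data standing in for the `input` file.
lines = ["ABC", "DEF-123", "456", "789GH-IJK-0"]
for line in lines:
    print(f"looking at {line}")
    for part in split_runs(line):
        print(f"Found {part}")
```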

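Use + instead of * in the alternatives. \D* can match the empty string, so at a digit the first alternative can succeed with an empty match before \d* ever gets a chance. With + each alternative has to consume at least one character: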

 (\D+|\d+) 
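A minimal Python sketch of this pattern producing the bracketed output from the question. The function names bracket_runs and find_runs are hypothetical; the sketch assumes the input is processed one line at a time.

```python
import re

# With + each alternative must consume at least one character,
# so every match is a real run and no empty matches are produced.
RUNS = re.compile(r'(\D+|\d+)')

def bracket_runs(line: str) -> str:
    """Wrap each non-digit run and each digit run in parentheses (hypothetical helper)."""
    return RUNS.sub(r'(\1)', line)

def find_runs(line: str) -> list[str]:
    """List the runs; findall returns group 1 because the pattern has one group."""
    return RUNS.findall(line)

for line in ["ABC", "DEF-123", "456", "789GH-IJK-0"]:
    print(bracket_runs(line))
```

Note that \D also matches spaces, so running this on a single space-separated line would glue a space onto the neighbouring run (bracket_runs("ABC DEF-123") gives "(ABC DEF-)(123)"); that is why the sketch works line by line.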

This seems to work, though it is rather ugly (lots of backslashes). Instead of a single regular expression, it splits the job into two substitutions: one wraps runs of letters and hyphens, the other wraps runs of digits.

 $ sed 's/\([a-zA-Z-]\+\)/(\1)/g ; s/\([0-9]\+\)/(\1)/g' input
 (BC)
 (DEF-)(123)
 (456)
 (789)(GH-IJK-)(0)
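The same two-pass idea sketched in Python with re.sub, reusing the character classes from the sed command. The function name bracket_two_pass is hypothetical; this is an illustration of the approach, not the answerer's code.

```python
import re

def bracket_two_pass(line: str) -> str:
    """Hypothetical helper mirroring the two sed substitutions."""
    # Pass 1: wrap runs of ASCII letters and hyphens (same class as the sed answer).
    line = re.sub(r'([a-zA-Z-]+)', r'(\1)', line)
    # Pass 2: wrap runs of ASCII digits. The parentheses added in pass 1 are
    # neither letters nor digits, so they are never wrapped again.
    line = re.sub(r'([0-9]+)', r'(\1)', line)
    return line

print(bracket_two_pass("789GH-IJK-0"))
```

Because the classes are explicit, spaces and any other characters outside them are left untouched, so this version also copes with a whole space-separated line: bracket_two_pass("ABC DEF-123") gives "(ABC) (DEF-)(123)".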