Perl regular expression

I have inherited a Perl script that extracts data from some files. The script as a whole works fine, but recently some engineers have started entering more than one number in a field that usually holds a single number, so the output no longer shows everything that is expected.

Input Example:

CRXXXX: "Then some text"

CRs XXXX, XXXX, XX, XXX

CRXXX "Some Text"

The script currently uses the regular expression below to pull out what follows CR, but given the second line of the sample input it prints "s XXXX, XXXX, XX, XXX" instead of the desired "XXXX XXXX XX XXX".

I am very new to Perl and am struggling to figure out how to change this regex so that it works on all inputs.

  $temp_comment =~ s/\s[cC][rR][-\s:;]*([\d])/\n$1/mg; 

Thanks in advance!

Brock

4 answers

For sample data such as:

 my $temp_comment = 'CR1234: "Then some text"
 CRs 2345, 3456, 45, 567
 CR678 "Some Text"';

try:

 $temp_comment =~ s/(,)|[^\d\n]+/$1?' ':''/semg; 

or if you want to stay close to string patterns:

 $temp_comment =~ s/ ^                    # multi-line mode, line start
                     \s*                  # leading blanks?
                     CR                   # CR tag
                     \D*                  # non-number stuff
                     (                    # start capture group
                       (?:\d+ [,\s]*)+    # find (number, comma, space) groups
                     )                    # end capture group
                     \D*                  # skip remaining non-number stuff
                     $                    # multi-line mode, line end
                   /$1/mxg;               # set multi-line mode + regex comments "x"

but then you will need to remove the commas from the group of numbers in a second step:

 $temp_comment =~ tr/,//d; # remove commas in the whole string 

or

 $temp_comment =~ s/(?<=\d),(?=\s\d)//g; # remove commas between numbers '11, 22' 
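The two comma-removal variants above differ once a comma also appears outside the number list. A minimal sketch of that difference, assuming a made-up sample line (the string and the variable names `$all_removed` and `$between_numbers` are illustrative, not from the original script):

```perl
use strict;
use warnings;

# Hypothetical sample: one comma inside the number list, one inside the quoted text.
my $line = '12, 34 "fixed, then retested"';

# Variant 1: tr/,//d deletes every comma in the string.
(my $all_removed = $line) =~ tr/,//d;

# Variant 2: the lookaround substitution deletes a comma only when a digit
# precedes it and whitespace plus a digit follows it.
(my $between_numbers = $line) =~ s/(?<=\d),(?=\s\d)//g;

print "$all_removed\n";        # 12 34 "fixed then retested"
print "$between_numbers\n";    # 12 34 "fixed, then retested"
```

If the extracted text is guaranteed to contain only numbers, the simpler tr form is probably enough; the lookaround form is the safer choice when free text may remain in the string.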

To do it in a "single step", use the /e modifier:

 $temp_comment =~ s{ ^                    # line start
                     \s*                  # leading blanks?
                     CR                   # CR tag
                     \D*                  # non-number stuff
                     ((?:\d+ [,\s]*)+)    # single or group of numbers
                     \D*                  # non number stuff
                     $                    # line end
                   }
                   {do{(local$_=$1)=~y/,//d;$_}}mxeg;

With the data above, this produces:

 1234
 2345 3456 45 567
 678

But really, if possible, use the simpler two-step approach. The last regex can be a maintenance nightmare for your successors.
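The two-step approach recommended above could be wrapped in a small subroutine, roughly as follows. This is only a sketch: `extract_cr_numbers` is a hypothetical helper name, the input is assumed to have one CR entry per line, and the capture group is written slightly differently from the answer's so that it does not pick up trailing whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: reduces each "CR..." line to its numbers.
sub extract_cr_numbers {
    my ($comment) = @_;

    # Step 1: keep only the run of numbers that follows a line-leading CR tag.
    $comment =~ s/ ^ \s* CR \D*                  # CR tag and any non-digits after it
                   ( \d+ (?: [,\s]+ \d+ )* )     # numbers separated by commas or blanks
                   \D* $                         # remainder of the line
                 /$1/mxg;

    # Step 2: drop the commas left inside the captured groups.
    $comment =~ tr/,//d;

    return $comment;
}

# Sample data from the answer, assumed to be one entry per line.
my $temp_comment = join "\n",
    'CR1234: "Then some text"',
    'CRs 2345, 3456, 45, 567',
    'CR678 "Some Text"';

print extract_cr_numbers($temp_comment), "\n";
```

Keeping the two steps separate means each regex can be read and tested on its own, which is the maintainability point the answer makes.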


You might be better off doing this in two steps:

1) Adjust your regex:

s/\s[cC][rR][-\s:;]*([\d\ ]+)/\n$1/mg (note the new capture group, which takes every digit and space; the original captured only the first digit)

2) Then strip the commas from the line with a simple find and replace.
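Applied to the sample input, the two steps might look like the sketch below. Note one assumption that goes beyond the answer: an optional `[sS]?` is added after the CR tag so that the plural "CRs" form is matched as well. The sample string, including its leading space, is made up for illustration, since the original regex requires whitespace before each CR.

```perl
use strict;
use warnings;

# Hypothetical input: every CR tag is preceded by whitespace, as the original regex expects.
my $temp_comment = ' CR1234: "Then some text" CRs 2345, 3456, 45, 567 CR678 "Some Text"';

# Step 1: the original substitution with the wider capture group from the answer,
# plus an optional "s" after CR (an assumption, not part of the answer).
$temp_comment =~ s/\s[cC][rR][sS]?[-\s:;]*([\d\ ]+)/\n$1/mg;

# Step 2: strip the commas that remain between the numbers.
$temp_comment =~ s/,//g;

print $temp_comment, "\n";
```

Each CR entry now starts on its own line with the tag removed, and the second entry reads "2345 3456 45 567". Step 2 removes every comma in the string, so it is only safe if the descriptive text never needs its commas.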

Alternatively, capture the whole list and strip the commas afterwards (the match operates on $_):

 my ($v) = /CR[s ]*((?:\d+[\s,]*)*)/ig;
 $v =~ s/,//g;
 print $v,"\n";


Perhaps the following will work for you:

 use Modern::Perl;

 say join ' ', (/(\d+)/g) for <DATA>;

 __DATA__
 CR1234: "Then some text"
 CRs 1111, 2222, 33, 444
 CR567 "Some Text"

Output:

 1234
 1111 2222 33 444
 567

Hope this helps!
