Regex: matching 4-digit numbers in words

I have some text from which I want to pull out repeating sets of 4-digit numbers.

Example:

The first - 1234 2) The second - 2098 3) The third - 3213

Now I know that I can get the first set of numbers, simply using:

/\d{4}/ 

... which returns 1234

But how can I match the second set of numbers, or the third, and so on?

edit: How do I return 2098 or 3213?

+7
regex perl
5 answers

You do not yet have a proper answer to your question.

The solution is to use the /g modifier on your regular expression. In list context it will find all the numbers in your string at once, like this

    my $str = 'The first is 1234 2) The Second is 2098 3) The Third is 3213';
    my @numbers = $str =~ /\b \d{4} \b/gx;
    print "@numbers\n";

Output

 1234 2098 3213 

Or you can iterate over them using scalar context in a while loop, like this

    while ($str =~ /\b (\d{4}) \b/gx) {
        my $number = $1;
        print $number, "\n";
    }

Output

    1234
    2098
    3213

I added \b assertions to the regular expression so that it matches only four-digit integers and does not, for example, find 1234 in 1234567. The /x modifier allows me to add spaces to make the pattern more readable.
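The effect of \b can be seen by running the pattern with and without it. This is a minimal sketch; the sample string and the variable names @loose and @strict are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: one 7-digit run and one standalone 4-digit number.
my $str = 'id 1234567 and code 2098';

# Without \b, any run of four digits matches, even inside a longer number.
my @loose = $str =~ /\d{4}/g;

# With \b on both sides, only standalone 4-digit numbers match.
my @strict = $str =~ /\b \d{4} \b/gx;

print "loose:  @loose\n";
print "strict: @strict\n";
```

Here the loose pattern picks up 1234 from inside 1234567, while the anchored pattern returns only 2098.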

+11

See http://perldoc.perl.org/perlre.html for a discussion of the 'g' modifier, which causes your regular expression to match all occurrences of its pattern, not just the first one.
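To get just the second or third number, as the question asks, one option is to index into the list that /g returns in list context. A minimal sketch, using the example string from the question; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $str = 'The first - 1234 2) The second - 2098 3) The third - 3213';

# /g in list context returns every match; pick one by index (starting at 0).
my @all    = $str =~ /\b\d{4}\b/g;
my $second = $all[1];

# The same idea as a list slice, without a named array.
my $third = ($str =~ /\b\d{4}\b/g)[2];

print "second: $second\n";
print "third:  $third\n";
```

If the string holds fewer matches than the index asks for, the result is undef, so a defined check may be needed in real code.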

+1

If you need a pattern that finds the $n'th 4-digit group, this works:

    $pat = "^(?:.*?\\b(\\d{4})\\b){$n}";
    if ($s =~ /$pat/) {
        print "Found $1\n";
    }
    else {
        print "Not found\n";
    }

I did this by building the pattern as a string, because I could not get the variable to interpolate into the quantifier {$n}.

This pattern finds 4-digit groups that sit on word boundaries (the \b assertions); I do not know whether this meets your requirements. The pattern uses .*? so that as few characters as possible are matched before each four-digit group. The group is matched $n times, and the capture group $1 is set to whatever it captured in the last iteration, i.e. the $n'th.

EDIT: When I tried it again just now, interpolating $n into the quantifier seemed to work just fine. I do not know what I did differently the last time, when it did not work. So this may work:

    if ($s =~ /^(?:.*?\b(\d{4})\b){$n}/) { ...

If not, see amon's comment on qr//.
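The comment itself is not reproduced here, so the following is only a sketch of one way qr// could be used: it compiles the pattern with $n interpolated into the quantifier. The helper name nth_four_digit is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the $n-th standalone 4-digit number, or undef.
sub nth_four_digit {
    my ($s, $n) = @_;

    # qr// compiles the pattern with $n interpolated into the quantifier.
    my $pat = qr/^(?:.*?\b(\d{4})\b){$n}/;

    # $1 holds what the capture group matched on its last ($n-th) iteration.
    return $s =~ $pat ? $1 : undef;
}

my $s = 'The first - 1234 2) The second - 2098 3) The third - 3213';
print nth_four_digit($s, 2) // 'not found', "\n";
print nth_four_digit($s, 4) // 'not found', "\n";
```

The // (defined-or) operator used in the print lines needs Perl 5.10 or later.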

+1

If the regular expression can only be matched once, then match all three numbers in a single regular expression and extract them using the capture groups:

 ^.*\b(\d{4})\b.*\b(\d{4})\b.*\b(\d{4})\b.*$ 

The three 4-digit numbers will be captured in groups 1, 2 and 3.
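In Perl, a match without /g in list context returns its capture groups, so the pattern above might be used like this. A minimal sketch with the example string from the question; @groups is an arbitrary name.

```perl
use strict;
use warnings;

my $str = 'The first - 1234 2) The second - 2098 3) The third - 3213';

# Without /g, a match in list context returns the capture groups ($1, $2, $3).
my @groups = $str =~ /^.*\b(\d{4})\b.*\b(\d{4})\b.*\b(\d{4})\b.*$/;

print "@groups\n";
```

Because .* is greedy, a string with more than three such numbers would yield the last three, and a string with fewer than three would not match at all.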

0

Ajb's answer with "gx" is the best. If you know you will have three numbers, this straightforward line does the trick:

    my $str = 'The first is 1234 2) The Second is 2098 3) The Third is 3213';
    my ($num1, $num2, $num3) = $str =~ /\b \d{4} \b/gx;
    print "$num1, $num2, $num3\n";

0