Regular expressions matching complexity

My current regex is:

 ([\d]*)([^\d]*[\d][a-z]*-[\d]*)([\d][a-z?])(.?)

I'm trying to get the regular expression to match a line made up of: a count, which can be any number from 0 to 1 million; then a digit, sometimes followed by a letter; then a hyphen and any number of digits; then the same digit (and letter) again; and sometimes one more letter. Examples of strings that must match:

 1921-1220104081741b
 192123212a-1220234104081742ab

What it should return for the strings above (these are two separate examples; the regex is run on one line at a time):

 (192) (1-122010408174) (1) (b)
 (19212321) (2a-122023410408174) (2a) (b)

My current regular expression works for the second string, but for the first it returns (1b) where I want (1) (b). It still needs to return (2a) for the second string, and it should also handle this case:

 1926h-1220104081746h
Should return:
 (192) (6h-122010408174) (6h)
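To make the problem concrete, here is a small JavaScript sketch (JavaScript because the answers below use it; its regex syntax is broadly similar to the VBScript `RegExp` object that Excel VBA usually relies on). It assumes the `[az]` and `[az?]` in the pattern above were meant as `[a-z]` and `[a-z?]`, and runs the current pattern on the three examples. Nothing ties the third group to the fragment before the dash, so it simply takes a digit plus exactly one more character, which is how the `b` ends up in `1b`.

```javascript
// The question's current pattern, assuming `[az]` / `[az?]` were meant as `[a-z]` / `[a-z?]`.
const current = /([\d]*)([^\d]*[\d][a-z]*-[\d]*)([\d][a-z?])(.?)/;

// Returns the four capture groups as a plain array, or null if the pattern does not match.
function groups(str) {
  const m = str.match(current);
  return m === null ? null : m.slice(1);
}

// Group 3 is a digit plus exactly one character from [a-z?], so in the first string
// it swallows the trailing "b" instead of leaving it for group 4.
console.log(groups('1921-1220104081741b'));           // third group is '1b'
console.log(groups('192123212a-1220234104081742ab')); // third group is '2a'
console.log(groups('1926h-1220104081746h'));          // third group is '6h'
```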

I'm not 100% sure this is possible, since I'm pretty new to regex. For reference, I'm doing this in Excel VBA, in case there is another way that makes it easier.

vba excel-vba regex excel
2 answers

You can capture the character(s) before the dash and then use a backreference to match the same text again.

In the expression below, \3 will match what was matched by the third capture group:

 (\d*)((\d[a-z]*)-\d*)(\3)([a-z])?

Output after combining the capture groups:

 1921-1220104081741b (192) (1-122010408174) (1) (b) 
 192123212a-1220234104081742ab (19212321) (2a-122023410408174) (2a) (b) 
 1926h-1220104081746h (192) (6h-122010408174) (6h) 
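To see what `\3` enforces, the sketch below (JavaScript, with `[az]` read as `[a-z]`) runs the pattern on the third example and on a made-up variant whose trailing fragment (`5h`) differs from the one before the dash (`6h`). The variant is a hypothetical input for illustration only. Because the backreference has nothing to match in it, the whole pattern fails and `match` returns `null`.

```javascript
// The pattern above, assuming `[az]` was meant as `[a-z]`.
const exp = /(\d*)((\d[a-z]*)-\d*)(\3)([a-z])?/;

// Same fragment ("6h") before the dash and at the end: group 4 (\3) repeats group 3.
const ok = '1926h-1220104081746h'.match(exp);
console.log(ok[3], ok[4]); // 6h 6h

// Hypothetical input: "6h" before the dash but "5h" at the end, so \3 cannot match anywhere.
const bad = '1926h-1220104081745h'.match(exp);
console.log(bad); // null
```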

Example (ignore the JS; it just prints the result after merging the capture groups):

 var strings = ['1921-1220104081741b', '192123212a-1220234104081742ab', '1926h-1220104081746h'];
 // \3 re-matches exactly what the third group captured before the dash
 var exp = /(\d*)((\d[a-z]*)-\d*)(\3)([a-z])?/;
 strings.forEach(function (str) {
   var m = str.match(exp);
   console.log(str);
   // m[3] only exists to feed \3, so print groups 1, 2, 4 and the optional 5
   console.log('(' + m[1] + ') (' + m[2] + ') (' + m[4] + ') (' + (m[5] || '') + ')');
   console.log('---');
 });

I think what you mean by "followed by the same number" is that the fragment right before the dash is repeated as your third capture group. I would suggest splitting the second capture group and then using a backreference:

 ([\d]*)([\d][a-z]*)-([\d]*)(\2)(.?)

For your three examples:

 1921-1220104081741b
 192123212a-1220234104081742ab
 1926h-1220104081746h

This leads to:

 (192) (1) - (122010408174) (1) (b)
 (19212321) (2a) - (122023410408174) (2a) (b)
 (192) (6h) - (122010408174) (6h) ()

...and you can join the two middle groups back together, with the hyphen between them, to get the hyphenated term you want.
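A minimal sketch of that joining step, in JavaScript like the other answer's snippet, assuming `[az]` was meant as `[a-z]`. The helper name `format` is made up for illustration. It rebuilds the hyphenated term from groups 2 and 3 and drops the last group when `(.?)` captured nothing, which reproduces the output the question asks for.

```javascript
// This answer's pattern, assuming `[az]` was meant as `[a-z]`.
const exp = /([\d]*)([\d][a-z]*)-([\d]*)(\2)(.?)/;

// Hypothetical helper: formats the groups as "(count) (term) (repeat) (suffix)", or null on no match.
function format(str) {
  const m = str.match(exp);
  if (m === null) return null;
  // Join the two middle groups back together around the hyphen.
  const parts = [m[1], m[2] + '-' + m[3], m[4]];
  // (.?) captures '' when there is no trailing letter; leave it out in that case.
  if (m[5] !== '') parts.push(m[5]);
  return parts.map(p => '(' + p + ')').join(' ');
}

const examples = ['1921-1220104081741b', '192123212a-1220234104081742ab', '1926h-1220104081746h'];
examples.forEach(s => console.log(format(s)));
// (192) (1-122010408174) (1) (b)
// (19212321) (2a-122023410408174) (2a) (b)
// (192) (6h-122010408174) (6h)
```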
