Regex not working in C

My regex works when I use it in the shell, but not inside a C program.

Any thoughts please?

 echo "abc:1234567890@werty.wer.sdfg.net " | grep -E "(\babc\b|\bdef\b):[0-9]{10}@([A-Za-z0-9].*)"   # shell
 reti = regcomp(&regex,"(\babc\b|\bdef\b):[0-9]{10}@([A-Za-z0-9].*)", 0);   // C program
2 answers

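grep -E uses POSIX extended regular expression (ERE) syntax, in which the interval braces {n,m} and the grouping parentheses ( and ) are written unescaped. In basic regular expressions (BRE), which is what regcomp compiles when no flags are passed, they must be escaped instead.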

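You need to pass the REG_EXTENDED flag to regcomp. Also, since you cannot rely on \b as a word boundary here (POSIX does not define it, and "\b" inside a C string literal is the backspace character), replace the leading \b with the rough equivalent (^|[^[:alnum:]_]). You do not need the trailing \b, since the pattern has a : right after the word: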

 const char *str_regex = "(^|[^[:alnum:]_])(abc|def):[0-9]{10}@([A-Za-z0-9].*)"; 

The part (^|[^[:alnum:]_]) matches either the beginning of the line ( ^ ) or ( | ) any single character other than an alphanumeric or an underscore.

Full C demo:

    #include <stdio.h>
    #include <stdlib.h>
    #include <regex.h>

    int main (void)
    {
        int match;
        int err;
        regex_t preg;
        regmatch_t pmatch[4];   /* whole match + 3 capture groups */
        size_t nmatch = 4;
        const char *str_request = "abc:1234567890@werty.wer.sdfg.net ";
        const char *str_regex = "(^|[^[:alnum:]_])(abc|def):[0-9]{10}@([A-Za-z0-9].*)";

        err = regcomp(&preg, str_regex, REG_EXTENDED);
        if (err == 0) {
            match = regexec(&preg, str_request, nmatch, pmatch, 0);
            regfree(&preg);
            if (match == 0) {
                /* Group 2 is abc/def, group 3 is everything after the @.
                   %.*s expects an int length, hence the casts. */
                printf("\"%.*s\"\n", (int)(pmatch[2].rm_eo - pmatch[2].rm_so), &str_request[pmatch[2].rm_so]);
                printf("\"%.*s\"\n", (int)(pmatch[3].rm_eo - pmatch[3].rm_so), &str_request[pmatch[3].rm_so]);
            } else if (match == REG_NOMATCH) {
                printf("unmatch\n");
            }
        }
        return 0;
    }

Word Boundary Reference

General information
Posix

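From the above links, it appears that some POSIX regex implementations support their own word boundary constructs, [[:<:]] and [[:>:]]. Note that despite the bracket syntax these are not character classes, and they are implementation extensions, so they may not be accepted everywhere.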

Given this, and using ERE rather than BRE, you could write:

reti = regcomp(&regex,"[[:<:]](abc|def)[[:>:]]:[0-9]{10}@([A-Za-z0-9].*)", REG_EXTENDED);

or, since there is already a natural word boundary between the final letter ([cf]) and the :, it can be reduced to

reti = regcomp(&regex,"[[:<:]](abc|def):[0-9]{10}@([A-Za-z0-9].*)", REG_EXTENDED);

I have not tested this, but it probably works.
And given that it is actually unclear what this does internally, it might be better to stick to this syntax.

Some engines that offer a POSIX mode, such as Boost, use \< and \> as the word boundary syntax instead.

