POSIX regular expression non-greedy

Is there a way to use a non-greedy regular expression in C, as one can in Perl? I tried a few things, but nothing actually works.

I am currently using this regex, which matches the IP address and the corresponding HTTP request, but it is greedy even though I use *?:

([0-9]{1,3}(\\.[0-9]{1,3}){3})(.*?)HTTP/1.1

In this example, it always matches the entire line:

    #include <regex.h>
    #include <stdio.h>

    int main() {
        int a, i;
        regex_t re;
        regmatch_t pm;
        char *mpages = "TEST 127.0.0.1 GET /test.php HTTP/1.1\" 404 525 \"-\" \"Mozilla/5.0 (Windows NT HTTP/1.1 TEST";

        a = regcomp(&re, "([0-9]{1,3}(\\.[0-9]{1,3}){3})(.*?)HTTP/1.1", REG_EXTENDED);
        if(a!=0)
            printf(" -> Error: Invalid Regex");

        a = regexec(&re, &mpages[0], 1, &pm, REG_EXTENDED);
        if(a==0) {
            for(i = pm.rm_so; i < pm.rm_eo; i++)
                printf("%c", mpages[i]);
            printf("\n");
        }
        return 0;
    }

$ ./regtest

127.0.0.1 GET /test.php HTTP/1.1" 404 525 "-" "Mozilla/5.0 (Windows NT HTTP/1.1

+6
c regex posix non-greedy
5 answers

No, there are no non-greedy quantifiers in POSIX regular expressions. But there is a library that provides Perl-like regular expressions for C: http://www.pcre.org/

+5

As I said in a comment, use grep -E to test POSIX regexes; it will speed up development. In any case, it seems your problem is with the regex itself, not with a missing feature.

I don't quite understand what you want to extract from the request. Assuming you just need the IP address, the HTTP verb and the resource, you can use the following regular expression.

 regcomp(&re, "\\b(.?[0-9])+\\s+(GET|POST|PUT)\\s+([^ ]+)", REG_EXTENDED); 

Keep in mind that this makes several assumptions. For example, the regular expression assumes that the IP address is well formed, and it only accepts requests whose HTTP verb is GET, POST or PUT. Adjust it to your needs.

0

The brute-force way to make a regular expression match up to the next occurrence of a word:

 "([^H]|H[^T]|HT[^T]|HTT[^P]|HTTP[^/]|HTTP/[^1]|HTTP/1[^.]|HTTP/1\\.[^1])*HTTP/1\\.1" 

That is only necessary if you cannot pin down your match more precisely, and here you can. HTTP requests have the form

 Request-Line = Method SP Request-URI SP HTTP-Version CRLF 

and none of those nonterminals can legitimately contain embedded spaces. So:

 "[0-9]{1,3}(\\.[0-9]{1,3}){3} [^ ]* [^ ]* HTTP/1\\.1" 

This is enough as written, since you only allocate space for the match of the entire expression; put the parentheses back if you want to capture the individual pieces.
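
A minimal sketch of how the space-delimited pattern above could be used with regcomp and regexec. The helper name match_request and its buffer-based interface are assumptions made for illustration; they do not come from the original answer.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the first match of the space-delimited
 * pattern found in `line` into `out`. Returns 0 on success, -1 if the
 * pattern fails to compile, nothing matches, or `out` is too small. */
int match_request(const char *line, char *out, size_t outsz)
{
    /* Each [^ ]* field stops at the next space, so the match cannot
     * run on to a later "HTTP/1.1" in the same line. */
    const char *pattern =
        "[0-9]{1,3}(\\.[0-9]{1,3}){3} [^ ]* [^ ]* HTTP/1\\.1";
    regex_t re;
    regmatch_t pm[1];          /* only the whole match is requested */
    int rc = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* The last argument of regexec takes execution flags; 0 means none. */
    if (regexec(&re, line, 1, pm, 0) == 0) {
        size_t len = (size_t)(pm[0].rm_eo - pm[0].rm_so);
        if (len < outsz) {
            memcpy(out, line + pm[0].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

Called on the log line from the question, this should copy only the address and the request line up to the first HTTP/1.1, rather than everything up to the last one.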

0
 a = regcomp(&re, "([0-9]{1,3}(\\.[0-9]{1,3}){3})(.*?)HTTP/1.1", REG_EXTENDED|REG_ENHANCED); 

This macro did not exist in older versions; the header only defines it for OS X 10.8 or iOS 6.0 and later:

    #if __MAC_OS_X_VERSION_MIN_REQUIRED >= __MAC_10_8 \
     || __IPHONE_OS_VERSION_MIN_REQUIRED >= __IPHONE_6_0
    #define REG_ENHANCED 0400 /* Additional (non-POSIX) features */
    #endif
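
Because REG_ENHANCED is not part of POSIX, code that relies on it may fail to compile elsewhere. One possible way to guard against that is sketched below; the helper name extended_flags is hypothetical.

```c
#include <regex.h>

/* Hypothetical helper: returns the flags to pass to regcomp.
 * REG_ENHANCED is added only where the platform header defines it. */
int extended_flags(void)
{
#ifdef REG_ENHANCED
    return REG_EXTENDED | REG_ENHANCED;
#else
    return REG_EXTENDED;   /* plain POSIX ERE: no non-greedy quantifiers */
#endif
}
```

On platforms that do not define the macro, the pattern still compiles as an ordinary extended regular expression, so (.*?) should not be expected to behave lazily there.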
0

Your pm should be an array of regmatch_t, and in your case it needs at least 2 to 4 elements, depending on which () subexpressions you want to capture.

You have only one element. The first element, pm[0], always receives the text matched by the whole RE, and that is the one you are getting. It is pm[1] that receives the text of the first () subexpression (the IP address), and pm[3] that receives the text matched by your (.*?) term.

But even then, as indicated above (Wumbley, WQ), the POSIX regular expression library may not support non-greedy quantifiers.
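
A short sketch of capturing the pieces with a regmatch_t array. It assumes the space-delimited pattern from the earlier answer with parentheses added, so the group numbering differs from the pattern in the question; the names copy_group and parse_request are hypothetical.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copy one sub-match of `s` into `out`, always
 * NUL-terminated. Assumes outsz >= 1. */
static void copy_group(const char *s, const regmatch_t *m,
                       char *out, size_t outsz)
{
    size_t len;

    out[0] = '\0';
    if (m->rm_so < 0)            /* group did not take part in the match */
        return;
    len = (size_t)(m->rm_eo - m->rm_so);
    if (len >= outsz)
        len = outsz - 1;         /* truncate rather than overflow */
    memcpy(out, s + m->rm_so, len);
    out[len] = '\0';
}

/* Hypothetical helper: splits a log line into IP, method and URI.
 * Each output buffer must hold at least `sz` bytes.
 * Returns 0 on a match, -1 otherwise. */
int parse_request(const char *line, char *ip, char *method,
                  char *uri, size_t sz)
{
    const char *pattern =
        "([0-9]{1,3}(\\.[0-9]{1,3}){3}) ([^ ]*) ([^ ]*) HTTP/1\\.1";
    regex_t re;
    /* pm[0] whole match, pm[1] IP, pm[2] last ".N" repetition,
     * pm[3] method, pm[4] URI */
    regmatch_t pm[5];
    int rc = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, line, 5, pm, 0) == 0) {
        copy_group(line, &pm[1], ip, sz);
        copy_group(line, &pm[3], method, sz);
        copy_group(line, &pm[4], uri, sz);
        rc = 0;
    }
    regfree(&re);
    return rc;
}
```

Groups are numbered by the position of their opening parenthesis, which is why the nested (\\.[0-9]{1,3}) group occupies pm[2] even though its text is rarely of interest.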

-1