Regex template for HttpLog format analysis

I am looking for a set of regular expression patterns for a log line in HttpLogFormat. The log is generated by haproxy. The following is an example line in this format.

Feb 6 12:14:14 localhost haproxy[14389]: 10.0.1.2:33317 [06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {} "GET /index.html HTTP/1.1" 

An explanation of the format is available at HttpLogFormat. Any help is appreciated.

I am trying to extract the individual pieces of information contained in this line. Here are the fields:

  • process_name '[' pid ']:'
  • client_ip ':' client_port
  • '[' accept_date ']'
  • frontend_name
  • backend_name '/' server_name
  • Tq '/' Tw '/' Tc '/' Tr '/' Tt *
  • status_code
  • bytes_read
  • captured_request_cookie
  • captured_response_cookie
  • termination_state
  • actconn '/' feconn '/' beconn '/' srv_conn '/' repeats
  • srv_queue '/' backend_queue
  • '{' capture_request_headers * '}'
  • '{' capture_response_headers * '}'
  • '"' http_request '"'
+7
regex logging haproxy
5 answers

Regex:

 ^(\w+ \d+ \S+) (\S+) (\S+)\[(\d+)\]: (\S+):(\d+) \[(\S+)\] (\S+) (\S+)/(\S+) (\S+) (\S+) (\S+) *(\S+) (\S+) (\S+) (\S+) (\S+) \{([^}]*)\} \{([^}]*)\} "(\S+) ([^"]+) (\S+)" *$ 

Results:

 Group 1: Feb 6 12:14:14
 Group 2: localhost
 Group 3: haproxy
 Group 4: 14389
 Group 5: 10.0.1.2
 Group 6: 33317
 Group 7: 06/Feb/2009:12:14:14.655
 Group 8: http-in
 Group 9: static
 Group 10: srv1
 Group 11: 10/0/30/69/109
 Group 12: 200
 Group 13: 2750
 Group 14: -
 Group 15: -
 Group 16: ----
 Group 17: 1/1/1/1/0
 Group 18: 0/0
 Group 19: 1wt.eu
 Group 20: (empty)
 Group 21: GET
 Group 22: /index.html
 Group 23: HTTP/1.1
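
As a rough sketch, the pattern above could be applied in Python with the standard re module. Only the pattern and the sample line come from this thread; the names in FIELDS and the helper parse_line are labels made up here for illustration, loosely following the question's field list.

```python
import re

# The pattern from this answer, unchanged (split across lines for readability).
PATTERN = re.compile(
    r'^(\w+ \d+ \S+) (\S+) (\S+)\[(\d+)\]: (\S+):(\d+) \[(\S+)\] (\S+) '
    r'(\S+)/(\S+) (\S+) (\S+) (\S+) *(\S+) (\S+) (\S+) (\S+) (\S+) '
    r'\{([^}]*)\} \{([^}]*)\} "(\S+) ([^"]+) (\S+)" *$'
)

# Hypothetical names for the 23 capture groups, in order.
FIELDS = (
    "syslog_date", "host", "process_name", "pid",
    "client_ip", "client_port", "accept_date", "frontend_name",
    "backend_name", "server_name", "timers", "status_code", "bytes_read",
    "captured_request_cookie", "captured_response_cookie",
    "termination_state", "connections", "queues",
    "request_headers", "response_headers",
    "method", "uri", "http_version",
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return dict(zip(FIELDS, m.groups()))


SAMPLE = ('Feb 6 12:14:14 localhost haproxy[14389]: 10.0.1.2:33317 '
          '[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 '
          '200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {} '
          '"GET /index.html HTTP/1.1"')

record = parse_line(SAMPLE)
# Compound fields such as the timers can be split further after matching.
tq, tw, tc, tr, tt = (int(t) for t in record["timers"].split("/"))
print(record["status_code"], record["uri"], tt)
```

One caveat: traditional syslog timestamps pad single-digit days with an extra space ("Feb  6"), which the single space in the first group would not match; loosening it to `\w+ +\d+` would be one way around that.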

I use RegexBuddy to compose complex regular expressions.

+4

Use at your own risk.

This assumes that every field returns something, except for those you marked with asterisks (is that what the asterisk means?). There are also obvious failure cases, such as nested brackets of any kind, but if the logger prints reasonably sane messages, I think you'll be fine...

Of course, I personally would not want to maintain this, but there you have it. You should perhaps consider writing a proper parser instead, if possible.
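
For comparison, here is a minimal sketch of what such a hand-written parser might look like in Python, using only string splitting. The function name parse_haproxy_http and the dictionary keys are made up for illustration, and the sketch assumes the line has the same shape as the sample in the question: twelve space-separated fields after the "process[pid]: " prefix, then zero or more {...} capture blocks, then the quoted request.

```python
def parse_haproxy_http(line):
    """Split one HTTP-format log line into fields without using a regex."""
    # Everything before the first "]: " is the syslog prefix ending in "name[pid".
    prefix, sep, rest = line.partition("]: ")
    if not sep:
        raise ValueError("no 'process[pid]: ' prefix found")
    process_name, _, pid = prefix.rsplit(" ", 1)[-1].partition("[")

    # Twelve whitespace-separated fields, then the captures and the request.
    parts = rest.split(None, 12)
    if len(parts) < 13:
        raise ValueError("too few fields")
    (client, accept_date, frontend, backend_server, timers, status,
     bytes_read, req_cookie, resp_cookie, term_state, conns, queues,
     tail) = parts

    # Zero or more "{...}" capture blocks precede the quoted request.
    captures = []
    while tail.startswith("{"):
        block, _, tail = tail[1:].partition("} ")
        captures.append(block)

    client_ip, _, client_port = client.rpartition(":")
    backend_name, _, server_name = backend_server.partition("/")
    return {
        "process_name": process_name,
        "pid": pid,
        "client_ip": client_ip,
        "client_port": client_port,
        "accept_date": accept_date.strip("[]"),
        "frontend_name": frontend,
        "backend_name": backend_name,
        "server_name": server_name,
        # int() accepts a leading sign, so values like "-1" parse as well.
        "timers": [int(t) for t in timers.split("/")],
        "status_code": status,
        "bytes_read": bytes_read,
        "captured_request_cookie": req_cookie,
        "captured_response_cookie": resp_cookie,
        "termination_state": term_state,
        "connections": [int(c) for c in conns.split("/")],
        "queues": [int(q) for q in queues.split("/")],
        "captures": captures,
        "http_request": tail.strip().strip('"'),
    }


SAMPLE = ('Feb 6 12:14:14 localhost haproxy[14389]: 10.0.1.2:33317 '
          '[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 '
          '200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {} '
          '"GET /index.html HTTP/1.1"')

rec = parse_haproxy_http(SAMPLE)
print(rec["server_name"], rec["timers"], rec["http_request"])
```

Unlike the single regex, this fails with a specific error at the step that went wrong, which tends to make malformed lines easier to diagnose.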

Edit: I marked this as CW (community wiki), since it was more of an "I wonder how this will turn out" exercise than anything else. For quick reference, this is what I ended up with in Rubular:

 ^[^[]+\s+(\w+)\[(\d+)\]:([^:]+):(\d+)\s+\[([^\]]+)\]\s+[^\s]+\s+(\w+)\/(\w+)\s+(\d+)\/(\d+)\/(\d+)\/(\d+)\/(\d*)\s+(\d+)\s+(\d+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+)\s(\d+)\/(\d+)\/(\d+)\/(\d+)\/(\d+)\s+(\d+)\/(\d+)\s+\{([^}]*)\}\s\{([^}]*)\}\s+\"([^"]+)\"$ 

My first programming language was Perl, and even I am willing to admit that this one scares me.

+2

That looks like a very complex string to match. I would recommend using a tool like Expresso. Start with the line you are trying to match, and replace parts of it with regex notation one at a time.

To capture individual parts, use parentheses to group.

Another option is to create a regular expression for each part that you are trying to capture.
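
A small sketch of that last approach in Python, with one short pattern per field of interest. The patterns and the names in PATTERNS are illustrative guesses based only on the sample line in the question, not a tested reference for every variant of the log format.

```python
import re

# One small pattern per field; each is searched independently in the line.
PATTERNS = {
    "process": re.compile(r'(\w+)\[(\d+)\]:'),
    "client": re.compile(r'\]: (\S+):(\d+) '),
    "accept_date": re.compile(r' \[([^\]]+)\] '),
    # Five slash-separated numbers followed by the status code and byte
    # count, which tells the timers apart from the later connection counters.
    "timers_status_bytes": re.compile(
        r' ((?:[+-]?\d+/){4}[+-]?\d+) (-?\d+) (\+?\d+) '),
    "request": re.compile(r'"([^"]*)"\s*$'),
}


def extract(line, wanted):
    """Return {name: tuple of captured groups, or None} for each wanted field."""
    out = {}
    for name in wanted:
        m = PATTERNS[name].search(line)
        out[name] = m.groups() if m else None
    return out


SAMPLE = ('Feb 6 12:14:14 localhost haproxy[14389]: 10.0.1.2:33317 '
          '[06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 '
          '200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {} '
          '"GET /index.html HTTP/1.1"')

print(extract(SAMPLE, ["client", "timers_status_bytes", "request"]))
```

Each pattern stays short enough to read and test on its own, at the cost of scanning the line once per field.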

+1

Why are you trying to match the whole line exactly? If you are looking for specific fields in it, it is better to say which ones and extract just those. If you want to run statistics on haproxy logs, you should take a look at the halog tool in the contrib directory of the sources. Take it from version 1.4.9; it even knows how to sort URLs by response time.

But whatever you want to do with these lines, a regex will probably always be the slowest and most complex solution.

+1

I don't think a regex is your best option here... however, if it is your only option...

Try looking at these options instead: https://serverfault.com/q/62687/438

0