Perl Regular Expression for Java StackTrace Keyword Search

I need to grep full stack traces out of a log file by keyword.

This code works, but it slows down on large files (the larger the file, the slower it gets). I think the best way to speed it up is to improve the regular expression that searches for the keyword, but I haven't managed to do it.


    #!/usr/bin/perl
    use strict;
    use warnings;

    my $regexp;
    my $stacktrace;

    undef $/;
    $regexp = shift;
    $regexp = quotemeta($regexp);

    while (<>) {
        while ( $_ =~ /(?<LEVEL>^[E|W|D|I])\s
                       (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                       (?<THREAD>.*?)\/
                       (?<CLASS>.*?)\s-\s
                       (?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx ) {
            $stacktrace = $&;
            if ( $+{MESSAGE} =~ /$regexp/ ) {
                print "$stacktrace";
            }
        }
    }

Usage: ./grep_log4j.pl <pattern> <file>

Example: ./grep_log4j.pl Exception sample.log

I think the problem is $stacktrace = $&;, because if I remove this line and just print the captured fields, the script runs quickly. Here is the version of the script that prints all matches:

    #!/usr/bin/perl
    use strict;
    use warnings;

    undef $/;

    while (<>) {
        while ( $_ =~ /(?<LEVEL>^[E|W|D|I])\s
                       (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                       (?<THREAD>.*?)\/
                       (?<CLASS>.*?)\s-\s
                       (?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx ) {
            print_result();
        }
    }

    sub print_result {
        print "LEVEL: $+{LEVEL}\n";
        print "TIMESTAMP: $+{TIMESTAMP}\n";
        print "THREAD: $+{THREAD}\n";
        print "CLASS: $+{CLASS}\n";
        print "MESSAGE: $+{MESSAGE}\n";
    }

Usage: ./grep_log4j.pl <file>

Example: ./grep_log4j.pl sample.log
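On the $& suspicion: in Perl versions before 5.20, using $& anywhere in a program makes Perl keep a copy of the target string for every successful match, which is costly when $_ holds the whole slurped file; since 5.20, copy-on-write removes most of that penalty. Below is a minimal sketch, assuming Perl 5.10 or later, of the documented alternative: the /p modifier plus ${^MATCH}. The two-entry $log string, the simplified entry pattern and the Exception keyword are illustrative choices, not the asker's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sample only: two entries as they would sit in a slurped file.
my $log = "E 111012 141606.000 thread/class - Failed handling mobile request\n"
        . "java.lang.NullPointerException\n"
        . "I 111012 141706.000 thread/class - Received message: something\n";
my $keyword = 'Exception';

my @entries;
# /p makes ${^MATCH} available for this pattern only; on perls before 5.20,
# mentioning $& instead would make every match in the program save a copy.
while ( $log =~ /^[EWDIS]\s\d{6}\s\d{6}\.\d{3}\s.*?(?=^[EWDIS]\s\d{6}\s|\z)/gmsp ) {
    my $entry = ${^MATCH};    # copy it before running another match
    push @entries, $entry if $entry =~ /\Q$keyword\E/;
}
print @entries;
```

Each entry runs from its level letter up to the start of the next header line (or the end of the string), so the stack trace stays attached to its message.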

Log4j conversion pattern: %-1p %d %t/%c{1} - %m%n

Example log file:

    I 111012 141506.000 thread/class - Received message: something
    E 111012 141606.000 thread/class - Failed handling mobile request
    java.lang.NullPointerException
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
    at java.lang.Thread.run(Thread.java:619)
    W 111012 141706.000 thread/class - Received message: something
    E 111012 141806.000 thread/class - Failed with Exception
    java.lang.NullPointerException
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
    at java.lang.Thread.run(Thread.java:619)
    D 111012 141906.000 thread/class - Received message: something
    S 111012 142006.000 thread/class - Received message: something
    I 111012 142106.000 thread/class - Received message: something
    I 111013 142206.000 thread/class - Metrics:0/1

You can find my regex at http://gskinner.com/RegExr/ by searching for the keyword log4j.

2 answers

You're using:

    $/ = undef;

This makes Perl read the entire file into memory.
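To see what $/ = undef changes, here is a small sketch that reads the same text twice through an in-memory filehandle: once with the default $/ (each read returns one line) and once with $/ localized to undef (a single read returns everything). The three-line $text sample is made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "I 111012 141506.000 thread/class - Received message: something\n"
         . "E 111012 141606.000 thread/class - Failed handling mobile request\n"
         . "java.lang.NullPointerException\n";

# Default $/ is "\n": each <$fh> returns one line.
open my $fh, '<', \$text or die "open: $!";
my @lines = <$fh>;
close $fh;

# With $/ undefined, one <$fh> returns the whole input as a single string.
my $slurped;
{
    local $/;    # undef only inside this block; restored on exit
    open my $fh2, '<', \$text or die "open: $!";
    $slurped = <$fh2>;
    close $fh2;
}

printf "line mode: %d reads; slurp mode: one string of %d characters\n",
    scalar(@lines), length($slurped);
```

Using local $/ inside a block, rather than a bare undef $/;, confines the slurp mode to the code that needs it.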

I would process the file line by line instead, as follows (assuming each stack trace belongs to the log message directly above it):

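    use strict;
    use warnings;

    my $regexp = quotemeta(shift);    # keyword from the command line, as in the question
    my $matched;
    while (<>) {
        if (m/^(?<LEVEL>\S+) \s+ (?<TIMESTAMP>(\d+) \s+ ([\d.]+)) \s+ (?<THREADCLASS>\S+) \s+ - \s+ (?<REST>.*)/x) {
            my %captures = %+;    # copy: the keyword match below replaces %+
            $matched = ($+{REST} =~ $regexp);
            if ($matched) {
                print "LEVEL: $captures{LEVEL}\n";
                print "TIMESTAMP: $captures{TIMESTAMP}\n";
                print "THREADCLASS: $captures{THREADCLASS}\n";
                print "REST: $captures{REST}\n";
            }
        }
        elsif ($matched) {
            print;    # stack-trace line belonging to a matching entry
        }
    }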

Here is a general approach to parsing multi-line blocks. This loop reads STDIN one line at a time and passes each complete log entry (the header line plus any lines that follow it) to the process routine:

    my $first;
    my $stack = "";
    while (<STDIN>) {
        if (m/^\S /) {
            process($first, $stack) if $first;
            $first = $_;
            $stack = "";
        }
        else {
            $stack .= $_;
        }
    }
    process($first, $stack) if $first;

    sub process {
        my ($first, $stack) = @_;
        # ... do whatever you want here ...
    }
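As one possible filling for process, here is a sketch that applies the question's keyword search to each complete entry. It is not the answerer's code: grep_entries is a hypothetical helper name, the keyword is matched against the whole entry (header plus stack trace), and an in-memory sample stands in for STDIN so the sketch runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every entry read from $fh whose text
# (header line plus continuation lines) contains $keyword literally.
sub grep_entries {
    my ($fh, $keyword) = @_;
    my $re = qr/\Q$keyword\E/;    # literal match, like quotemeta in the question
    my @hits;

    # Plays the role of process() from the loop above, filtering by keyword.
    my $process = sub {
        my ($head, $trace) = @_;
        my $entry = $head . $trace;
        push @hits, $entry if $entry =~ $re;
    };

    my ($first, $stack) = (undef, "");
    while (my $line = <$fh>) {
        if ($line =~ m/^\S /) {            # header line, e.g. "E 111012 ..."
            $process->($first, $stack) if defined $first;
            ($first, $stack) = ($line, "");
        }
        else {
            $stack .= $line;               # continuation (stack-trace) line
        }
    }
    $process->($first, $stack) if defined $first;    # flush the last entry
    return @hits;
}

# Small in-memory sample so the sketch runs without a log file.
my $sample = <<'LOG';
I 111012 141506.000 thread/class - Received message: something
E 111012 141606.000 thread/class - Failed handling mobile request
java.lang.NullPointerException
at java.lang.Thread.run(Thread.java:619)
W 111012 141706.000 thread/class - Received message: something
LOG

open my $in, '<', \$sample or die "open: $!";
my @hits = grep_entries($in, 'Exception');
close $in;
print @hits;
```

To run it on real input, pass \*STDIN (or any opened file handle) instead of $in. Only one entry is held in memory at a time, which is the point of this approach.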

The problem is the misuse of [] in your regular expression.

[...] defines a character class;

(...) is used for grouping.

All you need to do is change [E|W|D|I] to [EWDI] everywhere (and [\r|\n] to [\r\n]), and stop using [] for grouping in the MESSAGE lookahead.
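A quick way to see the difference: inside brackets, | is an ordinary character, so [E|W|D|I] also accepts a literal pipe, while [EWDI] accepts only the four letters. The one-character test inputs below are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up one-character inputs, including a literal pipe.
my @candidates = ('E', 'W', '|', 'S');
my (%with_pipes, %plain);

for my $c (@candidates) {
    # Inside [...], "|" is just another member of the class.
    $with_pipes{$c} = $c =~ /^[E|W|D|I]\z/ ? 1 : 0;
    $plain{$c}      = $c =~ /^[EWDI]\z/    ? 1 : 0;
    print "input ${c}: [E|W|D|I] => $with_pipes{$c}, [EWDI] => $plain{$c}\n";
}

# Alternation needs a group, not a class: (?:E|W|D|I) matches the same letters as [EWDI].
my @alt = grep { /^(?:E|W|D|I)\z/ } @candidates;
print "(?:E|W|D|I) matches: @alt\n";
```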

Here is the final code that works for me:

    #!/usr/bin/perl
    use strict;
    use warnings;

    undef $/;

    while (<>) {
        while ( $_ =~ /(?<LEVEL>^[EWDIS])\s
                       (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                       (?<THREAD>.*?)\/
                       (?<CLASS>.*?)\s-\s
                       (?<MESSAGE>.*?[\r\n](?=[EWDIS]\s\d{6}\s\d{6}\.\d{3}|$))/gmxs ) {
            print_result();
        }
    }

    sub print_result {
        print "LEVEL: $+{LEVEL}\n";
        print "TIMESTAMP: $+{TIMESTAMP}\n";
        print "THREAD: $+{THREAD}\n";
        print "CLASS: $+{CLASS}\n";
        print "MESSAGE: $+{MESSAGE}\n";
    }

Note that you also left the letter "S" out of your list of log levels.

This example may still contain errors, but in general it works.
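Combining the corrected pattern with the keyword test from the question gives a sketch like the one below (not the answerer's code). Instead of $&, the whole entry is captured in an extra named group, here called ENTRY (a name chosen for this example). The sample log and the Exception keyword are illustrative; a real script would read the file as in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sample; a real script would slurp the log file instead.
my $log = <<'LOG';
I 111012 141506.000 thread/class - Received message: something
E 111012 141606.000 thread/class - Failed handling mobile request
java.lang.NullPointerException
at java.lang.Thread.run(Thread.java:619)
W 111012 141706.000 thread/class - Received message: something
S 111012 142006.000 thread/class - Received message: something
LOG
my $regexp = quotemeta('Exception');    # keyword, quoted as in the question

my @found;
while ( $log =~ /(?<ENTRY>
                    (?<LEVEL>^[EWDIS])\s
                    (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                    (?<THREAD>.*?)\/
                    (?<CLASS>.*?)\s-\s
                    (?<MESSAGE>.*?[\r\n](?=[EWDIS]\s\d{6}\s\d{6}\.\d{3}|$))
                 )/gmxs ) {
    my %entry = %+;    # copy first: the keyword match below replaces %+
    push @found, $entry{ENTRY} if $entry{MESSAGE} =~ /$regexp/;
}
print @found;
```

The outer group yields the same text $& would, without touching the match variables, so the speed no longer depends on the Perl version's handling of $&.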

