I need to grep full stack traces from a log file for a keyword.
This code works, but it slows down on large files (the larger the file, the slower it gets). I think the best fix would be to improve the regular expression so that it searches for the keyword itself, but I could not manage it.
#!/usr/bin/perl
use strict;
use warnings;

my $regexp;
my $stacktrace;

undef $/;
$regexp = shift;
$regexp = quotemeta($regexp);

while (<>) {
    while ( $_ =~ /(?<LEVEL>^[E|W|D|I])\s
                   (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                   (?<THREAD>.*?)\/
                   (?<CLASS>.*?)\s-\s
                   (?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx )
    {
        $stacktrace = $&;
        if ( $+{MESSAGE} =~ /$regexp/ ) {
            print "$stacktrace";
        }
    }
}
Usage: ./grep_log4j.pl <pattern> <file>
Example: ./grep_log4j.pl Exception sample.log
I think the problem is the line $stacktrace = $&; because if I delete it and just print all the matching entries, the script runs quickly. Here is the version of the script that prints all matches:
#!/usr/bin/perl
use strict;
use warnings;

undef $/;

while (<>) {
    while ( $_ =~ /(?<LEVEL>^[E|W|D|I])\s
                   (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                   (?<THREAD>.*?)\/
                   (?<CLASS>.*?)\s-\s
                   (?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx )
    {
        print_result();
    }
}

sub print_result {
    print "LEVEL: $+{LEVEL}\n";
    print "TIMESTAMP: $+{TIMESTAMP}\n";
    print "THREAD: $+{THREAD}\n";
    print "CLASS: $+{CLASS}\n";
    print "MESSAGE: $+{MESSAGE}\n";
}
Usage: ./grep_log4j.pl <file>
Example: ./grep_log4j.pl sample.log
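Since the suspicion falls on $&, one variation that might be worth trying is to wrap the whole entry in its own named group and read it from %+ instead. perlvar notes that on Perl versions before 5.20, mentioning $& anywhere in a program can slow down every regex match. The sketch below is only an illustration under assumptions: the group name ENTRY is made up, the character classes are tidied to [EWDI], a \z alternative is added so the last entry also matches, and the log is a short in-memory sample rather than a real file. Whether it actually helps on a large file has not been measured here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory sample standing in for a slurped log file.
my $log = <<'END_LOG';
I 111012 141506.000 thread/class - Received message: something
E 111012 141606.000 thread/class - Failed handling mobile request
java.lang.NullPointerException
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
W 111012 141706.000 thread/class - Received message: something
END_LOG

my $keyword = quotemeta('Exception');
my @hits;

# ENTRY (an assumed name) captures the whole record, so $& is never used.
while ( $log =~ /(?<ENTRY>
        ^[EWDI]\s
        \d{6}\s\d{6}\.\d{3}\s
        .*?\/
        .*?\s-\s
        (?<MESSAGE>.*?\n)
        (?=^[EWDI]\s\d{6}\s\d{6}\.\d{3}\s|\z)
    )/gsmx )
{
    # Keep the full entry only when its message contains the keyword.
    push @hits, $+{ENTRY} if $+{MESSAGE} =~ /$keyword/;
}

print @hits;
```

On Perl 5.10 or later, the /p modifier together with ${^MATCH} is another documented way to get the matched text without touching $&.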
Log4j pattern: %-1p %d %t/%c{1} - %m%n
Example log file:
I 111012 141506.000 thread/class - Received message: something
E 111012 141606.000 thread/class - Failed handling mobile request
java.lang.NullPointerException
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
    at java.lang.Thread.run(Thread.java:619)
W 111012 141706.000 thread/class - Received message: something
E 111012 141806.000 thread/class - Failed with Exception
java.lang.NullPointerException
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
    at java.lang.Thread.run(Thread.java:619)
D 111012 141906.000 thread/class - Received message: something
S 111012 142006.000 thread/class - Received message: something
I 111012 142106.000 thread/class - Received message: something
I 111013 142206.000 thread/class - Metrics:0/1
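For comparison with the slurp-and-regex approach, here is a rough sketch of a line-by-line alternative that never holds the whole file in memory: a line that looks like an entry header closes the previous record, and a record is kept if it contains the keyword. The names $header_re, flush_record and @matches are made up for illustration, the log is an abbreviated in-memory copy of the sample, and unlike the original script the keyword is checked against the whole record rather than only the message part.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Abbreviated in-memory copy of the sample log (a real script would read a file).
my $log = <<'END_LOG';
I 111012 141506.000 thread/class - Received message: something
E 111012 141606.000 thread/class - Failed handling mobile request
java.lang.NullPointerException
    at java.lang.Thread.run(Thread.java:619)
W 111012 141706.000 thread/class - Received message: something
E 111012 141806.000 thread/class - Failed with Exception
I 111013 142206.000 thread/class - Metrics:0/1
END_LOG

my $keyword   = quotemeta('Exception');
my $header_re = qr/^[EWDI]\s\d{6}\s\d{6}\.\d{3}\s/;   # start of a new log entry

my @matches;        # records that contain the keyword
my $record = '';    # record currently being assembled

# Keep the finished record if the keyword occurs anywhere in it, then reset.
sub flush_record {
    push @matches, $record if $record ne '' && $record =~ /$keyword/;
    $record = '';
}

open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
while ( my $line = <$fh> ) {
    flush_record() if $line =~ $header_re;   # a header line closes the previous record
    $record .= $line;
}
flush_record();     # the last record has no following header
close $fh;

print @matches;
```

Because each line is examined once with a short anchored pattern, the work should grow roughly linearly with file size, though this has not been benchmarked against the original script.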
You can find my regex at http://gskinner.com/RegExr/ under the keyword log4j.