I have two related questions. Firstly, what is the best way to parse logs that have "random" spacing and so on? The second, which I will ask separately, is how to handle logs that have arbitrary attribute-value pairs. (See: logstash grok filter for logs with arbitrary value attribute pairs)
So, for the first question, I have a log line that looks like this:
14:46:16.603 [http-nio-8080-exec-4] INFO  METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92
Using http://grokdebug.herokuapp.com/ I ended up creating the following grok pattern, which works for this line:
%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}
With the following configuration file:
input {
  file {
    path => "/home/robyn/testlogs/trimmed_logs.txt"
    start_position => beginning
    sincedb_path => "/dev/null" # for testing; allows reparsing
  }
}
filter {
  grok {
    match => {"message" => "%{TIME:timestamp} %{NOTSPACE:http} %{WORD:loglevel}%{SPACE}%{WORD:logtype} - msg=%{NOTSPACE:msg}%{SPACE}%{WORD:action}%{SPACE}job=%{NOTSPACE:job}%{SPACE}data=%{NOTSPACE:data}" }
  }
}
output {
  file {
    path => "/home/robyn/filteredlogs/trimmed_logs.out.txt"
  }
}
I get the following output:
{"message":"14:46:16.603 [http-nio-8080-exec-4] INFO  METERING - msg=93e6dd5e-c009-46b3-b9eb-f753ee3b889a CREATE_JOB job=a820018e-7ad7-481a-97b0-bd705c3280ad data=71b1652e-16c8-4b33-9a57-f5fcb3d5de92","@version":"1","@timestamp":"2015-08-07T17:55:16.529Z","host":"hlt-dev","path":"/home/robyn/testlogs/trimmed_logs.txt","timestamp":"14:46:16.603","http":"[http-nio-8080-exec-4]","loglevel":"INFO","logtype":"METERING","msg":"93e6dd5e-c009-46b3-b9eb-f753ee3b889a","action":"CREATE_JOB","job":"a820018e-7ad7-481a-97b0-bd705c3280ad","data":"71b1652e-16c8-4b33-9a57-f5fcb3d5de92"}
This is pretty much what I want, but the pattern feels convoluted, especially with all the %{SPACE} and %{NOTSPACE} tokens. That tells me I'm not doing it the best way. Should I create a more specific pattern for the hex identifiers? I think I need %{SPACE} between loglevel and logtype because of the extra space between INFO and METERING in the log, but this also feels kludgy.
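A minimal sketch of a tidier filter, assuming every ID is a standard 8-4-4-4-12 UUID and that the bracketed value is a thread name (I've renamed `http` to `thread`; that field name is just my choice). The stock grok patterns already include `UUID` and `LOGLEVEL`, so no custom hex pattern should be needed. In the stock set, `SPACE` is `\s*` (zero or more whitespace characters) and `NOTSPACE` is `\S+`, which is why the `http` field in the output above kept its square brackets; escaping the brackets keeps them out of the capture, and `\s+` states plainly that at least one space follows the level.

```
filter {
  grok {
    # UUID and LOGLEVEL are stock grok patterns.
    # \[ and \] match the literal brackets so they are not captured.
    # \s+ absorbs the padding between the level and the log type.
    match => { "message" => "%{TIME:timestamp} \[%{DATA:thread}\] %{LOGLEVEL:loglevel}\s+%{WORD:logtype} - msg=%{UUID:msg} %{WORD:action} job=%{UUID:job} data=%{UUID:data}" }
  }
}
```

If the identifiers were some other hex format, the grok filter's `pattern_definitions` option (or a pattern file referenced by `patterns_dir`) lets you define a named pattern of your own and use it the same way as `%{UUID}`.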
Also, how do I use the log line's own timestamp to replace @timestamp, which appears to be the time Logstash processed the event and which we don't want or need?
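A minimal sketch using the date filter, whose default target is `@timestamp`. One caveat: the log line only carries a time of day, so parsing `timestamp` on its own (`match => [ "timestamp", "HH:mm:ss.SSS" ]`) would leave the date part of `@timestamp` filled with defaults rather than the day the line was logged. The sketch therefore assumes the date can be obtained elsewhere (for example from the log file's name) into a field; `log_date`, `log_datetime` and the `yyyy-MM-dd` format are hypothetical names and formats, not something from the original setup.

```
filter {
  # ... grok as in the configuration above, producing the "timestamp" field ...

  # Hypothetical: "log_date" (e.g. "2015-08-07") must be populated elsewhere,
  # because the log line itself contains no date.
  mutate {
    add_field => { "log_datetime" => "%{log_date} %{timestamp}" }
  }
  date {
    # Parse the combined value; the result is written to @timestamp (the default target).
    match => [ "log_datetime", "yyyy-MM-dd HH:mm:ss.SSS" ]
    # Drop the helper field once parsing has succeeded.
    remove_field => [ "log_datetime" ]
  }
}
```

The date filter interprets the parsed value in the Logstash host's time zone unless you set its `timezone` option, and events it cannot parse are tagged `_dateparsefailure` instead of having `@timestamp` replaced.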
As is probably obvious, I'm just starting out with ELK and grok, so pointers to useful resources would also be appreciated.