Indexing and displaying log data with Solr 6

I am currently using Solr 6 and I want to index log data like the line below:

2016-06-22T03:00:04Z|INFO|f-10-11-0-241|1301|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider.CheckValidGameDataRequestFilter|Invalid UserAgent=%E3%83%94%E3%82%B3/1.07.41149 CFNetwork/758.2.8 Darwin/15.0.0, PlayerId=player_a2a7d1a4-0a31-4c4d-b5bf-10be67dc85d6|

I am not sure how to split the pipe-delimited data. The layout I use in NLog is this:

${date:universalTime=True:format=yyyy-MM-ddTHH\:mm\:ssZ}|${level:uppercase=true}|${machinename}|${processid}|${logger}|${callsite:className=true:methodName=true}|${message}|${exception:format=tostring}${newline} 
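
For reference, this layout would normally be attached to a File target in NLog.config; a minimal sketch follows (the target name and file path are placeholders, not taken from the question):

<?xml version="1.0" encoding="utf-8"?>
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <targets>
    <!-- Hypothetical file target writing the pipe-delimited layout shown above -->
    <target name="logfile" xsi:type="File" fileName="logs/game-service.log"
            layout="${date:universalTime=True:format=yyyy-MM-ddTHH\:mm\:ssZ}|${level:uppercase=true}|${machinename}|${processid}|${logger}|${callsite:className=true:methodName=true}|${message}|${exception:format=tostring}" />
  </targets>
  <rules>
    <!-- Route everything at Info and above to the file target -->
    <logger name="*" minlevel="Info" writeTo="logfile" />
  </rules>
</nlog>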

I also tried CSV loading, but Solr gives me the JSON response below, which is not what I need. Please help.

  "responseHeader":{ "status":0, "QTime":77, "params":{ "q":"*:*", "indent":"on", "wt":"json", "_":"1466745065000"}}, "response":{"numFound":8,"start":0,"docs":[ { "id":"b28049bb-d49e-4b4d-80db-d7d77351527b", "2016-06-23T02_37_18Z_INFO_web.chubi.development1_6326_DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider_DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider.CheckValidGameDataRequestFilter_Invalid_UserAgent_PIKO_0.00.41269_CFNetwork_711.5.6_Darwin_14.0.0":["2016-06-23T02:37:28Z|INFO|web.chubi.development1|6326|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider.CheckValidGameDataRequestFilter|Invalid UserAgent=PIKO/0.00.41269 CFNetwork/711.5.6 Darwin/14.0.0"], "_PlayerId_player_407defcf-7032-4ef4-81a6-91bb62b9150b_":[" PlayerId=player_905266b2-9ce3-4fa1-b0a7-4663b9509731|"], "_version_":1537919142165741568}]} 
1 answer

It looks like you want to extract clean fields from your logs so they can be indexed and searched without any ambiguity. Why not analyze your data with a custom analyzer that uses a regex to split it for you? I highly recommend solr.PatternTokenizerFactory for breaking the text on the pipe character. In addition, you can use the Analysis tab in the Solr admin UI to see in detail how your log data is processed by the analyzer. For the encoded text, for example in the "Invalid UserAgent" message, you can add solr.ASCIIFoldingFilterFactory so the encoded characters are indexed. You may also want to tokenize on dots; I don't know whether that is a requirement for you. PatternTokenizer does the trick on your data, and if you still need further refinement you can add solr.WordDelimiterFilterFactory to tune your index. Perhaps I will edit this answer with some analyzer settings for you :)
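
As an illustration only, here is a minimal sketch of what such a schema fragment might look like; the fieldType and field names are placeholders I made up, not anything from the question, and it is untested against the poster's setup:

<!-- Hypothetical fieldType: splits each raw log line on the pipe character -->
<fieldType name="log_pipe" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- One token per pipe-delimited segment -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\|"/>
    <!-- Fold accented characters to ASCII equivalents where possible -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <!-- Lowercase tokens for case-insensitive search -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field that stores the whole log line and uses the type above -->
<field name="log_line" type="log_pipe" indexed="true" stored="true"/>

You can paste a sample log line into the Analysis tab against a field type like this to check the pipe splitting before reindexing.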
