I took Andrew Ng's Coursera AI course, in particular the section on neural networks, and I'm planning to apply a neural network to the data in a log file.
My log file contains data of this type:
<IP OF MACHINE INITIATING REQUEST><DATE OF REQUEST><TIME OF REQUEST><NAME OF RESOURCE BEING ACCESSED ON SERVER><RESPONSE CODE><TIME TAKEN FOR SERVER TO SERVE PAGE>
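A minimal Python sketch of parsing one line of this format. It assumes, hypothetically, that the fields are whitespace-separated in the order shown, that the date is YYYY-MM-DD, the time is HH:MM:SS, and the time taken is an integer number of milliseconds. `LogRecord` and `parse_line` are illustrative names; the split and the `strptime` format would need adjusting to the real log layout.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogRecord:
    ip: str
    timestamp: datetime
    resource: str
    status: int
    time_taken_ms: int


def parse_line(line: str) -> LogRecord:
    """Parse one log line in a hypothetical whitespace-separated layout:
    <ip> <YYYY-MM-DD> <HH:MM:SS> <resource> <status> <time_taken_ms>
    """
    ip, date, time, resource, status, taken = line.split()
    # Combine the separate date and time fields into one timestamp.
    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return LogRecord(ip, timestamp, resource, int(status), int(taken))


print(parse_line("10.0.0.1 2013-05-01 14:03:27 /index.html 200 35"))
```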
I know there are other classification algorithms that can be used for this task, such as naïve Bayes and local outlier factor, but I want to assess neural networks on a real, applicable problem.
I read about self-organizing maps (SOMs), and they seem better suited to this type of problem since the log file data has no predefined structure, but SOMs seem to be a more complex topic.
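For a sense of scale, the core of a SOM is fairly short. Below is a minimal NumPy sketch of the standard online SOM update (find the best-matching unit, then pull it and its grid neighbors toward the input), run on the per-IP count vectors that appear as an example later in the question. The grid size, learning rate, neighborhood width, epoch count, and per-IP normalization are arbitrary assumptions, and `train_som` / `best_matching_unit` are hypothetical names, not a library API.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    r, c = np.unravel_index(np.argmin(dists), dists.shape)
    return int(r), int(c)


def train_som(data, rows=3, cols=3, epochs=100, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # (row, col) coordinates of every node, used by the neighborhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for epoch in range(epochs):
        # Learning rate and neighborhood radius shrink linearly over training.
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 1e-3  # keep sigma > 0
        for x in data[rng.permutation(len(data))]:
            bmu = np.array(best_matching_unit(weights, x))
            # Gaussian neighborhood on the grid around the best-matching unit.
            d2 = np.sum((grid - bmu) ** 2, axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            # Pull every node toward x, weighted by its neighborhood value.
            weights += lr * h[..., None] * (x - weights)
    return weights


# Per-resource hit counts from the question's example, normalized per IP
# (an assumption) so that overall activity level does not dominate.
ips = ["1.2.3.A", "1.2.3.B", "1.2.3.C"]
counts = np.array([[4, 3, 1], [0, 1, 2], [3, 7, 3]], dtype=float)
data = counts / counts.sum(axis=1, keepdims=True)

weights = train_som(data)
for ip, x in zip(ips, data):
    print(ip, "->", best_matching_unit(weights, x))
```

IPs that land on the same or neighboring nodes have similar access profiles; with real data you would use many more resources, IPs, and nodes.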
Instead of using a self-organizing map, I plan to create training data from the log file by grouping it into key-value pairs, where the key is <IP OF MACHINE INITIATING REQUEST> and the value for each key is [<NAME OF RESOURCE BEING ACCESSED ON SERVER>, <TIME TAKEN FOR SERVER TO SERVE PAGE>].
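As a sketch, assuming the log has already been parsed into hypothetical `(ip, resource, time_taken_ms)` tuples, the grouping could look like this (`group_by_ip` is an illustrative name):

```python
from collections import defaultdict

# Hypothetical pre-parsed records: (ip, resource, time_taken_ms).
records = [
    ("1.2.3.A", "/index.html", 35),
    ("1.2.3.B", "/login", 120),
    ("1.2.3.A", "/products", 48),
]


def group_by_ip(records):
    """Map each IP to the list of [resource, time_taken] pairs it produced."""
    grouped = defaultdict(list)
    for ip, resource, taken in records:
        grouped[ip].append([resource, taken])
    return dict(grouped)


print(group_by_ip(records))
# {'1.2.3.A': [['/index.html', 35], ['/products', 48]], '1.2.3.B': [['/login', 120]]}
```

One thing to keep in mind: the value lists have a different length for each IP, while a standard feed-forward network expects a fixed-size input vector, so these lists would still need to be turned into fixed-length features (for example, per-resource counts, as in the update below).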
From the above log file data, I'm aiming to use a neural network (or networks) to:
Classify IPs that behave similarly based on which resources they access, and classify behavior within specific time periods, i.e. find which IPs are behaving similarly at a specific moment in time.
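To look at behavior per time period, one option is to bucket requests by hour of day before comparing IPs. A minimal sketch, assuming hypothetical pre-parsed `(ip, timestamp, resource)` tuples; `bucket_by_hour` is an illustrative name:

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical pre-parsed records: (ip, timestamp, resource).
records = [
    ("1.2.3.A", datetime(2013, 5, 1, 1, 15), "/index.html"),
    ("1.2.3.B", datetime(2013, 5, 1, 1, 40), "/index.html"),
    ("1.2.3.A", datetime(2013, 5, 1, 2, 5), "/login"),
]


def bucket_by_hour(records):
    """Return {hour_of_day: {ip: Counter(resource -> hits)}}."""
    buckets = defaultdict(lambda: defaultdict(Counter))
    for ip, ts, resource in records:
        buckets[ts.hour][ip][resource] += 1
    # Convert the outer defaultdicts to plain dicts for printing/serialization.
    return {hour: dict(ips) for hour, ips in buckets.items()}


for hour, ips in sorted(bucket_by_hour(records).items()):
    print(hour, {ip: dict(hits) for ip, hits in ips.items()})
```

Keying on `ts.hour` merges the same hour across different days; keying on `(ts.date(), ts.hour)` instead would keep each day separate.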
I'm not sure where to start. I've implemented very simple neural networks that perform integer arithmetic, but now I want to implement a network for a real use case, based on the data I have.
Based on the log data format, is this a good use case?
Any pointers on where to begin with this task?
I hope this question is not too general; I just don't know what issues to consider when starting to implement a neural network.
Update:
I would like to prepare the data in the form best suited to training a neural network.
To do this, I'm considering classifying users by time period based on a similarity score.
To generate a similarity score, I could count the number of times each IP address accesses each resource:
e.g.:
1.2.3.A,4,3,1
1.2.3.B,0,1,2
1.2.3.C,3,7,3
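The counting step can be sketched as follows. The resource names `/a`, `/b`, `/c` and their fixed order are hypothetical; fixing the order is what makes every IP's vector line up column by column. The sample hits reproduce the `1.2.3.B,0,1,2` row above, and `count_vectors` is an illustrative name.

```python
from collections import Counter


def count_vectors(hits, resources):
    """hits: iterable of (ip, resource). Returns {ip: [count per resource, in order]}."""
    per_ip = {}
    for ip, resource in hits:
        per_ip.setdefault(ip, Counter())[resource] += 1
    # A Counter returns 0 for resources an IP never requested.
    return {ip: [c[r] for r in resources] for ip, c in per_ip.items()}


resources = ["/a", "/b", "/c"]  # hypothetical fixed resource order
hits = [("1.2.3.B", "/b"), ("1.2.3.B", "/c"), ("1.2.3.B", "/c"),
        ("1.2.3.A", "/a")]
for ip, vec in count_vectors(hits, resources).items():
    print(",".join([ip, *map(str, vec)]))
# prints 1.2.3.B,0,1,2 then 1.2.3.A,1,0,0
```

Resources missing from `resources` are silently ignored, so the list should cover every resource of interest.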
From this, generate:
<HOUR OF DAY>,<IP ADDRESS X>,<IP ADDRESS Y>,<SIMILARITY SCORE>
e.g.:
1,1.2.3.A,1.2.3.B,.3
1,1.2.3.C,1.2.3.B,.2
1,1.2.3.B,1.2.3.B,0
2,1.2.3.D,1.2.3.B,.764
2,1.2.3.E,1.2.3.B,.332
3,1.2.3.F,1.2.3.B,.631
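The question does not say how the similarity score is computed, so this sketch assumes cosine similarity between per-resource count vectors (a swapped-in choice, not necessarily how the example scores were produced). It emits one `<HOUR OF DAY>,<IP ADDRESS X>,<IP ADDRESS Y>,<SIMILARITY SCORE>` row per unordered pair of IPs, treating the three count vectors from the earlier example as hour 1; `cosine` and `similarity_rows` are hypothetical helper names.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_rows(hour, vectors):
    """Yield (hour, ip_x, ip_y, score) for every unordered pair of IPs."""
    for (ip_x, u), (ip_y, v) in combinations(sorted(vectors.items()), 2):
        yield hour, ip_x, ip_y, round(cosine(u, v), 3)


hour_1 = {"1.2.3.A": [4, 3, 1], "1.2.3.B": [0, 1, 2], "1.2.3.C": [3, 7, 3]}
for row in similarity_rows(1, hour_1):
    print(",".join(map(str, row)))
# 1,1.2.3.A,1.2.3.B,0.439
# 1,1.2.3.A,1.2.3.C,0.863
# 1,1.2.3.B,1.2.3.C,0.71
```

Cosine similarity is 1.0 for identical profiles, whereas the example gives `1.2.3.B` compared with itself a score of 0, which looks more like a distance; if a distance is wanted, `1 - cosine(u, v)` is a common substitute.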
Then I can start to correlate this with how users behave during the day.
Is this applicable to a neural network?
I realize I'm asking about a neural network that is looking for a problem, but is this a suitable problem?