Using a neural network for use with log file data

I took the Andrew NGs coursera AI course, in particular the section on neural networks and Im, planning to embed a neural network in the data of a log file.

My log file contains data of this type:

<IP OF MACHINE INITIATING REQUEST><DATE OF REQUEST><TIME OF REQUEST><NAME OF RESOUCE BEING ACCESSED ON SERVER><RESPONSE CODE><TIME TAKEN FOR SERVER TO SERVE PAGE> 

I know that there are other classification algorithms that can be used for this task, such as naΓ―ve bayes and local outlier factor , but want to access neural networks using a real applicable problem.

I read about self-organizing neural network maps, and this seems to be more suitable for this type of problem, since the log file does not have any structure, but seems to be a more complex topic.

Instead of using a self-organizing neural network map, I plan to create training data from the log file data, grouping the data into a pair of key values, where the key is <IP OF MACHINE INITIATING REQUEST> , and the value for each key is [<NAME OF RESOUCE BEING ACCESSED ON SERVER>, ><TIME TAKEN FOR SERVER TO SERVE PAGE>]

From the above Im log file data aimed at using a neural network (s):

 To classify similar IP behaviors based on what resources are being accessed. Classify behavior at specific periods / moments in time, so what IP's are behaving similarly and specific moment in time. 

I'm not sure where to start from above. Ive implemented very simple neural networks that perform integer arithmetic, but now they want to implement the network in use based on the data that I have.

Based on the log data format, is this a good use case?

Any pointers on where to be with this task?

I hope that this question is not too general, I just do not know what issues should be considered when starting the implementation of a neural network.

Update:

I would like to output the data that is best suited for creating a neural network.

For this, I consider the conclusion of the user classification based on time periods based on similarity assessment.

To generate a similarity score, I could generate the number of times each IP address accesses a resource:

eg:

 1.2.3.A,4,3,1 1.2.3.B,0,1,2 1.2.3.C,3,7,3 

from this generate:

 <HOUR OF DAY>,<IP ADDRESS X>,<IP ADDRESS Y>,<SIMMILARITY SCORE> 

:

 1,1.2.3.A,1.2.3.B,.3 1,1.2.3.C,1.2.3.B,.2 1,1.2.3.B,1.2.3.B,0 2,1.2.3.D,1.2.3.B,.764 2,1.2.3.E,1.2.3.B,.332 3,1.2.3.F,1.2.3.B,.631 

So then you can start to correlate with how users behave during the day.

Applies to a neural network?

I understand that I am asking about a neural network that is looking for a problem, but is this a suitable problem?

+6
source share
1 answer

Based on the log data format, is this a good use case?

You can use it as a dataset to train the neural network for the future values ​​of predict or classify them in labels (or categories). For some types of neural networks (especially Multi-Layer Perceptron ) it depends on how you organize your dataset for use during neural network training. In other cases, you can group a pattern (also known as clustering ).

Neural network concept

Since you have historical data separated in fields (or properties), you can create a model from neural network - classify or predict possible future values.

Since a neural network is a mathematical model that is determined by the steps of learning, you must define the input and output sets that will be used during training to determine this model (neural network). Given this, your qualitative values ​​(texts, symbols, letters, etc.) must be converted to quantitative values, for example:

 A you convert to 1 B you convert to 2 C you convert to 3 ... Z you convert to N 

After that, you can arrange your data set in samples to divide it into an input list and an ideal output for each sample. For example, suppose you have a dataset that defines homes in the real estate market and their prices. You have a task to determine the price (offer) for new future homes, an example of your training set may be this:

Entrance:

 Bedrooms ; Bathrooms ; Garage ; Near Subway 1 ; 1 ; 0 ; 1 3 ; 2 ; 2 ; 1 2 ; 2 ; 1 ; 0 

Perfect result (for each sample input, respectively)

 Price 100.000 150.000 230.000 

And use these neural network training kits to bid for a future home with features

Your problem

In your case, IPs fields can be converted to quantitative values. For sample:

 1.2.3 convert to 1 1.2.4 convert to 2 1.2.5 convert to 3 

Suppose you want to classify a SIMILARITY SCORE field, so you can use the HOUR OF DAY , IP ADDRESS X and IP ADDRESS Y columns as an input set and an output set that you have only SIMILARITY SCORE . The image below shows how to control it (a simple neural network with direct connection).

enter image description here

There are many tools that allow you to easily work with neural networks, you can use arrays of double values ​​to define these sets, and the object will be trained for you. I used the Encog Framework from Heaton Research, and it supports Java, C #, C ++ and others. There is also another, called the Accord Framework , but it is for .Net only.

An example of how to implement Feed-forward Neural Network using Encog for Java:

 BasicNetwork network = new BasicNetwork(); // add layers in the neural network network.addLayer(new BasicLayer(null, true, 3)); network.addLayer(new BasicLayer(new ActivationTANH(), true, 4)); network.addLayer(new BasicLayer(new ActivationTANH(), true, 1)); // finalize and randomize the neural network network.getStructure().finalizeStructure(); network.reset(); // define a random training set. // You can define using your double arrays here MLDataSet training = RandomTrainingFactory.generate(1000, 5, network.getInputCount(), network.getOutputCount(), -1, 1); ResilientPropagation train = new ResilientPropagation(network, training); double error = 0; Integer epochs = 0; //starting training do { //train train.iteration(); //count how many iterations the loop has epochs++; // get the error of neural network in the training set error = train.getError(); // condition for stop training } while (epochs < 1000 && error > 0.01); 

Obs: I have not tested this code.

If you start with neural networks, I recommend that you implement your model and try it out using datasets from the UCI Machine Learning Repository . There are too many datasets for classification, regression, and clustering that you can test in your implementation.

+8
source

All Articles