Fuzzy c-means clustering of TCP dump data in MATLAB

Hi, I have some data that is presented as follows:

0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal. 

It's from the KDD Cup 1999 data set, which was based on the DARPA data set.

The text file I have contains line after line of data like this. MATLAB has a general clustering tool that you can open by typing findcluster, but it only accepts .dat files.

I'm also not sure whether it will accept a format like this, or why there are so many trailing zeros in the dump files.

Can anyone explain how I can take a text file and run it through the FCM clustering method in MATLAB? Code help is really necessary.

matlab machine-learning cluster-analysis data-mining
1 answer

FINDCLUSTER is just a GUI for two clustering algorithms, FCM and SUBCLUST, both of which you can also call directly from code.

First you need to read the data from the file; look at TEXTSCAN for this.
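As a rough sketch of the reading step, assuming every record has the same 42 comma-separated fields as the sample line in the question (one number, three strings, 37 numbers, and a trailing label string). The temporary file exists only to make the snippet self-contained, and the variable names are my own; in practice you would open your actual text file instead.

```matlab
% Write the sample record from the question to a temporary file so the
% snippet runs on its own; replace fname with the path to your real file.
sampleLine = ['0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,' ...
              '0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,' ...
              '0.00,0.00,0.00,0.00,0.00,normal.'];
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', sampleLine);
fclose(fid);

% Format: 1 number, 3 strings, 37 numbers, 1 label string (42 fields).
fmt = ['%f %s %s %s' repmat(' %f', 1, 37) ' %s'];

fid = fopen(fname, 'r');
C = textscan(fid, fmt, 'Delimiter', ',');
fclose(fid);
delete(fname);

% Numeric fields are 1 and 5..41; concatenate them into an N-by-38 matrix.
numericCols = [1 5:41];
X = [C{numericCols}];

% String fields come back as cell arrays of strings.
protocol = C{2};
service  = C{3};
flag     = C{4};
label    = C{42};   % e.g. 'normal.' (the trailing period is part of the field)
```

After this, X holds only the numeric attributes, and the string columns are kept separately so you can decide what to do with them.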

Then you need to deal with the non-numeric attributes: either remove them or convert them to numbers in some way. As far as I can tell, the documentation for both algorithms only mentions support for numeric data.
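One possible way to do the conversion and then cluster is sketched below. It one-hot encodes a string column, rescales the numeric columns, and passes the result to FCM. The toy values, the variable names, and the choice of two clusters are illustrative assumptions, not taken from the KDD data. FCM requires the Fuzzy Logic Toolbox, and the fcm(data, Nc) calling form used here is the classic one; newer releases may prefer an options object, so check the documentation for your version.

```matlab
% Toy stand-ins for columns read from the file (illustrative values only).
protocol = {'tcp'; 'udp'; 'tcp'; 'icmp'; 'udp'; 'tcp'};
Xnum = [239 486; 105 146; 215 4500; 1032 0; 146 105; 230 500];

% Map each distinct string to an integer code (names is sorted).
[names, ~, code] = unique(protocol);

% One-hot encode, so no artificial ordering is imposed on the categories.
onehot = double(bsxfun(@eq, code(:), 1:numel(names)));

% Rescale each numeric column to [0, 1] so large byte counts do not
% dominate the distance computation.
lo   = min(Xnum, [], 1);
hi   = max(Xnum, [], 1);
span = max(hi - lo, eps);            % guard against constant columns
Xscaled = bsxfun(@rdivide, bsxfun(@minus, Xnum, lo), span);

data = [Xscaled onehot];             % all-numeric N-by-5 matrix

% Fuzzy c-means: centers is nClusters-by-nFeatures, U holds the
% membership grade of every record in every cluster (nClusters-by-N).
nClusters = 2;                       % assumed value; pick to suit your data
[centers, U] = fcm(data, nClusters);

% Hard assignment: the cluster with the highest membership per record.
[~, hardLabel] = max(U, [], 1);
```

Simply dropping the string columns is the other option; in that case pass only the scaled numeric matrix to fcm.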

See the original KDD Cup site for a description of each attribute.
