FCM clustering of numeric data and CSV/Excel files

Hi, I asked a previous question (Fuzzy c-means tcp dump clustering in MATLAB), which got a reasonable answer, so I have come back with a follow-up. The problem is the preprocessing step for the tcp/udp data below, which I would like to run through MATLAB's fcm clustering algorithm. My question is:

1) How do I convert the text data in the cells into numeric values, or what would be the best method for doing so? What should the numeric values be?

Edit: My data in Excel looks like this:

0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal. 
excel matlab cluster-analysis data-mining
1 answer

Here is an example of how I would read the data in MATLAB. You need two things: the data itself, which is in comma-separated format, and the list of features along with their types (numeric or nominal).

    %# read the list of features
    fid = fopen('kddcup.names','rt');
    C = textscan(fid, '%s %s', 'Delimiter',':', 'HeaderLines',1);
    fclose(fid);

    %# determine type of features
    C{2} = regexprep(C{2}, '.$','');              %# remove "." at the end
    attribNom = [ismember(C{2},'symbolic');true]; %# nominal features

    %# build format string used to read/parse the actual data
    frmt = cell(1,numel(C{1}));
    frmt( ismember(C{2},'continuous') ) = {'%f'}; %# numeric features: read as number
    frmt( ismember(C{2},'symbolic') ) = {'%s'};   %# nominal features: read as string
    frmt = [frmt{:}];
    frmt = [frmt '%s'];                           %# add the class attribute

    %# read dataset
    fid = fopen('kddcup.data','rt');
    C = textscan(fid, frmt, 'Delimiter',',');
    fclose(fid);

    %# convert nominal attributes to numeric
    ind = find(attribNom);
    G = cell(numel(ind),1);
    for i=1:numel(ind)
        [C{ind(i)},G{i}] = grp2idx( C{ind(i)} );
    end

    %# all numeric dataset
    M = cell2mat(C);
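To make the nominal-to-numeric step concrete, here is a minimal sketch of what grp2idx does to a single text column. The toy column `proto` and its values are made up for illustration and are not read from the dataset. Note that the resulting integers are just labels for the distinct strings and do not express any magnitude or ordering.

```matlab
% Toy nominal column standing in for the protocol field (illustrative values only)
proto = {'tcp'; 'udp'; 'tcp'; 'icmp'};

% grp2idx (Statistics Toolbox) assigns one integer index per distinct string
% and returns the distinct strings themselves as the second output
[idx, names] = grp2idx(proto);

% Indexing the names with the codes recovers the original column,
% which is why the code above keeps the second output in G
recovered = names(idx);
```

Keeping the second output, as the answer's code does with `G{i}`, lets you map cluster results back to the original text values later.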

You may also want to look into the DATASET class from the Statistics Toolbox.
