So, I have two classification methods implemented in MATLAB: discriminant analysis with a diagonal covariance estimate ('diaglinear', which is essentially naive Bayes) and the pure Naive Bayes classifier. There are 23 classes in the entire data set. The first method, the discriminant analysis:
%% Classify using the 'diaglinear' discriminant analysis
training_data = Testdata;
target_class = TestDataLabels;

[class, err] = classify(UnseenTestdata, training_data, target_class, 'diaglinear');

cmat1 = confusionmat(UnseenTestDataLabels, class);
acc1 = 100*sum(diag(cmat1))./sum(cmat1(:));
fprintf('Classifier1:\naccuracy = %.2f%%\n', acc1);
fprintf('Confusion Matrix:\n'), disp(cmat1)
This obtains an accuracy from the confusion matrix of 81.49%, with an error rate (err) of 0.5040 (I'm not sure how to interpret this).
The second method, the pure Naive Bayes classifier:
%% Classify using the Naive Bayes classifier
training_data = Testdata;
target_class = TestDataLabels;

%# train model
nb = NaiveBayes.fit(training_data, target_class, 'Distribution', 'mn');

%# prediction
class1 = nb.predict(UnseenTestdata);

%# performance
cmat1 = confusionmat(UnseenTestDataLabels, class1);
acc1 = 100*sum(diag(cmat1))./sum(cmat1(:));
fprintf('Classifier1:\naccuracy = %.2f%%\n', acc1);
fprintf('Confusion Matrix:\n'), disp(cmat1)
This reaches an accuracy of 81.89%.
Note that I've only checked one round of cross-validation. I'm new to MATLAB and to supervised/unsupervised algorithms, so I did the cross-validation myself: I basically take a random 10% of the data and set it aside for testing purposes. Since it's a random subset each time, I could repeat this several times and take the average accuracy, but the results shown here will do for the purposes of explanation.
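For reference, the hold-out split is done roughly like this (a minimal sketch under my assumptions; fullData and fullLabels are placeholder names, not the actual variables in my code):

%% hold out a random 10% of the records for testing (sketch)
n = size(fullData, 1);
idx = randperm(n);                         % random ordering of all records
nTest = round(0.1 * n);                    % 10% held out

UnseenTestdata       = fullData(idx(1:nTest), :);
UnseenTestDataLabels = fullLabels(idx(1:nTest), :);
Testdata             = fullData(idx(nTest+1:end), :);
TestDataLabels       = fullLabels(idx(nTest+1:end), :);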
So, on to my actual question.
In my literature review of existing methods, many researchers find that a single classification algorithm combined with a clustering algorithm gives better accuracy. They do this by finding the optimal number of clusters for their data and then, using the resulting clusters (whose members should be more similar to each other than to the rest of the data), running a classification algorithm on each individual cluster. This is a process that lets you combine the best parts of an unsupervised algorithm with a supervised classification algorithm.
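As an aside, one way the "optimal number of clusters" step could be done in MATLAB is with the silhouette criterion via evalclusters (a sketch only; I haven't verified this is what the cited papers use, and the range 2:20 is an arbitrary choice):

%% pick a number of clusters by silhouette score (sketch)
eva = evalclusters(Testdata, 'kmeans', 'silhouette', 'KList', 2:20);
bestK = eva.OptimalK;    % cluster count with the best silhouette value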
I'm now using a data set that has been used repeatedly in the literature, and I'm attempting an approach that isn't too dissimilar from those in my research.
First, I use simple K-means clustering, which surprisingly turns out to cluster my data well. The output looks like this:

Looking at the cluster class labels (K1, K2 ... K12):
%% output the class labels of each cluster
K1 = UnseenTestDataLabels(indX(clustIDX==1),:)
I find that in 9 of the clusters one class dominates, while 3 clusters contain several class labels, showing that K-means is a good fit for the data.
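For context, the clustIDX used above comes from a kmeans call along these lines (a sketch; the choice of 12 clusters is inferred from the K1 ... K12 labels, and I'm omitting how indX is built):

%% cluster the unseen data with K-means (sketch)
[clustIDX, centroids] = kmeans(UnseenTestdata, 12);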
However, the problem comes when I take each cluster's data (cluster1, cluster2 ... cluster12):
%% output the real data of each cluster
cluster1 = UnseenTestdata(clustIDX==1,:)
and put each cluster through naive Bayes or the discriminant analysis, like so:
class1 = classify(cluster1, training_data, target_class, 'diaglinear');
class2 = classify(cluster2, training_data, target_class, 'diaglinear');
class3 = classify(cluster3, training_data, target_class, 'diaglinear');
class4 = classify(cluster4, training_data, target_class, 'diaglinear');
class5 = classify(cluster5, training_data, target_class, 'diaglinear');
class6 = classify(cluster6, training_data, target_class, 'diaglinear');
class7 = classify(cluster7, training_data, target_class, 'diaglinear');
class8 = classify(cluster8, training_data, target_class, 'diaglinear');
class9 = classify(cluster9, training_data, target_class, 'diaglinear');
class10 = classify(cluster10, training_data, target_class, 'diaglinear');
class11 = classify(cluster11, training_data, target_class, 'diaglinear');
class12 = classify(cluster12, training_data, target_class, 'diaglinear');
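Each cluster's predictions are then scored against that cluster's true labels (K1 ... K12 from above). The scoring isn't shown in my snippets, but it is essentially the same confusion-matrix calculation as before; a sketch for one cluster (this could equally be written as a loop over all 12):

%% per-cluster accuracy (sketch, shown for cluster 1 only)
cmatK1 = confusionmat(K1, class1);                % true labels of cluster 1 vs. its predictions
acc1   = 100*sum(diag(cmatK1))./sum(cmatK1(:));
fprintf('Cluster 1 accuracy = %.2f%%\n', acc1);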
The accuracy becomes terrifying: 50% of the clusters are classified with 0% accuracy. Each classified cluster (acc1, acc2, ... acc12) has its own confusion matrix; you can see the accuracy of each cluster here:

So, my problem/question is: where am I going wrong? I thought at first that maybe I had the data/labels mixed up for the clusters, but what I posted above looks right; I can't see a problem with it.
How can the same unseen 10% of data that was used in the first experiment give such strange results once that same unseen data has been clustered? I mean, it should be noted that NB is a stable classifier and shouldn't overfit easily, and seeing as the training data is vast while the clusters to be classified are comparatively small subsets, overfitting shouldn't happen, should it?
EDIT:
As requested in the comments, I've included the cmat file for the first testing example, which gives 81.49% accuracy and 0.5040 error:

Also requested was a snippet of K, class, and the related cmat for one of the clusters in this example (cluster4), which has an accuracy of 3.03%:

Seeing as there are a large number of classes (23 in total), I decided to reduce the classes as outlined in the 1999 KDD Cup; this is just applying a bit of domain knowledge, since some of the attacks are more similar to each other than others and fall under one umbrella term.
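The reduction itself is just a re-labelling of the 23 raw labels into the five KDD Cup '99 categories (DoS, Probe, R2L, U2R, normal). A sketch of that mapping, assuming the labels are stored as a cell array of strings and showing only a few attack names per category:

%% map the 23 raw labels onto the 5 KDD Cup '99 categories (sketch)
dosAttacks   = {'smurf.', 'neptune.', 'back.', 'teardrop.', 'pod.', 'land.'};
probeAttacks = {'satan.', 'ipsweep.', 'portsweep.', 'nmap.'};
r2lAttacks   = {'guess_passwd.', 'warezclient.', 'warezmaster.', 'ftp_write.', 'imap.', 'phf.', 'multihop.', 'spy.'};
u2rAttacks   = {'buffer_overflow.', 'rootkit.', 'loadmodule.', 'perl.'};

reducedLabels = target_class;                                  % anything not re-mapped stays 'normal.'
reducedLabels(ismember(target_class, dosAttacks))   = {'DoS'};
reducedLabels(ismember(target_class, probeAttacks)) = {'Probe'};
reducedLabels(ismember(target_class, r2lAttacks))   = {'R2L'};
reducedLabels(ismember(target_class, u2rAttacks))   = {'U2R'};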
I then trained the classifier on 444 thousand records, holding back 10% for testing purposes.
The accuracy was worse, 73.39%, and the error rate was also worse, 0.4261.

The unseen data breaks down into the following classes:
DoS: 39149
Probe: 405
R2L: 121
U2R: 6
normal.: 9721
The predicted labels (class, the result of the discriminant analysis) break down as:
DoS: 28135
Probe: 10776
R2L: 1102
U2R: 1140
normal.: 8249
Training data consists of:
DoS: 352452
Probe: 3717
R2L: 1006
U2R: 49
normal.: 87395
I'm afraid that if I reduce the training data so that it has a similar proportion of malicious activity, the classifier won't have enough predictive power to distinguish the classes. However, looking at some other literature, I've noticed that some researchers remove U2R entirely, as there isn't enough data for successful classification.
The methods I have tried so far are: one-class classifiers, where I train the classifier to predict only one class (not effective); classifying the individual clusters (even worse accuracy); reducing the class labels (second best); and keeping the full 23 class labels (best accuracy).