I'm not new to data mining, but I'm completely lost by these WEKA results. I hope someone can help. Thanks in advance!
I have a dataset of numeric vectors with a binary class label (S, H). I train a NaiveBayes model (although the particular method really doesn't matter) with leave-one-out cross-validation. Here are the results:
=== Predictions on test data ===
inst#     actual  predicted error distribution
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 2:S + 0,*1
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *1,0
1 1:H 1:H *0.997,0.003
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 1:H + *1,0
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 2:S 0,*1
1 2:S 1:H + *1,0
=== Stratified cross-validation ===
=== Summary ===
Total Number of Instances 66
=== Confusion Matrix ===
a b <-- classified as
14 1 | a = H
2 49 | b = S
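For reference, the setup is nothing exotic. A rough scikit-learn equivalent (toy data standing in for my real 355-attribute vectors, with the same 15/51 class split):

```python
# Rough scikit-learn equivalent of the WEKA run above (toy data, not my real
# vectors): Gaussian naive Bayes with leave-one-out cross-validation, classes H/S.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (15, 5)),   # 15 "H"-like instances
               rng.normal(3, 1, (51, 5))])  # 51 "S"-like instances
y = np.array(["H"] * 15 + ["S"] * 51)

# One fold per instance = leave-one-out, as in the WEKA evaluation.
pred = cross_val_predict(GaussianNB(), X, y, cv=LeaveOneOut())
print("LOOCV errors:", int((pred != y).sum()), "of", len(y))
```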
As you can see, there are three errors, and the predictions listing agrees with the confusion matrix. I then re-evaluated the model on an independent dataset with the same attributes and the same two classes. Here is the result:
=== Re-evaluation on test set ===
User supplied test set
Relation: FCBC_New.TagProt
Instances: unknown (yet). Reading incrementally
Attributes: 355
=== Predictions on user test set ===
inst#     actual  predicted error distribution
1 1:S 2:H + 0,*1
2 1:S 1:S *1,0
3 1:S 2:H + 0,*1
4 2:H 1:S + *1,0
5 2:H 2:H 0,*1
6 1:S 2:H + 0,*1
7 1:S 2:H + 0,*1
8 2:H 2:H 0,*1
9 1:S 1:S *1,0
10 1:S 2:H + 0,*1
11 1:S 2:H + 0,*1
12 2:H 1:S + *1,0
13 2:H 2:H 0,*1
14 1:S 2:H + 0,*1
15 1:S 2:H + 0,*1
16 1:S 2:H + 0,*1
17 2:H 2:H 0,*1
18 2:H 2:H 0,*1
19 1:S 2:H + 0,*1
20 1:S 2:H + 0,*1
21 1:S 2:H + 0,*1
22 1:S 1:S *1,0
23 1:S 2:H + 0,*1
24 1:S 2:H + 0,*1
25 2:H 1:S + *1,0
26 1:S 2:H + 0,*1
27 1:S 1:S *1,0
28 1:S 2:H + 0,*1
29 1:S 2:H + 0,*1
30 1:S 2:H + 0,*1
31 1:S 2:H + 0,*1
32 1:S 2:H + 0,*1
33 1:S 2:H + 0,*1
34 1:S 1:S *1,0
35 2:H 1:S + *1,0
36 1:S 2:H + 0,*1
37 1:S 1:S *1,0
38 1:S 1:S *1,0
39 2:H 1:S + *1,0
40 1:S 2:H + 0,*1
41 1:S 2:H + 0,*1
42 1:S 2:H + 0,*1
43 1:S 2:H + 0,*1
44 1:S 2:H + 0,*1
45 1:S 2:H + 0,*1
46 1:S 2:H + 0,*1
47 2:H 1:S + *1,0
48 1:S 2:H + 0,*1
49 2:H 1:S + *1,0
50 2:H 1:S + *1,0
51 1:S 2:H + 0,*1
52 1:S 2:H + 0,*1
53 2:H 1:S + *1,0
54 1:S 2:H + 0,*1
55 1:S 2:H + 0,*1
56 1:S 2:H + 0,*1
=== Summary ===
Correctly Classified Instances 44 78.5714 %
Incorrectly Classified Instances 12 21.4286 %
Kappa statistic 0.4545
Mean absolute error 0.2143
Root mean squared error 0.4629
Coverage of cases (0.95 level) 78.5714 %
Total Number of Instances 56
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.643 0.167 0.563 0.643 0.600 0.456 0.828 0.566 H
0.833 0.357 0.875 0.833 0.854 0.456 0.804 0.891 S
Weighted Avg. 0.786 0.310 0.797 0.786 0.790 0.456 0.810 0.810
=== Confusion Matrix ===
a b <-- classified as
9 5 | a = H
7 35 | b = S
The summary says 44 instances were classified correctly and 12 incorrectly. But compare the two outputs: in the cross-validation predictions, index 1 was H, and a distribution of 1,0 meant a confident H prediction. Here, 1,0 accompanies the S-predictions instead. In other words, the two classes (H and S) appear to be swapped.
The same swap shows up in the confusion matrix: it has 16 predictions in column a (H) and 40 in column b (S), whereas judging by the distributions it should be 16 in b (S) and 40 in a (H). So, does WEKA somehow mix up the class labels, or am I misreading the output? If so, how...
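To show what I mean by "swapped": the indices in the two outputs map to different labels (1:H/2:S during cross-validation, 1:S/2:H on the test set), which makes me want to compare the class declarations in the two ARFF headers. A small stdlib sketch of the check I have in mind, with hypothetical headers standing in for my real files:

```python
# Sanity check (my own script, not part of WEKA): compare the declared order
# of the nominal class values in two ARFF headers. WEKA numbers the values
# 1, 2, ... in the order they appear in the @attribute declaration.
import re

def class_values(arff_header):
    """Extract the nominal values of the 'class' attribute, in declared order."""
    m = re.search(r"@attribute\s+class\s*\{([^}]*)\}", arff_header, re.IGNORECASE)
    return [v.strip() for v in m.group(1).split(",")] if m else None

# Hypothetical headers mimicking my two files:
train_header = "@relation train\n@attribute class {H, S}\n@data"
test_header = "@relation FCBC_New.TagProt\n@attribute class {S, H}\n@data"

print(class_values(train_header))  # -> ['H', 'S']
print(class_values(test_header))   # -> ['S', 'H']
# If the orders differ, "1:" refers to H in one output and to S in the other.
```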