Classification WEKA classes

Question

I would like to know if there is a way in WEKA to derive a number of "best guesses" for classification.

My scenario: I classify my data with cross-validation, for example, and then in the Weka output I would like to get something like: these are the 3 best guesses for the classification of this instance. What I want is that, even if an instance is not classified correctly, I get the 3 or 5 best guesses for that instance as output.

Example:

Classes: A, B, C, D, E
Instances: 1 ... 10

And the output would be: instance 1 is 90% likely to be class A, 75% likely to be class B, 60% likely to be class C.

Thanks.

+6

machine-learning weka

user1454263 Aug 14 '12 at 20:54

4 answers

stackoverflowuser2010 · Answer 1 · 2012-08-25T16:11:15+0000

The Weka API has a method called Classifier.distributionForInstance(), which you can use to get the classification prediction distribution. You can then sort the distribution by decreasing probability to get the top-N predictions.

The following is a function that prints: (1) the ground truth label of the test instance; (2) the predicted label from classifyInstance(); and (3) the prediction distribution from distributionForInstance(). I used this with J48, but it should work with other classifiers.

The input parameters are a serialized model file (which you can create during model training by applying the -d option) and a test file in ARFF format.

    import weka.classifiers.Classifier;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public void test(String modelFileSerialized, String testFileARFF) throws Exception {
        // Deserialize the classifier.
        Classifier classifier =
            (Classifier) weka.core.SerializationHelper.read(modelFileSerialized);

        // Load the test instances.
        Instances testInstances = DataSource.read(testFileARFF);

        // Mark the last attribute in each instance as the true class.
        testInstances.setClassIndex(testInstances.numAttributes() - 1);

        int numTestInstances = testInstances.numInstances();
        System.out.printf("There are %d test instances\n", numTestInstances);

        // Loop over each test instance.
        for (int i = 0; i < numTestInstances; i++) {
            // Get the true class label from the instance's own classIndex.
            String trueClassLabel =
                testInstances.instance(i).toString(testInstances.classIndex());

            // Make the prediction here.
            double predictionIndex =
                classifier.classifyInstance(testInstances.instance(i));

            // Get the predicted class label from the predictionIndex.
            String predictedClassLabel =
                testInstances.classAttribute().value((int) predictionIndex);

            // Get the prediction probability distribution.
            double[] predictionDistribution =
                classifier.distributionForInstance(testInstances.instance(i));

            // Print out the true label, predicted label, and the distribution.
            System.out.printf("%5d: true=%-10s, predicted=%-10s, distribution=",
                              i, trueClassLabel, predictedClassLabel);

            // Loop over all the prediction labels in the distribution.
            for (int predictionDistributionIndex = 0;
                 predictionDistributionIndex < predictionDistribution.length;
                 predictionDistributionIndex++) {
                // Get this distribution index's class label.
                String predictionDistributionIndexAsClassLabel =
                    testInstances.classAttribute().value(predictionDistributionIndex);

                // Get the probability.
                double predictionProbability =
                    predictionDistribution[predictionDistributionIndex];

                System.out.printf("[%10s : %6.3f]",
                                  predictionDistributionIndexAsClassLabel,
                                  predictionProbability);
            }

            System.out.printf("\n");
        }
    }

Antimony · Answer 2 · 2012-08-14T20:57:19+0000

I don't know if you can do this natively, but you can just get the probabilities for each class, sort them, and take the top three.

The function you need is distributionForInstance(Instance instance), which returns a double[] giving the probability for each class.
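The "sort and take the top three" step could look roughly like the sketch below. It is plain Java with no Weka dependency: the class name TopGuesses, the helper topN, and the sample numbers are made up for illustration, and the hard-coded array only stands in for whatever distributionForInstance() returns for one instance.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopGuesses {

    /** Returns the indices of the n most probable classes, most probable first. */
    public static List<Integer> topN(double[] distribution, int n) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < distribution.length; i++) {
            indices.add(i);
        }
        // Sort the class indices by descending probability (ties keep index order).
        indices.sort(Comparator.comparingDouble((Integer i) -> distribution[i]).reversed());
        // Keep at most n entries, even if fewer classes exist.
        return new ArrayList<>(indices.subList(0, Math.min(n, indices.size())));
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for the output of distributionForInstance().
        String[] labels = {"A", "B", "C", "D", "E"};
        double[] distribution = {0.10, 0.55, 0.05, 0.25, 0.05};

        // Prints the three best guesses: B, then D, then A.
        for (int classIndex : topN(distribution, 3)) {
            System.out.printf("%s: %.2f%n", labels[classIndex], distribution[classIndex]);
        }
    }
}
```

With a real Weka dataset, the label for each returned index would come from classAttribute().value(index) rather than from a hand-written array. Note that the probabilities in one distribution sum to 1, so the best guesses cannot come out as 90%, 75% and 60% at the same time.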

Lars kotthoff · Answer 3 · 2012-08-14T21:00:13+0000

Not in general. The information you want is not available for all classifiers: in most cases (for example, decision trees) the decision is clear-cut (although potentially incorrect) and comes without a confidence value. Your task requires classifiers that can handle uncertainty (for example, the naive Bayes classifier).

Technically, the easiest thing to do is probably to train the model and then classify a single instance, for which Weka should give you the desired output. In general, you can, of course, also do this for sets of instances, but I don't think Weka provides this out of the box. You may have to tweak the code or use it through an API (for example, in R).

redrubia · Answer 4 · 2012-08-20T21:45:01+0000

When you calculate the probability for an instance, how exactly do you do it?

I have posted my PART rules and the data for the new instance here, but since the calculation has to be done manually, I'm not sure how to do it! Thanks

EDIT: I have now worked out the calculation:

    private float[] getProbDist(String split) {
        // Accepts something like "(52/2)", which means that 52 instances are
        // correctly classified and 2 are incorrectly classified.
        String[] prob_dis = split.replaceAll("[()\\s]", "").split("/");
        if (prob_dis.length > 2) return null;
        if (prob_dis.length == 1) {
            // Assumption: no second number, e.g. "(52)", means nothing was misclassified.
            prob_dis = new String[] { prob_dis[0], "0" };
        }
        float p1 = Float.parseFloat(prob_dis[0]);
        float p2 = Float.parseFloat(prob_dis[1]);
        // Assumes two tags.
        float[] tag_prob = new float[2];
        tag_prob[0] = p2 / p1;          // share of misclassified instances
        tag_prob[1] = 1 - tag_prob[0];  // remaining probability mass
        // Returns a float[] holding the two probabilities.
        return tag_prob;
    }
